Arabic: Import/Export

 

There are two methods for converting Nota Bene documents that contain multilingual text to a format that can be opened by publishers or colleagues who do not have Nota Bene: (1) conversion to PDF (Adobe Portable Document Format) or (2) conversion to RTF (Rich Text Format). While we provide RTF filters for conversion to/from RTF Unicode-encoded Arabic, the issues related to conversion are complex, and the results may or may not be satisfactory. A detailed discussion of these issues follows. Conversion to PDF is more reliable and is generally recommended for documents containing Arabic. Conversion to PDF requires additional software. We recommend pdfFactory, an excellent but reasonably priced program that converts files from any Windows program to PDF.

 

This release of Arabic Lingua should support both import and export of RTF Unicode-encoded Arabic, but we cannot guarantee that it will work with every source you import from or every destination you export to.  While we have spent considerable programming time getting this to work, we cannot make any promises in this regard because, among other things: (a) we have not been able to test the full range of Arabic Unicode sources that Nota Bene users may encounter, and (b) even those that we have encountered do not themselves render Unicode-encoded text properly.

 

Quite frankly, our design decision for this release was as follows:  Rather than leaving Arabic import/export out of this first release altogether (given the complexity of the issues, a path we certainly considered taking), we decided to (a) code in the basic functionality and include the requisite tables, so that (b) users could try importing/exporting text as their needs required.  If there proved to be problems with a specific site or destination, (c) users could e-mail us the relevant information (i.e., tell us which web site they tried to access, or send us an RTF file that didn't import or export correctly), and we would then (d) take a look at the issues.

 

This approach is possible only if users understand that (a) RTF conversion for Arabic is not fully supported yet, and (b) while we intend to address issues that arise, we cannot guarantee that we will be able to correct all problems.

 

Part of the reason for qualifying our support, at least for the moment, has to do with the complexity of Unicode Arabic encoding.  There are two main types of Arabic encoding:

 

Canonical Encoding--the primary Arabic block (u06xx) includes a distinct encoding for each Arabic character, without regard to its rendering (i.e., its contextual forms) in a file.  In Unicode charts, each character is usually represented in its stand-alone form, although what Unicode intends to encode here is not the form of the character, but the character value itself.
Presentation Forms--additional blocks of Arabic (primarily ufbxx and ufexx) that contain the presentation forms (initial, medial, and final, along with a duplicate of the stand-alone form).  These were added in order to (a) maintain compatibility with other encoding schemes, and (b) potentially enable proper rendering of Arabic in software that lacks automatic rendering/contextualization.
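
The distinction between the two kinds of encoding can be seen by looking at code point ranges.  The following is a minimal Python sketch, not part of Nota Bene; the function name classify and the range constants are hypothetical names chosen for illustration.  It labels a character as belonging to the primary Arabic block or to one of the two presentation-form blocks, and prints the Unicode name of the letter beh in its canonical form and in its four presentation forms.

```python
import unicodedata

# Code point ranges of the Unicode blocks discussed above.
ARABIC_BLOCK = range(0x0600, 0x0700)          # primary ("canonical") block, u06xx
PRESENTATION_FORMS_A = range(0xFB50, 0xFE00)  # Arabic Presentation Forms-A
# Arabic Presentation Forms-B; stop before U+FEFF, which is the byte order mark.
PRESENTATION_FORMS_B = range(0xFE70, 0xFEFF)


def classify(ch: str) -> str:
    """Return 'canonical', 'presentation', or 'other' for a single character."""
    cp = ord(ch)
    if cp in ARABIC_BLOCK:
        return "canonical"
    if cp in PRESENTATION_FORMS_A or cp in PRESENTATION_FORMS_B:
        return "presentation"
    return "other"


# The letter beh: canonical value, then isolated, initial, medial, final forms.
for ch in "\u0628\ufe8f\ufe91\ufe92\ufe90":
    print(f"U+{ord(ch):04X}", classify(ch), unicodedata.name(ch))
```

All five characters stand for the same letter; only the first encodes the character value rather than a particular shape.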

 

By adding presentation forms, Unicode made life easier for people who wanted to work with Arabic: because the various forms required to display Arabic finally had fixed, and not merely private, code positions, font developers and others could develop fonts that were standard across applications.  In doing so, however, Unicode moved away from its initial commitment to encode only character values, and not "glyphs" or typographical forms of characters.

 

Import

 

Nota Bene should be able to import RTF-formatted Arabic that is encoded in canonical form, in presentation form, or in some mixture of both.  (Note that text copied from most web sites is also placed on the clipboard in RTF format, even if it is placed there as HTML as well.)  When importing, Nota Bene always converts the characters into its smart forms, which it then renders properly.

 

Note that if a source contains a fixed form that is not the proper form for its context, Nota Bene's auto-rendering rules will automatically convert it to the proper form.  Conversely, if the source file intended some character to appear in an improper form, the version imported into Nota Bene will not retain that intended (improper) form, but will be automatically corrected.  In some cases, this may be undesirable.  (As noted above, we welcome reports of problems, even if we cannot make any promises as to whether, or how promptly, they will be addressed.)
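
Nota Bene's own conversion tables are not shown in this topic.  As a rough stand-in, the Python sketch below illustrates the general idea of folding presentation forms back to canonical characters, so that a rendering engine can then choose the contextual shape itself.  It assumes that Unicode compatibility normalization (NFKC) is an acceptable approximation of that folding; the function name to_canonical is hypothetical.

```python
import unicodedata


def to_canonical(text: str) -> str:
    """Fold Arabic presentation forms back to primary-block characters.

    Only characters in the presentation-form ranges are normalized, so
    other text passes through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFE:
            # NFKC replaces a presentation form with its compatibility
            # decomposition, i.e. the underlying character value(s).
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


# Initial, medial, and final beh all fold to the same canonical beh (U+0628).
print([f"U+{ord(c):04X}" for c in to_canonical("\ufe91\ufe92\ufe90")])
```

As the paragraph above warns, this kind of folding discards whatever fixed shape the source specified, including a shape that was deliberately "improper".  Some presentation characters, such as the lam-alef ligatures, also expand to more than one canonical character.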

 

Export

 

Because some destinations should be able to handle Unicode canonical encoding, while others may only be able to deal with presentation forms (and most others can handle neither), Nota Bene gives you the choice between these two formats.  When you choose Save As, and then RTF (Rich Text Format) -- Custom, you will be given the choice between:

 

Unicode Canonical Encoding--all characters will be written out to the primary Arabic block, always in their "isolated" form (it is up to the receiving software to convert these characters to their appropriate rendered forms)
Alternative Presentation Forms--all characters will be written out in their already rendered forms
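
To see what the two choices mean inside an RTF file, it helps to know that RTF represents a Unicode character as the control word \u followed by a signed 16-bit decimal number and a fallback character for readers that do not understand Unicode.  The Python sketch below is an illustration of that general RTF convention, not the code of Nota Bene's export filter; the function name rtf_unicode_escape is hypothetical, and "?" is assumed as the one-character fallback.

```python
def rtf_unicode_escape(text: str) -> str:
    """Encode text as RTF, writing non-ASCII characters as \\uN escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # These three characters have special meaning in RTF.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF works in 16-bit units, so go through UTF-16.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    # Values above 32767 are written as negative numbers.
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


print(rtf_unicode_escape("\u0628"))  # canonical beh, U+0628
print(rtf_unicode_escape("\ufe91"))  # initial-form beh, U+FE91
```

A canonical character such as U+0628 comes out as a small positive number (\u1576?), while a presentation form such as U+FE91 lies above 32767 and therefore appears as a negative number (\u-367?).  This can be a quick way to tell which kind of encoding an RTF file contains.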

 

Only fully Arabic-enabled applications have any chance of dealing with files that use the canonical encoding (and even they may not work properly--for example, even well-known programs [other than Nota Bene!] often have trouble placing punctuation correctly).  While other applications may be able to display the rendered forms, whether they can display them in the correct sequence, and treat surrounding punctuation correctly, depends solely on that application's Arabic capabilities.  That said, if you encounter problems exporting text to an application that you know to be fully Arabic-enabled, please send us the relevant files along with a full explanation of the issues you are encountering.  (But note again that while we welcome reports of such problems, we cannot make any promises as to whether, or how promptly, they will be addressed.)

 

In summary, it is our hope that the import/export capabilities we have already provided will work for Unicode-encoded RTF text.  If they don't, we'd like to hear from you.  But it's important to understand that we cannot make any promises about whether, or how quickly, we'll be able to pursue whatever issues you might encounter.

 

 

See also:

Arabic