
Bi-directional text

This article does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (September 2008)

Bi-directional text is text containing runs in both text directions, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, in which the text direction changes on each row.

Some writing systems of the world, notably the Arabic and Hebrew scripts, and derived systems such as the Urdu, Yiddish, and Ladino scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction used by most languages in the world. When LTR text is mixed with RTL text in the same paragraph, each type of text is written in its own direction; this is known as bi-directional text. It can become rather complex when multiple levels of quotation are used.

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) is spelled sin (ש) resh (ר) heh (ה) from right to left. Some web browsers may display the Hebrew text in this article in the opposite direction.


Unicode support

Bidirectional script support is the capability of a computer system to correctly display bi-directional text. The term is often shortened to the jargon term BiDi or bidi.

Early computer installations were designed to support only a single writing system, typically left-to-right scripts based on the Latin alphabet. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings like ISO/IEC 8859-6 (Arabic) and ISO/IEC 8859-8 (Hebrew), storing the letters (usually) in writing and reading order. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction.
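To make the trade-off concrete, here is a minimal Python sketch (the variable names are illustrative) of the naive approach: the whole logically ordered string is reversed to obtain a right-to-left display order. The Hebrew letters end up in the left-to-right order a simple renderer would need, but the embedded English text comes out backwards.

```python
hebrew_sarah = "\u05e9\u05e8\u05d4"  # shin, resh, he: the name Sarah in logical order
logical = "name: " + hebrew_sarah    # mixed LTR and RTL text, stored in logical order

# Naive "flip": reverse every character to get a right-to-left display order.
flipped = logical[::-1]

# The Hebrew now sits in left-to-right screen order (he, resh, shin), which is
# what a left-to-right renderer needs, but the English label is mangled.
hebrew_part, latin_part = flipped.split(" ")
print(latin_part)  # prints ':eman'
```

A bidi-aware system instead reverses only the right-to-left runs, leaving the English run intact.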

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

In Unicode encoding, all non-punctuation characters are stored in writing order, and the writing direction of each character is stored as a property of the character itself. A character with an inherent writing direction is called "strong". Punctuation characters, however, can appear in both LTR and RTL scripts. They are called "weak" characters because they carry no directional information of their own, so it is up to the software to decide in which direction these weak characters are placed. In mixed-direction text this sometimes leads to display errors, caused by the BiDi algorithm, which runs through the text, identifies LTR and RTL strong characters, and assigns a direction to weak characters according to its rules.
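A minimal Python sketch of how software can read the stored direction of each character, using the standard library's `unicodedata.bidirectional()`, which returns the character's Unicode Bidi_Class. The `describe` helper and the `STRONG` set are illustrative assumptions; Unicode itself splits non-strong characters into weak and neutral types (see the table of types in this section), which this sketch lumps together as "not strong".

```python
import unicodedata

# Bidi_Class values treated as "strong" in the Unicode 6.0 table of types.
STRONG = {"L", "R", "AL", "LRE", "LRO", "RLE", "RLO"}

def describe(text):
    """Return (character, Bidi_Class, strength label) for every character, in storage order."""
    rows = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        rows.append((ch, cls, "strong" if cls in STRONG else "not strong"))
    return rows

# An English greeting, the Hebrew name Sarah, and punctuation, in logical order.
sample = "Hi, \u05e9\u05e8\u05d4!"
for ch, cls, label in describe(sample):
    print(f"U+{ord(ch):04X}  {cls:<3} {label}")
```

The Latin and Hebrew letters report the strong classes L and R, while the comma, space and exclamation mark report CS, WS and ON.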

In the algorithm, each sequence of concatenated strong characters is called a "run". A weak character located between two strong characters with the same orientation inherits their orientation. A weak character located between two strong characters with different writing directions inherits the main context's writing direction (in an LTR document the character becomes LTR; in an RTL document it becomes RTL). If a weak character is followed by another weak character, the algorithm looks past it to the nearest neighbouring strong character. Sometimes this leads to unintended display errors. These errors are corrected or prevented with "pseudo-strong" characters called marks. A mark (U+200E LEFT-TO-RIGHT MARK, HTML &#8206; or &lrm;, abbreviated LRM; or U+200F RIGHT-TO-LEFT MARK, HTML &#8207; or &rlm;, abbreviated RLM) is inserted next to a weak character to make it inherit the mark's writing direction.
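The run rules described above can be sketched in Python as follows. This is a deliberately simplified, hypothetical model loosely based on the neutral-resolution rules (N1 and N2) of Unicode Standard Annex #9, not the full algorithm: it ignores embedding levels, numbers and the other weak-type rules, represents each character only as "L", "R" or None (weak), treats marks as ordinary strong characters, and assumes the start and end of the text count as the context direction.

```python
from typing import List, Optional

def resolve_weak(types: List[Optional[str]], context: str) -> List[str]:
    """Give every weak slot (None) a direction, 'L' or 'R'.

    types   -- 'L' or 'R' for strong characters (marks count as strong), None for weak ones
    context -- the main (paragraph) direction, 'L' or 'R'
    """
    resolved = [t if t is not None else context for t in types]
    i = 0
    while i < len(types):
        if types[i] is not None:
            i += 1
            continue
        # Find the end of this run of consecutive weak characters.
        j = i
        while j < len(types) and types[j] is None:
            j += 1
        # Nearest strong neighbours; the text boundaries count as the context direction.
        before = types[i - 1] if i > 0 else context
        after = types[j] if j < len(types) else context
        # Same direction on both sides: inherit it. Otherwise: use the context.
        direction = before if before == after else context
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return resolved

print(resolve_weak(["L", None, "L"], "R"))        # ['L', 'L', 'L']
print(resolve_weak(["L", None, None, "R"], "L"))  # ['L', 'L', 'L', 'R']
print(resolve_weak(["L", None, None, "R"], "R"))  # ['L', 'R', 'R', 'R']
```

The last two calls show the same weak run resolving differently depending only on the main context, which is where unintended display errors come from.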

For example, to correctly display the U+2122 TRADE MARK SIGN (™) after an English brand name (LTR) in an Arabic (RTL) passage, an LRM is inserted after the trade mark sign if the sign is not followed by LTR text. If the LRM is not added, the weak character is neighboured by a strong LTR character on one side and a strong RTL character on the other. Hence, in an RTL context, it is treated as RTL and displayed in the wrong order.
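A minimal Python sketch of the trade mark fix, assuming plain-text input: for each U+2122 it finds the next strong character and inserts U+200E when that character is not LTR (or when there is none). The function names are hypothetical. Because the LRM itself has Bidi_Class L, running the function a second time does not add another mark.

```python
import unicodedata

TRADE_MARK = "\u2122"  # U+2122 TRADE MARK SIGN
LRM = "\u200e"         # U+200E LEFT-TO-RIGHT MARK

def next_strong_class(text: str, start: int):
    """Return the Bidi_Class of the first L, R or AL character at or after start, or None."""
    for ch in text[start:]:
        cls = unicodedata.bidirectional(ch)
        if cls in ("L", "R", "AL"):
            return cls
    return None

def protect_trademarks(text: str) -> str:
    """Insert an LRM after each trade mark sign that is not followed by LTR text."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == TRADE_MARK and next_strong_class(text, i + 1) != "L":
            out.append(LRM)
    return "".join(out)

arabic = "\u0645\u0646\u062a\u062c"  # a short run of Arabic letters (Bidi_Class AL)
passage = f"{arabic} Acme{TRADE_MARK} {arabic}"
print(ascii(protect_trademarks(passage)))  # ascii() shows the invisible LRM as \u200e
```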

Possible BiDi-types of a character, to be used by the BiDi algorithm, are:

Bidirectional character types (Unicode character property Bidi_Class)[1]

| Type[2] | Description | Strong/Weak/Neutral effect | Directionality | General scope | Bidi_Control character[3] |
| L | Left-to-Right | Strong | L-to-R | Most alphabetic and syllabic characters, Han ideographs, non-European or non-Arabic digits, LRM character, ... | U+200E left-to-right mark (LRM) |
| LRE | Left-to-Right Embedding | Strong | L-to-R | LRE character only | U+202A left-to-right embedding (LRE) |
| LRO | Left-to-Right Override | Strong | L-to-R | LRO character only | U+202D left-to-right override (LRO) |
| R | Right-to-Left | Strong | R-to-L | Hebrew alphabet and related punctuation, RLM character | U+200F right-to-left mark (RLM) |
| AL | Right-to-Left Arabic | Strong | R-to-L | Arabic, Thaana and Syriac alphabets, and most punctuation specific to those scripts | |
| RLE | Right-to-Left Embedding | Strong | R-to-L | RLE character only | U+202B right-to-left embedding (RLE) |
| RLO | Right-to-Left Override | Strong | R-to-L | RLO character only | U+202E right-to-left override (RLO) |
| PDF | Pop Directional Format | Weak | | PDF character only | U+202C pop directional formatting (PDF) |
| EN | European Number | Weak | | European digits, Eastern Arabic-Indic digits, ... | |
| ES | European Separator | Weak | | plus sign, minus sign, ... | |
| ET | European Number Terminator | Weak | | degree sign, currency symbols, ... | |
| AN | Arabic Number | Weak | | Arabic-Indic digits, Arabic decimal and thousands separators, ... | |
| CS | Common Number Separator | Weak | | colon, comma, full stop, no-break space, ... | |
| NSM | Nonspacing Mark | Weak | | Characters in General Categories Mark, nonspacing and Mark, enclosing (Mn, Me) | |
| BN | Boundary Neutral | Weak | | Default ignorables, non-characters, control characters other than those explicitly given other types | |
| B | Paragraph Separator | Neutral | | paragraph separator, appropriate Newline Functions, higher-level protocol paragraph determination | |
| S | Segment Separator | Neutral | | Tab | |
| WS | Whitespace | Neutral | | space, figure space, line separator, form feed, General Punctuation block spaces (this set is smaller than the Unicode whitespace list) | |
| ON | Other Neutrals | Neutral | | All other characters, including object replacement character | |
Notes
1. As of Unicode version 6.0.0.
2. Character property: Bidi_Class, or 'type'.
3. Bidi_Control characters: Seven Bidi_Control formatting characters are defined. They are invisible and have no effect apart from directionality. Five of them have a unique, overruling BiDi type that is used by the algorithm; their type is also their acronym (e.g. character 'LRE' has BiDi type 'LRE'). The other two, LRM and RLM, have types L and R respectively.
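The table can be encoded directly as a lookup from Bidi_Class to strength. This Python sketch (the `STRENGTH` and `BIDI_CONTROLS` names are illustrative) checks the seven Bidi_Control characters of note 3 against the classes reported by the standard `unicodedata` module. Python ships a newer Unicode version than 6.0.0, which adds further classes such as the isolate types, so the dictionary covers only the 19 types listed in the table.

```python
import unicodedata

# Strength of each Bidi_Class, following the table of types (Unicode 6.0.0).
STRENGTH = {
    "L": "strong", "LRE": "strong", "LRO": "strong",
    "R": "strong", "AL": "strong", "RLE": "strong", "RLO": "strong",
    "PDF": "weak", "EN": "weak", "ES": "weak", "ET": "weak", "AN": "weak",
    "CS": "weak", "NSM": "weak", "BN": "weak",
    "B": "neutral", "S": "neutral", "WS": "neutral", "ON": "neutral",
}

# The seven Bidi_Control characters described in note 3.
BIDI_CONTROLS = {
    "LRM": "\u200e", "RLM": "\u200f",
    "LRE": "\u202a", "RLE": "\u202b", "PDF": "\u202c",
    "LRO": "\u202d", "RLO": "\u202e",
}

for name, ch in BIDI_CONTROLS.items():
    cls = unicodedata.bidirectional(ch)
    print(f"{name} U+{ord(ch):04X} type={cls} ({STRENGTH[cls]})")
```

LRM and RLM report the ordinary strong types L and R, while the other five report a type equal to their own acronym, as note 3 states.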

Scripts using bi-directional text

Very few scripts can be written in either direction.

Egyptian hieroglyphs could also be written in either direction; signs with a distinct "head" faced the beginning of a line, with their "tail" facing the end.

Chinese characters can also be written in either direction as well as vertically (top to bottom, then right to left), especially in signs (such as plaques), but the orientation of the individual characters never changes. This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear, that is, from right to left on the right side of the bus and from left to right on the left side.

  • The right side (text runs from right to left)

  • The left side (text runs from left to right)

  • On the right side of this Hainan Airlines aircraft, the text runs from right to left ( 空 航 南 海 ).

  • The left side, however, shows the text running from left to right ( 海 南 航 空 ).

  • A photo that shows text on both sides of a China Post vehicle (thanks to the open door)

Another writing style, called boustrophedon, was used in some ancient Greek inscriptions and in Hungarian runes. This method of writing alternates direction on each successive line, and usually also reverses the individual characters.
