ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. The ISO working group maintaining this series of standards has been disbanded.
ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.
Contents
- 1 Introduction
- 2 Characters
- 3 The Parts of ISO/IEC 8859
- 4 Relationship to Unicode and the UCS
- 5 Development status
- 6 References
Introduction
While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use the Latin alphabet need additional symbols not covered by ASCII, such as ß (German), ñ (Spanish), å (Swedish and other Nordic languages) and ő (Hungarian). ISO/IEC 8859 sought to remedy this problem by using the eighth bit in an 8-bit byte to allow positions for another 96 printable characters. Early encodings were limited to 7 bits because of restrictions in some data transmission protocols, and partly for historical reasons. However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least ten suitable for various Latin alphabets.
The ISO/IEC 8859-n encodings contain only printable characters, and were designed to be used in conjunction with control characters mapped to the unassigned bytes. To this end, a series of encodings registered with the IANA add the C0 control set (control characters mapped to bytes 0 to 31) from ISO 646 and the C1 control set (control characters mapped to bytes 128 to 159) from ISO 6429, resulting in full 8-bit character maps with most, if not all, bytes assigned. These sets have ISO-8859-n as their preferred MIME name or, in cases where a preferred MIME name isn't specified, their canonical name. Many people use the terms ISO/IEC 8859-n and ISO-8859-n interchangeably. ISO/IEC 8859-11 did not get such a charset assigned, presumably because it was almost identical to TIS 620.
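The byte layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_byte` and the region labels are hypothetical, and the split simply follows the ranges given in the text (controls at 0 to 31 and 128 to 159, 95 printable ASCII characters, 96 further printable positions).

```python
def classify_byte(b: int) -> str:
    """Name the region a byte value occupies in an ISO-8859-n charset.

    Hypothetical helper; the labels are illustrative, not taken from the standard.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x1F:
        return "C0 control"              # bytes 0..31
    if b <= 0x7E:
        return "ASCII printable"         # bytes 32..126, 95 characters
    if b == 0x7F:
        return "DEL"                     # byte 127
    if b <= 0x9F:
        return "C1 control"              # bytes 128..159
    return "ISO/IEC 8859 printable"      # bytes 160..255, 96 positions


# Count how many byte values fall into each region.
counts = {}
for value in range(256):
    region = classify_byte(value)
    counts[region] = counts.get(region, 0) + 1

for region, n in counts.items():
    print(f"{region}: {n}")
```

The counts make the arithmetic in the text concrete: the eighth bit doubles the code space, but 32 of the new positions are reserved for the C1 controls, which leaves 96 for printable characters.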
Characters
The ISO/IEC 8859 standard is designed for reliable information exchange, not typography; the standard omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO/IEC 8859 standards, or use Unicode instead.
As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it didn't get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French didn't get its œ and Œ ligatures because they could be typed as 'oe'. Ÿ, needed for all-caps text, was left out as well. These characters were, however, included later with ISO/IEC 8859-15, which also introduced the new euro sign character €. Likewise Dutch did not get the 'ij' and 'IJ' letters, because Dutch speakers had become used to typing these as two letters instead. Romanian did not initially get its ‹Ș›/‹ș› and ‹Ț›/‹ț› (with comma) letters, because these letters were initially unified with ‹Ş›/‹ş› and ‹Ţ›/‹ţ› (with cedilla) by the Unicode Consortium, which considered the shapes with comma beneath to be glyph variants of the shapes with cedilla. However, the letters with explicit comma below were later added to the Unicode standard and are also in ISO/IEC 8859-16.
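A quick way to see which characters made it into which part is to try encoding them. The sketch below assumes Python's bundled codecs `iso8859_1`, `iso8859_2`, `iso8859_15` and `iso8859_16`, which implement the corresponding IANA charsets; the helper name `encodable` is hypothetical.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` has a position in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Characters the text says were left out of Latin-1 and added in ISO/IEC 8859-15.
for ch in "€ŒœŸ":
    print(ch, "Latin-1:", encodable(ch, "iso8859_1"),
          "Latin-9:", encodable(ch, "iso8859_15"))

# Guillemets were included in Latin-1; curly double quotes were not.
print("guillemets in Latin-1:", encodable("«»", "iso8859_1"))
print("curly quotes in Latin-1:", encodable("“”", "iso8859_1"))

# Romanian comma-below letters: absent from Latin-2, present in ISO/IEC 8859-16.
print("comma-below in Latin-2:", encodable("ȘșȚț", "iso8859_2"))
print("comma-below in Latin-10:", encodable("ȘșȚț", "iso8859_16"))
```

Checking by attempted encoding is cruder than reading the code charts, but it is a convenient test when deciding whether text can be stored in a given part without loss.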
Most of the ISO/IEC 8859 encodings provide diacritic marks required for various European languages using the Latin script. Others provide non-Latin alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. Most of the encodings contain only spacing characters, although the Thai, Hebrew, and Arabic ones also contain combining characters. However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions (without using combining diacritics) either. Each Japanese syllabic alphabet (hiragana or katakana, see Kana) would fit, but like several other alphabets of the world they aren't encoded in the ISO/IEC 8859 system.
The Parts of ISO/IEC 8859
ISO/IEC 8859 is divided into the following parts:
| Part 1 | Latin-1 Western European | Perhaps the most widely used part of ISO/IEC 8859, covering most Western European languages: Danish (partial),[1] Dutch (partial),[2] English, Faroese, Finnish (partial),[3] French (partial),[3] German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Catalan, and Swedish. Languages from other parts of the world are also covered, including Eastern European Albanian, Southeast Asian Indonesian, and the African languages Afrikaans and Swahili. The missing euro sign and capital Ÿ are in the revised version ISO/IEC 8859-15 (see below). The corresponding IANA character set ISO-8859-1 is the default encoding for documents received via HTTP when the document's media type is "text" (as in "text/html").[4] |
| Part 2 | Latin-2 Central European | Supports those Central and Eastern European languages that use the Latin alphabet, including Bosnian, Polish, Croatian, Czech, Slovak, Slovene, Serbian, and Hungarian. The missing euro sign can be found in ISO/IEC 8859-16. |
| Part 3 | Latin-3 South European | Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish and Unicode for Esperanto. |
| Part 4 | Latin-4 North European | Estonian, Latvian, Lithuanian, Greenlandic, and Sami. |
| Part 5 | Latin/Cyrillic | Covers mostly Slavic languages that use a Cyrillic alphabet, including Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian (partial).[5] |
| Part 6 | Latin/Arabic | Covers the most common Arabic language characters. Doesn't support other languages using the Arabic script. Needs to be BiDi and cursive joining processed for display. |
| Part 7 | Latin/Greek | Covers the modern Greek language (monotonic orthography). Can also be used for Ancient Greek written without accents or in monotonic orthography, but lacks the diacritics for polytonic orthography. These were introduced with Unicode. |
| Part 8 | Latin/Hebrew | Covers the modern Hebrew alphabet as used in Israel. In practice two different encodings exist: logical order (needs to be BiDi processed for display) and visual (left-to-right) order (in effect, after BiDi processing and line breaking). |
| Part 9 | Latin-5 Turkish | Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones. |
| Part 10 | Latin-6 Nordic | A rearrangement of Latin-4. Considered more useful for Nordic languages. Baltic languages use Latin-4 more. |
| Part 11 | Latin/Thai | Contains characters needed for the Thai language. Virtually identical to TIS 620. |
| Part 12 (non-existent) | Latin/Devanagari | The work on a part of 8859 for Devanagari was officially abandoned in 1997. ISCII and Unicode/ISO/IEC 10646 cover Devanagari. |
| Part 13 | Latin-7 Baltic Rim | Added some characters for Baltic languages which were missing from Latin-4 and Latin-6. |
| Part 14 | Latin-8 Celtic | Covers Celtic languages such as Gaelic and the Breton language. |
| Part 15 | Latin-9 | A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French, Finnish and Estonian. |
| Part 16 | Latin-10 South-Eastern European | Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovene, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than symbols. The generic currency sign is replaced with the euro sign. |
- [1] Missing several accented vowels, including Ǿ and ǿ. These can be replaced with non-accented vowels at the cost of increased ambiguity.
- [2] Only the IJ/ij letter is missing, which is usually represented as the two letters IJ.
- [3] The missing characters are in ISO/IEC 8859-15.
- [4] RFC 2616, section 3.7.1, Canonicalization and Text Defaults.
- [5] ISO/IEC 8859-5 lacks the letter Ґ/ґ, which was reintroduced into the Ukrainian alphabet in 1990.
Each part of ISO 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, there are some characters and language combinations that are not accommodated without transcriptions. Efforts were made to make conversions as smooth as possible. For example, German has all of its seven special characters at the same positions in all Latin variants (1–4, 9, 10, 13–16), and in many positions the characters only differ in the diacritics between the sets. In particular, variants 1–4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
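The claim about German can be checked mechanically. The following sketch assumes Python's bundled `iso8859_n` codecs as stand-ins for the parts; the names `LATIN_PARTS`, `GERMAN_SPECIALS` and `positions` are hypothetical.

```python
# The Latin variants listed in the text: parts 1-4, 9, 10 and 13-16.
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]

# The seven German special characters.
GERMAN_SPECIALS = "ÄÖÜäöüß"


def positions(ch: str) -> dict:
    """Map each Latin part number to the byte value that encodes `ch` there."""
    return {n: ch.encode(f"iso8859_{n}")[0] for n in LATIN_PARTS}


for ch in GERMAN_SPECIALS:
    byte_values = set(positions(ch).values())
    # One distinct value means the character sits at the same position everywhere.
    print(ch, sorted(hex(b) for b in byte_values))
```

Each character should report a single byte value, so German text survives being mislabelled as any other Latin part. Decoding a byte such as `0xE8` under parts 1 and 2 in the same way shows the other property mentioned above, where the base letter stays and only the diacritic changes.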
Table
Position 0xA0 always holds the non-breaking space, and 0xAD is in most parts the soft hyphen, which is displayed only at line breaks. Other empty fields are either unassigned or cannot be displayed by the system in use.
The ISO/IEC 8859-7:2003 and ISO/IEC 8859-8:1999 editions contain new additions. LRM stands for left-to-right mark (U+200E) and RLM stands for right-to-left mark (U+200F).
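The fixed positions mentioned above can be inspected with a short script. This is a sketch that assumes Python's bundled `iso8859_n` codecs reflect the parts closely enough for illustration; `PARTS` and `decode_at` are hypothetical names.

```python
import unicodedata

# All published parts; part 12 was abandoned and has no codec.
PARTS = [n for n in range(1, 17) if n != 12]


def decode_at(byte_value: int, part: int):
    """Return the character at `byte_value` in the given part, or None if unassigned."""
    try:
        return bytes([byte_value]).decode(f"iso8859_{part}")
    except UnicodeDecodeError:
        return None


for part in PARTS:
    names = []
    for position in (0xA0, 0xAD):
        ch = decode_at(position, part)
        names.append("unassigned" if ch is None else unicodedata.name(ch))
    print(f"8859-{part}: 0xA0 = {names[0]}, 0xAD = {names[1]}")
```

Positions that a part leaves unassigned surface as a decoding error in these codecs, which is why `decode_at` returns `None` instead of raising.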
Relationship to Unicode and the UCS
Since 1991, the Unicode Consortium has been working with ISO and IEC to develop the Unicode Standard and ISO/IEC 10646, the Universal Character Set (UCS), in tandem. Newer editions of ISO/IEC 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation, effectively making each part of ISO/IEC 8859 a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO/IEC 8859-1.
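The identity between ISO/IEC 8859-1 and the first 256 Unicode code points is easy to confirm. The sketch below assumes Python's `iso8859_1` codec, which implements the IANA charset ISO-8859-1 (that is, the standard's printable characters plus the C0 and C1 control ranges); the function name `latin1_matches_unicode` is hypothetical.

```python
def latin1_matches_unicode() -> bool:
    """Check that byte value n decodes to code point U+00nn for every n in 0..255."""
    decoded = bytes(range(256)).decode("iso8859_1")
    return [ord(ch) for ch in decoded] == list(range(256))


print(latin1_matches_unicode())

# Consequence: converting Latin-1 bytes to code points needs no lookup table.
sample = "Ærøskøbing".encode("iso8859_1")
print([f"U+{b:04X}" for b in sample])
```

No other part has this property; the remaining parts need a per-position mapping table to reach Unicode.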
Single-byte character sets including the parts of ISO/IEC 8859 and derivatives of them were favoured throughout the 1990s, having the advantages of being well-established and more easily implemented in software: the equation of one byte to one character is simple and adequate for most single-language applications, and there are no combining characters or variant forms. As Unicode-enabled operating systems became more widespread, ISO/IEC 8859 and other legacy encodings became less popular. While remnants of ISO 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware, and end-user application software, most modern computing applications use Unicode internally, and rely on conversion tables to map to and from other encodings, when necessary.
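The "Unicode internally, conversion tables at the edges" model can be sketched as a two-step transcode. The function name `transcode` is hypothetical, and Python's built-in codecs stand in for the conversion tables.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between encodings by going through Unicode text."""
    text = data.decode(source)      # legacy bytes -> Unicode string
    return text.encode(target)      # Unicode string -> target bytes


# A Polish place name stored as ISO/IEC 8859-2: one byte per character.
latin2_bytes = "Łódź".encode("iso8859_2")
utf8_bytes = transcode(latin2_bytes, "iso8859_2", "utf-8")

print(len(latin2_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8"))
```

The byte counts illustrate the trade-off described above: the single-byte form keeps one byte per character, while the UTF-8 form spends extra bytes on the non-ASCII letters in exchange for covering every script in one encoding.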
Development status
The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining working group, WG 2, is concentrating on development of ISO/IEC 10646.
References
- Published versions of each part of ISO/IEC 8859 are available, for a fee, from the ISO catalogue site and from the IEC Webstore.
- PDF versions of the final drafts of some parts of ISO/IEC 8859 as submitted for review & publication by ISO/IEC JTC 1/SC 2/WG 3 are available at the WG 3 web site:
- ISO/IEC 8859-1:1998 - 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)
- ISO/IEC 8859-4:1998 - 8-bit single-byte coded graphic character sets, Part 4: Latin alphabet No. 4 (draft dated February 12, 1998, published July 1, 1998)
- ISO/IEC 8859-7 - 8-bit single-byte coded graphic character sets, Part 7: Latin/Greek alphabet (draft dated June 10, 1999; superseded by ISO/IEC 8859-7:2003, published October 10, 2003)
- ISO/IEC 8859-10:1998 - 8-bit single-byte coded graphic character sets, Part 10: Latin alphabet No. 6 (draft dated February 12, 1998, published July 15, 1998)
- ISO/IEC 8859-11 - 8-bit single-byte coded graphic character sets, Part 11: Latin/Thai character set (draft dated June 22, 1999; superseded by ISO/IEC 8859-11:2001, published December 15, 2001)
- ISO/IEC 8859-13:1998 - 8-bit single-byte coded graphic character sets, Part 13: Latin alphabet No. 7 (draft dated April 15, 1998, published October 15, 1998)
- ISO/IEC 8859-15:1998 - 8-bit single-byte coded graphic character sets, Part 15: Latin alphabet No. 9 (draft dated August 1, 1997; superseded by ISO/IEC 8859-15:1999, published March 15, 1999)
- ISO/IEC 8859-16 - 8-bit single-byte coded graphic character sets, Part 16: Latin alphabet No. 10 (draft dated November 15, 1999; superseded by ISO/IEC 8859-16:2001, published July 15, 2001)
ECMA standards, which in intent correspond exactly to the ISO/IEC 8859 character set standards, can be found at:
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
- Standard ECMA-113: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Cyrillic Alphabet 3rd edition (December 1999)
- Standard ECMA-114: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Arabic Alphabet 2nd edition (December 2000)
- Standard ECMA-118: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Greek Alphabet (December 1986)
- Standard ECMA-121: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Hebrew Alphabet 2nd edition (December 2000)
- Standard ECMA-128: 8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 5 2nd edition (December 1999)
- Standard ECMA-144: 8-Bit Single-Byte Coded Character Sets - Latin Alphabet No. 6 3rd edition (December 2000)
- ISO/IEC 8859-1 to Unicode mapping tables as plain text files are at the Unicode FTP site.
- Informal descriptions and code charts for most ISO/IEC 8859 standards are available in ISO/IEC 8859 Alphabet Soup.