Script (Unicode)


In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts. Turkish, for example, was written in the Arabic script before the 20th century but transitioned to Latin in the early part of the 20th century. For a list of languages supported by each script, see the list of languages by writing system.

Scripts are complemented by symbols: together, scripts and symbols cover all Unicode characters. The unified diacritical characters and unified punctuation characters frequently have the "common" or "inherited" script property. However, individual scripts often have their own punctuation and diacritics, so many scripts include not only letters but also diacritics and other marks, punctuation, numerals, and even their own idiosyncratic symbols and space characters.

Unicode 6.1 includes 28 ancient and historic scripts and 72 modern scripts. More scripts are in the process of being encoded; this plan is called the roadmap.

Definition and classification

When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, Swedish includes the character ‘å’ (sometimes called a "Swedish O"), while English has no such character. Nor does English use the combining ring above (U+030A) on any character. In general, languages sharing the same script share many of the same characters. Despite these peripheral differences between the Swedish and English writing systems, both are said to use the same Latin script. The Unicode abstraction of scripts is therefore a basic organizing technique. The differences between alphabets or writing systems remain and are supported through Unicode’s flexible scripts, combining marks, and collation algorithms.
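The relationship between ‘å’ and its diacritic can be inspected with Python's standard `unicodedata` module. This is only a minimal sketch: the helper name `decompose` is made up for illustration, and it relies on canonical decomposition (NFD), which splits a precomposed letter into its base letter and combining mark.

```python
import unicodedata

def decompose(text):
    """Return the Unicode names of the code points in the NFD form of text.

    `decompose` is a hypothetical helper name, not part of any library.
    """
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

# U+00E5 is the precomposed Swedish letter; NFD splits it into a Latin base
# letter followed by the combining mark that English never uses.
for name in decompose("\u00e5"):
    print(name)
```

The base letter that comes out is the same Latin "a" that English uses, which is one way to see why both languages are treated as using one script.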

Common and inherited scripts

Unicode can assign a character in the UCS to a single script only. However, many characters (those that are not part of a formal natural-language writing system, or that are unified across many writing systems) may be used in more than one script; examples include currency signs, symbols, numerals, and punctuation marks. In these cases Unicode defines them as belonging to the common script (ISO 15924 code Zyyy). In total, Unicode has 6,379 characters defined as "Common" script.

In addition, many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the inherited script (ISO 15924 code Zinh), which means that they take the script of the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, U+0308  ̈  combining diaeresis may combine with either U+0065 e latin small letter e to create a Latin "ë", or with U+0435 е cyrillic small letter ie for the Cyrillic "ё". In the former case it inherits the Latin script of the base character, whereas in the latter case it inherits the Cyrillic script of the base character. In total, 523 characters in Unicode belong to the inherited script.
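The inheritance rule can be illustrated with a rough Python sketch. Python's standard library does not expose the Unicode Script property, so the hypothetical helper `guess_script` below approximates it from the first word of the character name; that heuristic is an assumption made for illustration only and is not how Unicode defines scripts. Likewise, `resolve_scripts` treats any character with a non-zero combining class as "inherited", which is only an approximation of the real Inherited set.

```python
import unicodedata

def guess_script(ch):
    # Heuristic stand-in for the Script property (an assumption, not the
    # real property): use the first word of the character's Unicode name.
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def resolve_scripts(text):
    """Give each combining mark the script of the preceding character."""
    resolved = []
    for ch in text:
        if unicodedata.combining(ch) and resolved:
            # Combining mark: inherit from the character it follows.
            resolved.append(resolved[-1])
        else:
            # Base character (or a stray leading mark): guess from the name.
            resolved.append(guess_script(ch))
    return resolved

print(resolve_scripts("e\u0308"))       # diaeresis after Latin e
print(resolve_scripts("\u0435\u0308"))  # diaeresis after Cyrillic ie
```

The same U+0308 code point is reported under a different script in each call, depending only on the base character before it.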

Ancient and historic scripts

Unicode includes 28 ancient scripts (out of use for a thousand years or more) and historic scripts (out of use for several hundred years).

Script versus writing system

Main article: Writing system
See also: phonemic and phonetic orthography

"Writing system" is sometimes treated as a synonym for script. However, it can also refer to the specific concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the Han, Hiragana, and Katakana scripts.

Most writing systems can be broadly divided into several categories: logographic, syllabic, alphabetic (or segmental), abugida, abjad, and featural. However, features of any of these may be found in any given writing system in varying proportions, often making it difficult to categorize a system purely. The term complex system is sometimes used to describe those where the admixture makes classification problematic.

Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text processing algorithms.

Character categories within scripts

Unicode provides a general category property for each character, so in addition to belonging to a script, every character also has a general category. Typically, scripts include letter characters: uppercase letters, lowercase letters, and modifier letters. A few precomposed ligatures, such as ǲ (U+01F2), are categorized as titlecase letters. Such titlecase ligatures are all in the Latin and Greek scripts and are all compatibility characters, so Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future.

Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts, all letters are categorized as "other letter" or "modifier letter". Ideographs such as Unihan ideographs are also categorized as "other letters". However, a few scripts do differentiate between uppercase and lowercase: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even for these scripts, there are some letters that are neither uppercase nor lowercase.

Scripts can also contain characters of any other general category, such as marks (diacritic and otherwise), numbers (numerals), punctuation, separators (word separators such as spaces), symbols, and non-graphical format characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of the characters in any script (other than the common and inherited scripts) are letters.
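The general category of individual characters can be looked up with Python's standard `unicodedata.category` function, which returns the two-letter category abbreviation. The sketch below is illustrative only: the sample characters, the `MAJOR_CLASSES` mapping, and the `category_counts` helper are choices made for this example rather than anything prescribed by Unicode.

```python
import unicodedata
from collections import Counter

# First letter of the two-letter general category -> major class name.
MAJOR_CLASSES = {
    "L": "letter", "M": "mark", "N": "number", "P": "punctuation",
    "S": "symbol", "Z": "separator", "C": "other",
}

def category_counts(text):
    """Count the characters of text by major general-category class."""
    return Counter(MAJOR_CLASSES[unicodedata.category(ch)[0]] for ch in text)

# Uppercase, lowercase, titlecase, modifier and "other" letters, then a
# combining mark, a digit, punctuation, a currency symbol and a space.
for ch in ["A", "a", "\u01f2", "\u02b0", "\u4e2d", "\u0308", "7", ",", "$", " "]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))

print(category_counts("ab, 1"))
```

Here `Lt` marks the titlecase ligature U+01F2 and `Lo` marks the Han ideograph, matching the "titlecase letter" and "other letter" categories described above.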

Table of scripts in Unicode

Unicode defines 100 script names (called "Alias" or "Property value alias"), based on the ISO 15924 list, that are used in Unicode 6.1. These 100 include 28 ancient or historic scripts; the generic Zyyy Common (Code for undetermined script) script name for characters that are used in multiple scripts; the Zinh Inherited script name for characters, such as diacritics, that take the script of their base character; and the general Zzzz Unknown (Code for uncoded script). Among the script codes not used are Zsym (Symbols) and Zmth (Mathematical notation). These are not considered scripts in the Unicode sense.

ISO 15924 script codes[a][b] and Unicode[c][d]
ISO 15924: Code, Nr, Name. Script in Unicode[e]: Alias, Direction, Version, Characters.
Code | Nr | Name | Alias[f] | Direction | Version | Characters | Remark
Afak | 439 | Afaka | | | | | Not in Unicode
Arab | 160 | Arabic | Arabic | R-to-L | 1.0 | 1,234 |
Armi | 124 | Imperial Aramaic | Imperial Aramaic | R-to-L | 5.2 | 31 | Ancient/historic
Armn | 230 | Armenian | Armenian | L-to-R | 1.0 | 91 |
Avst | 134 | Avestan | Avestan | R-to-L | 5.2 | 61 | Ancient/historic
Bali | 360 | Balinese | Balinese | L-to-R | 5.0 | 121 |
Bamu | 435 | Bamum | Bamum | L-to-R | 5.2 | 657 |
Bass | 259 | Bassa Vah | | | ? | (36) | Provisionally accepted for Unicode[g]
Batk | 365 | Batak | Batak | L-to-R | 6.0 | 56 |
Beng | 325 | Bengali | Bengali | L-to-R | 1.0 | 92 |
Blis | 550 | Blissymbols | | | | | Not in Unicode
Bopo | 285 | Bopomofo | Bopomofo | L-to-R | 1.0 | 70 |
Brah | 300 | Brahmi | Brahmi | L-to-R | 6.0 | 108 | Ancient/historic
Brai | 570 | Braille | Braille | L-to-R | 3.0 | 256 |
Bugi | 367 | Buginese | Buginese | L-to-R | 4.1 | 30 |
Buhd | 372 | Buhid | Buhid | L-to-R | 3.2 | 20 |
Cakm | 349 | Chakma | Chakma | L-to-R | 6.1 | 67 |
Cans | 440 | Unified Canadian Aboriginal Syllabics | Canadian Aboriginal | L-to-R | 3.0 | 710 |
Cari | 201 | Carian | Carian | L-to-R | 5.1 | 49 | Ancient/historic
Cham | 358 | Cham | Cham | L-to-R | 5.1 | 83 |
Cher | 445 | Cherokee | Cherokee | L-to-R | 3.0 | 85 |
Cirt | 291 | Cirth | | | | | Not in Unicode
Copt | 204 | Coptic | Coptic | L-to-R | 1.0 | 137 | Ancient/historic; disunified from Greek in 4.1
Cprt | 403 | Cypriot | Cypriot | R-to-L | 4.0 | 55 | Ancient/historic
Cyrl | 220 | Cyrillic | Cyrillic | L-to-R | 1.0 | 417 |
Cyrs | 221 | Cyrillic (Old Church Slavonic variant) | | | | | Not in Unicode
Deva | 315 | Devanagari (Nagari) | Devanagari | L-to-R | 1.0 | 151 |
Dsrt | 250 | Deseret (Mormon) | Deseret | L-to-R | 3.1 | 80 |
Dupl | 755 | Duployan shorthand, Duployan stenography | | | ? | (143) | Provisionally accepted for Unicode[g]
Egyd | 070 | Egyptian demotic | | | | | Not in Unicode
Egyh | 060 | Egyptian hieratic | | | | | Not in Unicode
Egyp | 050 | Egyptian hieroglyphs | Egyptian Hieroglyphs | L-to-R | 5.2 | 1,071 | Ancient/historic
Elba | 226 | Elbasan | | | ? | (40) | Provisionally accepted for Unicode[g]
Ethi | 430 | Ethiopic (Geʻez) | Ethiopic | L-to-R | 3.0 | 495 |
Geok | 241 | Khutsuri (Asomtavruli and Nuskhuri) | | | | | Not in Unicode
Geor | 240 | Georgian (Mkhedruli) | Georgian | L-to-R | 1.0 | 127 |
Glag | 225 | Glagolitic | Glagolitic | L-to-R | 4.1 | 94 | Ancient/historic
Goth | 206 | Gothic | Gothic | L-to-R | 3.1 | 27 | Ancient/historic
Gran | 343 | Grantha | | | | | Not in Unicode
Grek | 200 | Greek | Greek | L-to-R | 1.0 | 511 |
Gujr | 320 | Gujarati | Gujarati | L-to-R | 1.0 | 84 |
Guru | 310 | Gurmukhi | Gurmukhi | L-to-R | 1.0 | 79 |
Hang | 286 | Hangul (Hangŭl, Hangeul) | Hangul | L-to-R | 1.0 | 11,739 | Hangul syllables relocated in 2.0
Hani | 500 | Han (Hanzi, Kanji, Hanja) | Han | L-to-R | 1.0 | 75,963 |
Hano | 371 | Hanunoo (Hanunóo) | Hanunoo | L-to-R | 3.2 | 21 |
Hans | 501 | Han (Simplified variant) | | | | | Subset Hani
Hant | 502 | Han (Traditional variant) | | | | | Subset Hani
Hebr | 125 | Hebrew | Hebrew | R-to-L | 1.0 | 133 |
Hira | 410 | Hiragana | Hiragana | L-to-R | 1.0 | 91 |
Hluw | 080 | Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) | | | | | Not in Unicode
Hmng | 450 | Pahawh Hmong | | | | | Not in Unicode
Hrkt | 412 | Japanese syllabaries (alias for Hiragana + Katakana) | Katakana or Hiragana | | | | See Hira, Kana
Hung | 176 | Old Hungarian | | | ? | (109) | Provisionally accepted for Unicode[g]
Inds | 610 | Indus (Harappan) | | | | | Not in Unicode
Ital | 210 | Old Italic (Etruscan, Oscan, etc.) | Old Italic | L-to-R | 3.1 | 35 | Ancient/historic
Java | 361 | Javanese | Javanese | L-to-R | 5.2 | 91 |
Jpan | 413 | Japanese (alias for Han + Hiragana + Katakana) | | | | | See Hani, Hira and Kana
Jurc | 510 | Jurchen | | | | | Not in Unicode
Kali | 357 | Kayah Li | Kayah Li | L-to-R | 5.1 | 48 |
Kana | 411 | Katakana | Katakana | L-to-R | 1.0 | 300 |
Khar | 305 | Kharoshthi | Kharoshthi | R-to-L | 4.1 | 65 | Ancient/historic
Khmr | 355 | Khmer | Khmer | L-to-R | 3.0 | 146 |
Khoj | 322 | Khojki | | | | | Not in Unicode
Knda | 345 | Kannada | Kannada | L-to-R | 1.0 | 86 |
Kore | 287 | Korean (alias for Hangul + Han) | | | | | See Hani and Hang
Kpel | 436 | Kpelle | | | | | Not in Unicode
Kthi | 317 | Kaithi | Kaithi | L-to-R | 5.2 | 66 | Ancient/historic
Lana | 351 | Tai Tham (Lanna) | Tai Tham | L-to-R | 5.2 | 127 |
Laoo | 356 | Lao | Lao | L-to-R | 1.0 | 67 |
Latf | 217 | Latin (Fraktur variant) | | L-to-R | | | Typographic variant of Latin
Latg | 216 | Latin (Gaelic variant) | | L-to-R | | | Typographic variant of Latin
Latn | 215 | Latin | Latin | L-to-R | 1.0 | 1,272 |
Lepc | 335 | Lepcha (Róng) | Lepcha | L-to-R | 5.1 | 74 |
Limb | 336 | Limbu | Limbu | L-to-R | 4.0 | 66 |
Lina | 400 | Linear A | | | ? | (341) | Provisionally accepted for Unicode[g]
Linb | 401 | Linear B | Linear B | L-to-R | 4.0 | 211 | Ancient/historic
Lisu | 399 | Lisu (Fraser) | Lisu | L-to-R | 5.2 | 48 |
Loma | 437 | Loma | | | | | Not in Unicode
Lyci | 202 | Lycian | Lycian | L-to-R | 5.1 | 29 | Ancient/historic
Lydi | 116 | Lydian | Lydian | R-to-L | 5.1 | 27 | Ancient/historic
Mand | 140 | Mandaic, Mandaean | Mandaic | R-to-L | 6.0 | 29 |
Mani | 139 | Manichaean | | | ? | (51) | Provisionally accepted for Unicode[g]
Maya | 090 | Mayan hieroglyphs | | | | | Not in Unicode
Mend | 438 | Mende | | | | | Not in Unicode
Merc | 101 | Meroitic Cursive | Meroitic Cursive | R-to-L | 6.1 | 26 | Ancient/historic
Mero | 100 | Meroitic Hieroglyphs | Meroitic Hieroglyphs | R-to-L | 6.1 | 32 | Ancient/historic
Mlym | 347 | Malayalam | Malayalam | L-to-R | 1.0 | 98 |
Mong | 145 | Mongolian | Mongolian | T-to-B | 3.0 | 153 | Includes Clear, Manchu scripts
Moon | 218 | Moon (Moon code, Moon script, Moon type) | | | | | Not in Unicode
Mroo | 199 | Mro, Mru | | | ? | (43) | Provisionally accepted for Unicode[g]
Mtei | 337 | Meitei Mayek | Meetei Mayek | L-to-R | 5.2 | 79 |
Mymr | 350 | Myanmar (Burmese) | Myanmar | L-to-R | 3.0 | 188 |
Narb | 106 | Old North Arabian (Ancient North Arabian) | | | ? | (32) | Provisionally accepted for Unicode[g]
Nbat | 159 | Nabataean | | | ? | (40) | Provisionally accepted for Unicode[g]
Nkgb | 420 | Nakhi Geba ('Na-'Khi ²Ggŏ-¹baw, Naxi Geba) | | | | | Not in Unicode
Nkoo | 165 | N'Ko | NKo | R-to-L | 5.0 | 59 |
Nshu | 499 | Nüshu | | | ? | (389) | Provisionally accepted for Unicode[g]
Ogam | 212 | Ogham | Ogham | L-to-R | 3.0 | 29 | Ancient/historic
Olck | 261 | Ol Chiki | Ol Chiki | L-to-R | 5.1 | 48 |
Orkh | 175 | Old Turkic, Orkhon Runic | Old Turkic | R-to-L | 5.2 | 73 | Ancient/historic
Orya | 327 | Oriya | Oriya | L-to-R | 1.0 | 90 |
Osma | 260 | Osmanya | Osmanya | L-to-R | 4.0 | 40 |
Palm | 126 | Palmyrene | | | ? | (32) | Provisionally accepted for Unicode[g]
Perm | 227 | Old Permic | | | | | Not in Unicode
Phag | 331 | Phags-pa | Phags-pa | T-to-B | 5.0 | 56 | Ancient/historic
Phli | 131 | Inscriptional Pahlavi | Inscriptional Pahlavi | R-to-L | 5.2 | 27 | Ancient/historic
Phlp | 132 | Psalter Pahlavi | | | | | Not in Unicode
Phlv | 133 | Book Pahlavi | | | | | Not in Unicode
Phnx | 115 | Phoenician | Phoenician | R-to-L | 5.0 | 29 | Ancient/historic
Plrd | 282 | Miao (Pollard) | Miao | L-to-R | 6.1 | 133 |
Prti | 130 | Inscriptional Parthian | Inscriptional Parthian | R-to-L | 5.2 | 30 | Ancient/historic
Qaaa | 900 | Reserved for private use (start) | | | | | Not in Unicode
Qaai | 908 | (Private use) | | Inherited | | 524 | In versions prior to 5.2 (from 5.2: 'Zinh')
Qabx | 949 | Reserved for private use (end) | | | | | Not in Unicode
Rjng | 363 | Rejang | Rejang | L-to-R | 5.1 | 37 |
Roro | 620 | Rongorongo | | | | | Not in Unicode
Runr | 211 | Runic | Runic | L-to-R | 3.0 | 78 | Ancient/historic
Samr | 123 | Samaritan | Samaritan | R-to-L | 5.2 | 61 |
Sara | 292 | Sarati | | | | | Not in Unicode
Sarb | 105 | Old South Arabian | Old South Arabian | R-to-L | 5.2 | 32 | Ancient/historic
Saur | 344 | Saurashtra | Saurashtra | L-to-R | 5.1 | 81 |
Sgnw | 095 | SignWriting | | | | | Not in Unicode
Shaw | 281 | Shavian (Shaw) | Shavian | L-to-R | 4.0 | 48 |
Shrd | 319 | Sharada, Śāradā | Sharada | L-to-R | 6.1 | 83 |
Sind | 318 | Khudawadi, Sindhi | | | | | Not in Unicode
Sinh | 348 | Sinhala | Sinhala | L-to-R | 3.0 | 80 |
Sora | 398 | Sora Sompeng | Sora Sompeng | L-to-R | 6.1 | 35 |
Sund | 362 | Sundanese | Sundanese | L-to-R | 5.1 | 72 |
Sylo | 316 | Syloti Nagri | Syloti Nagri | L-to-R | 4.1 | 44 |
Syrc | 135 | Syriac | Syriac | R-to-L | 3.0 | 77 |
Syre | 138 | Syriac (Estrangelo variant) | | | | | Not in Unicode
Syrj | 137 | Syriac (Western variant) | | | | | Not in Unicode
Syrn | 136 | Syriac (Eastern variant) | | | | | Not in Unicode
Tagb | 373 | Tagbanwa | Tagbanwa | L-to-R | 3.2 | 18 |
Takr | 321 | Takri | Takri | L-to-R | 6.1 | 66 |
Tale | 353 | Tai Le | Tai Le | L-to-R | 4.0 | 35 |
Talu | 354 | New Tai Lue | New Tai Lue | L-to-R | 4.1 | 83 |
Taml | 346 | Tamil | Tamil | L-to-R | 1.0 | 72 |
Tang | 520 | Tangut | | | ? | (5,910) | Provisionally accepted for Unicode[g]
Tavt | 359 | Tai Viet | Tai Viet | L-to-R | 5.2 | 72 |
Telu | 340 | Telugu | Telugu | L-to-R | 1.0 | 93 |
Teng | 290 | Tengwar | | | | | Not in Unicode
Tfng | 120 | Tifinagh (Berber) | Tifinagh | L-to-R | 4.1 | 59 |
Tglg | 370 | Tagalog (Baybayin, Alibata) | Tagalog | L-to-R | 3.2 | 20 |
Thaa | 170 | Thaana | Thaana | R-to-L | 3.0 | 50 |
Thai | 352 | Thai | Thai | L-to-R | 1.0 | 86 |
Tibt | 330 | Tibetan | Tibetan | L-to-R | 1.0 | 207 | Removed in 1.1 and reintroduced in 2.0
Tirh | 326 | Tirhuta | | | | | Not in Unicode
Ugar | 040 | Ugaritic | Ugaritic | L-to-R | 4.0 | 31 | Ancient/historic
Vaii | 470 | Vai | Vai | L-to-R | 5.1 | 300 |
Visp | 280 | Visible Speech | | | | | Not in Unicode
Wara | 262 | Warang Citi (Varang Kshiti) | | | | | Not in Unicode
Wole | 480 | Woleai | | | | | Not in Unicode
Xpeo | 030 | Old Persian | Old Persian | L-to-R | 4.1 | 50 | Ancient/historic
Xsux | 020 | Cuneiform, Sumero-Akkadian | Cuneiform | L-to-R | 5.0 | 982 | Ancient/historic
Yiii | 460 | Yi | Yi | L-to-R | 3.0 | 1,220 |
Zinh | 994 | Code for inherited script | Inherited | Inherited | | | In version 5.2 (prior versions: 'Qaai')
Zmth | 995 | Mathematical notation | | | | | Not a 'script' in Unicode
Zsym | 996 | Symbols | | | | | Not a 'script' in Unicode
Zxxx | 997 | Code for unwritten documents | | | | | Not in Unicode
Zyyy | 998 | Code for undetermined script | Common | | | 6,412 |
Zzzz | 999 | Code for uncoded script | Unknown | | | | All other code points
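For programmatic use, a few rows of the table above can be held in a small lookup structure keyed by ISO 15924 code. The sketch below is illustrative only: the `ScriptInfo` type, the `SCRIPTS` dictionary, and the `direction_of` helper are hypothetical names, and only four rows are copied from the table.

```python
from typing import NamedTuple, Optional

class ScriptInfo(NamedTuple):
    alias: str      # Unicode property value alias
    direction: str  # writing direction as listed in the table
    version: str    # Unicode version in which the script was introduced

# A handful of rows copied from the table; a real table would hold all of them.
SCRIPTS = {
    "Arab": ScriptInfo("Arabic", "R-to-L", "1.0"),
    "Armn": ScriptInfo("Armenian", "L-to-R", "1.0"),
    "Mong": ScriptInfo("Mongolian", "T-to-B", "3.0"),
    "Thaa": ScriptInfo("Thaana", "R-to-L", "3.0"),
}

def direction_of(code: str) -> Optional[str]:
    """Return the writing direction for an ISO 15924 code, or None if absent."""
    info = SCRIPTS.get(code)
    return info.direction if info else None

print(direction_of("Mong"))
print(direction_of("Cirt"))  # not in this sample, so None
```

Returning `None` for a missing code keeps "not in this sample" distinct from any real direction value.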
Notes
  a. ISO 15924 publications, as of 6 February 2012
  b. ISO 15924 Normative text file
  c. (including Aliases for Unicode)
  d. As of Unicode version 6.1
  f. Unicode uses the Alias (Property Value Alias) as the script name. These aliases are part of Unicode and are published informatively alongside ISO 15924.
