Template:ISO 15924 script codes and related Unicode data/doc

From the Madurese Wikipedia, the free encyclopedia

This documentation is shared between templates {{Unicode blocks}} and {{ISO 15924 script codes and related Unicode data}}.

The template can be used as usual. It is not a navigation box, so it can be placed anywhere in an article. The notes are contained within the template and will not appear in the article's main References section.

Note: when resolving red links or wrong links, edit {{ISO 15924/wp-article}}. That is where the connection between an ISO code and a Wikipedia article is made.

ISO 15924 templates


Template data

This is the TemplateData documentation for this template, used by VisualEditor and other tools; see a monthly parameter usage report for this template in articles.

TemplateData for ISO 15924 script codes and related Unicode data

No description.

Template parameters

Parameter    Description       Type       Status
1            no description    Unknown    optional

Background: How is this table composed

Note that a script is not a language. A single script, like the Latin alphabet, is used by many languages. Unicode deals only with scripts, not with the languages that use a script. Still, there may be nuances between languages, such as the different use of accented letters in English and Polish.

Step 1: ISO defines a script

ISO defines and publishes each script in the ISO 15924 list. For each accepted script, the list gives the Alpha-4 code (Aaaa-Zzzz), the Numeric code (000-999), and the formal Name. Currently some 160 scripts are defined in this list, including entries such as "Mathematical notation (Zmth)" and "Code for undetermined script (a.k.a. Common, Zyyy)". The list is formally maintained and published by ISO; in practice, the Unicode Consortium office maintains it and publishes it on the Unicode website. Technically, the list is the file iso15924.txt.
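The two code shapes described above can be checked mechanically. The following is a minimal sketch; the helper names `is_alpha4` and `is_numeric_code` are hypothetical and only test the shape of a code, not whether it is actually registered in the ISO 15924 list.

```python
import re

# Shapes as described in the text:
#   Alpha-4: one uppercase letter followed by three lowercase letters (Aaaa-Zzzz)
#   Numeric: exactly three digits (000-999)
ALPHA4_RE = re.compile(r"[A-Z][a-z]{3}")
NUMERIC_RE = re.compile(r"[0-9]{3}")


def is_alpha4(code: str) -> bool:
    """Return True if `code` has the Alpha-4 shape (hypothetical helper)."""
    return ALPHA4_RE.fullmatch(code) is not None


def is_numeric_code(code: str) -> bool:
    """Return True if `code` has the three-digit Numeric shape (hypothetical helper)."""
    return NUMERIC_RE.fullmatch(code) is not None


if __name__ == "__main__":
    for candidate in ("Zmth", "Zyyy", "zyyy", "Latin"):
        print(candidate, is_alpha4(candidate))
```

A shape check like this is useful for catching typos in template parameters before looking a code up in the registry.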

Step 2: Unicode attaches an Alias name

Then, Unicode (not ISO) maintains a list of Alias script names right next to the ISO-defined scripts, for each script Unicode has encoded. The Alias name is an English name for that script.

So each ISO alpha-4 code gets a unique Alias name from Unicode. For example, for Mymr: ISO Name = Myanmar (Burmese), Alias = Myanmar.[1] These Alias names are also present in the definition file iso15924.txt.
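As an illustration, one registry line can be split into the fields discussed above. This is a sketch under assumptions: it assumes semicolon-separated fields in the order code, numeric, English name, French name, alias, followed by further fields, and that comment lines start with `#`. The sample line is illustrative; only the code, ISO name and alias come from this page, and the remaining values should be checked against the actual iso15924.txt. `ScriptEntry` and `parse_registry_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ScriptEntry(NamedTuple):
    alpha4: str            # ISO 15924 Alpha-4 code, e.g. "Mymr"
    numeric: str           # three-digit Numeric code, kept as text
    iso_name: str          # formal ISO name
    alias: Optional[str]   # Unicode Alias; None if Unicode has not encoded the script


def parse_registry_line(line: str) -> Optional[ScriptEntry]:
    """Parse one line, assuming the field order described in the lead-in."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    fields = line.split(";")
    if len(fields) < 5:
        return None  # not a data line in the assumed layout
    alpha4, numeric, iso_name, _french_name, alias = fields[:5]
    return ScriptEntry(alpha4, numeric, iso_name, alias or None)


# Illustrative line; values other than code, ISO name and alias are assumptions.
SAMPLE_LINE = "Mymr;350;Myanmar (Burmese);birman;Myanmar;3.0;2004-05-01"
ENTRY = parse_registry_line(SAMPLE_LINE)

if __name__ == "__main__":
    print(ENTRY)
```

Keeping the Numeric code as text preserves leading zeros, which matter for the 000-999 range.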

Step 3: Usage by Unicode

From that list, Unicode can translate any alpha-4 code into the Alias name of the script, and vice versa. Unicode does not use the formal ISO name.
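Because each Alias is unique, the translation works in both directions. A minimal sketch, using a tiny hand-written table rather than the real registry: the Mymr and Zyyy pairs are taken from this page, while the Hebr pair is added here as an assumption for illustration. `invert_unique` is a hypothetical helper.

```python
from typing import Dict

# Small illustrative table: alpha-4 code -> Unicode Alias.
ALIAS_BY_CODE: Dict[str, str] = {
    "Mymr": "Myanmar",
    "Hebr": "Hebrew",   # assumption: not stated on this page
    "Zyyy": "Common",
}


def invert_unique(mapping: Dict[str, str]) -> Dict[str, str]:
    """Build the reverse mapping, failing loudly if an Alias is not unique."""
    inverse: Dict[str, str] = {}
    for code, alias in mapping.items():
        if alias in inverse:
            raise ValueError(f"alias {alias!r} maps to both {inverse[alias]} and {code}")
        inverse[alias] = code
    return inverse


CODE_BY_ALIAS = invert_unique(ALIAS_BY_CODE)

if __name__ == "__main__":
    print(ALIAS_BY_CODE["Mymr"])     # alpha-4 code -> Alias
    print(CODE_BY_ALIAS["Myanmar"])  # Alias -> alpha-4 code
```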

A script name is used in the Unicode Name of a character: "U+05BF HEBREW POINT RAFE".
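The character name can be looked up with Python's standard `unicodedata` module, as a quick way to see the script name inside it. Note that this is only the character name; the module does not expose the Script property itself, and a name prefix is not a reliable way to determine a character's script.

```python
import unicodedata

ch = "\u05BF"
name = unicodedata.name(ch)

# Print the code point in the usual U+XXXX notation, followed by the name.
print(f"U+{ord(ch):04X} {name}")
```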

Per character

In the Unicode database, Unicode assigns a single appropriate alpha-4 code to every individual character of a script. So every letter, punctuation mark, digit, and so on in a script gets that code. Characters used by multiple scripts, such as the period (.), have the script code "Zyyy" (Common). The "script" codes for Mathematical and Symbol are not used by Unicode; symbols and mathematical characters have the property script="Common".

Then, in the file Scripts.txt, Unicode publishes the Alias script name for each character (or for a range of characters). Part of that file looks like:

...
0591..05BD    ; Hebrew # Mn  [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
05BE          ; Hebrew # Pd       HEBREW PUNCTUATION MAQAF
05BF          ; Hebrew # Mn       HEBREW POINT RAFE
05C0          ; Hebrew # Po       HEBREW PUNCTUATION PASEQ
05C1..05C2    ; Hebrew # Mn   [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
05C3          ; Hebrew # Po       HEBREW PUNCTUATION SOF PASUQ
...

This data file defines which scripts are present in Unicode, and which script a given code point belongs to.
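Lines in the format shown above can be read into (start, end, script) ranges and then queried per code point. The sketch below works only on the excerpt quoted on this page; the regular expression and the names `parse_scripts` and `script_of` are assumptions for illustration, not an official parser.

```python
import re
from typing import List, Optional, Tuple

# Excerpt of Scripts.txt as quoted above.
SAMPLE = """\
0591..05BD    ; Hebrew # Mn  [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
05BE          ; Hebrew # Pd       HEBREW PUNCTUATION MAQAF
05BF          ; Hebrew # Mn       HEBREW POINT RAFE
05C0          ; Hebrew # Po       HEBREW PUNCTUATION PASEQ
05C1..05C2    ; Hebrew # Mn   [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
05C3          ; Hebrew # Po       HEBREW PUNCTUATION SOF PASUQ
"""

# One code point or a "start..end" range, then ";" and the script Alias.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

ScriptRange = Tuple[int, int, str]


def parse_scripts(text: str) -> List[ScriptRange]:
    """Collect (start, end, script) for every data line; skip everything else."""
    ranges: List[ScriptRange] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # comment, blank line, or elision
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        ranges.append((start, end, match.group(3)))
    return ranges


def script_of(code_point: int, ranges: List[ScriptRange]) -> Optional[str]:
    """Return the script of a code point, or None if the ranges do not cover it."""
    for start, end, script in ranges:
        if start <= code_point <= end:
            return script
    return None


RANGES = parse_scripts(SAMPLE)

if __name__ == "__main__":
    print(script_of(0x05BF, RANGES))
```

A linear scan is enough for a short excerpt; for the full file, sorting the ranges and using binary search would be a natural refinement.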

In a block

Given a block (a range of code points), which scripts are present in it? See {{Unicode blocks}}: that table is built by listing, once per block, every script that is present in the block.

There is no fixed relation between a script name and a block name. Some scripts fit in a single block, while others are spread across several blocks.
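The per-block question can be answered by intersecting a block's range with per-script ranges and collecting each script once. A minimal sketch, assuming the script ranges from the Scripts.txt excerpt on this page and a Hebrew block spanning U+0590..U+05FF (the block range is an assumption not stated on this page); `scripts_in_block` is a hypothetical helper.

```python
from typing import List, Set, Tuple

ScriptRange = Tuple[int, int, str]

# Script ranges taken from the Scripts.txt excerpt quoted on this page.
SCRIPT_RANGES: List[ScriptRange] = [
    (0x0591, 0x05BD, "Hebrew"),
    (0x05BE, 0x05BE, "Hebrew"),
    (0x05BF, 0x05BF, "Hebrew"),
    (0x05C0, 0x05C0, "Hebrew"),
    (0x05C1, 0x05C2, "Hebrew"),
    (0x05C3, 0x05C3, "Hebrew"),
]

# Assumed block range for illustration.
HEBREW_BLOCK = (0x0590, 0x05FF)


def scripts_in_block(block_start: int, block_end: int,
                     ranges: List[ScriptRange]) -> Set[str]:
    """Return every script with at least one range overlapping the block.

    A set is used so that each script is reported once per block.
    """
    return {
        script
        for start, end, script in ranges
        if start <= block_end and end >= block_start
    }


if __name__ == "__main__":
    print(sorted(scripts_in_block(*HEBREW_BLOCK, SCRIPT_RANGES)))
```

With the full Scripts.txt data, a block containing characters of several scripts would yield several names here, and a script spread over several blocks would appear in the result for each of them.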

See also


  1. "UTR #24: Unicode Script Property". Unicode Consortium. 2023-08-14.