Language and Script Identifiers

To share digital data effectively, Syriaca.org is committed to using international standards. To indicate the languages and scripts of its data, Syriaca.org relies on identifier codes published by the International Organization for Standardization (ISO). Syriaca.org's usage follows the Internet Engineering Task Force's (IETF) "Best Current Practice 47" (BCP 47) as well as guidance from the World Wide Web Consortium (W3C). Because there are some inconsistencies between these codes and scholarly usage, Syriaca.org is actively working to bring the ISO standards and scholarly best practices into agreement. Contributors to Syriaca.org are requested to follow the guidelines below.

Language Codes - ISO 639

ISO 639 provides identifier codes of two or three letters for identifying languages in computer applications, such as databases or internet browsers (for example, "mal" is the code for Malayalam and "en" is the code for English). These language codes are defined in four lists: ISO 639-1, ISO 639-2, and ISO 639-5, maintained by the Library of Congress (LOC), and ISO 639-3, maintained by the Summer Institute of Linguistics (SIL).

Syriaca.org contributors should use the following guidelines when encoding languages in XML:

  • The ISO 639-3 standard offers two codes which could be used to refer to the Syriac language: "syr" (labelled as "Syriac", a "macrolanguage") and "syc" (labelled as "classical Syriac"). At present, these codes and their relationship to each other are not adequately defined in the ISO standard. To avoid confusion, Syriaca.org uses the "syr" code as a default description for all "literary Syriac" (ܟܬܒܢܝܐ).
  • In general, Syriaca.org does not use the "syc" identifier at all in its own publications, because of the chronological difficulties of defining "classical" Syriac. Where Syriaca.org has included data from other sources that do use the "syc" code, Syriaca.org classifies "syc" as an individual language included as one constituent within the "syr" macrolanguage [Note: Our application to the ISO registrar to officially recognize this usage is pending].
  • For commonly occurring languages, contributors should use the following codes:
    • "syr" = Syriac of any variety or period
    • "ar" = Arabic of any variety or period
    • "cop" = Coptic of any variety or period
    • "en" = English of any variety or period
    • "fr" = French of any variety or period
    • "de" = German of any variety or period
    • "grc" = Ancient Greek to A.D. 1453
    • "el" = Modern Greek after A.D. 1453
    • "la" = Latin of any variety or period
    • "mal" = Malayalam of any variety or period
  • For cases not covered by the list above, contributors should refer to the standards directly on the LOC and SIL websites and consult with a Syriaca.org editor.
  • When both a two-letter and a three-letter ISO language code exist (such as "en" and "eng" for English), the preferred code is the one listed in the IANA Subtag Registry. In practice, this means that the two-letter code is always preferred.
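
The guidelines above can be sketched in Python as a small lookup that accepts either form of a code and returns the preferred one, then writes it to an xml:lang attribute. This is only an illustration, not Syriaca.org's own tooling: the function names, the "placeName" element, and the sample text are hypothetical, and the table of three-letter equivalents is an assumption limited to the languages in the list above.

```python
import xml.etree.ElementTree as ET

# Codes copied from the list of commonly occurring languages above.
COMMON_LANGUAGE_CODES = {
    "syr": "Syriac",
    "ar": "Arabic",
    "cop": "Coptic",
    "en": "English",
    "fr": "French",
    "de": "German",
    "grc": "Ancient Greek",
    "el": "Modern Greek",
    "la": "Latin",
    "mal": "Malayalam",
}

# Three-letter ISO 639-2 codes for the listed languages that also have a
# two-letter code (assumption: only these need to be shortened here).
THREE_TO_TWO_LETTER = {
    "ara": "ar",
    "eng": "en",
    "fra": "fr", "fre": "fr",
    "deu": "de", "ger": "de",
    "ell": "el", "gre": "el",
    "lat": "la",
}

# ElementTree addresses xml:lang through the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def preferred_language_code(code: str) -> str:
    """Return the preferred form of a common code, or raise if it is not listed."""
    normalized = code.strip().lower()
    normalized = THREE_TO_TWO_LETTER.get(normalized, normalized)
    if normalized not in COMMON_LANGUAGE_CODES:
        # Codes outside the common list need a look at the standards and an editor.
        raise ValueError(f"{code!r} is not in the common list; consult an editor")
    return normalized


def tagged_element(name: str, text: str, code: str) -> ET.Element:
    """Build an element (hypothetical helper) carrying the preferred xml:lang value."""
    element = ET.Element(name)
    element.set(XML_LANG, preferred_language_code(code))
    element.text = text
    return element


if __name__ == "__main__":
    sample = tagged_element("placeName", "Edessa", "eng")
    print(ET.tostring(sample, encoding="unicode"))
```

Rejecting unlisted codes, rather than passing them through, mirrors the instruction to consult the standards and an editor for anything outside the common list.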

Script Codes - ISO 15924

ISO 15924 provides identifier codes of four letters for identifying scripts in computer applications, such as databases or internet browsers (for example, "Cyrl" is the code for the Cyrillic script and "Palm" is the code for the Palmyrene script). These script codes are defined in a single list maintained by the Unicode Consortium.

Syriaca.org contributors should use the following guidelines when encoding scripts in XML:

  • The ISO 15924 standard offers four script codes specific to Syriac: "Syrc" (unlabelled), "Syre" (labelled as Estrangelo), "Syrj" (labelled as Western), and "Syrn" (labelled as Eastern). Given the pejorative origins of the Syrj and Syrn codes, Syriaca.org is preparing an application to the ISO registrar to change these codes, but contributors should continue using these tags in the interim. Syriaca.org is also preparing an application for the creation of additional Syriac script codes for currently unrepresented scripts such as Melkite script or Syro-Malabar script.
  • Because Syriac scripts have historically been used with multiple languages, Syriaca.org recommends that contributors avoid confusion by always including an ISO language code whenever a script code is used. BCP 47 calls for the combination to be expressed in the following sequence: "[language]-[script]". Thus Syriac written in Estrangela script would be encoded as "syr-Syre". This system allows one to indicate the usage of Syriac script in Garshunography (the writing of languages other than Syriac using the Syriac script) as well as to indicate when Syriac data is being represented in transcription.
  • For commonly occurring script and language combinations, contributors should use the following codes:
    • "syr" = default code for unvocalized Syriac of any variety or period, where script is not specified. [Note: Syriaca.org considers "syr" and "syr-Syrc" to have equivalent meaning and prefers the shorter.]
    • "syr-Syre" = Syriac in Estrangela script
    • "syr-Syrj" = Syriac in vocalized West Syriac script
    • "syr-Syrn" = Syriac in vocalized East Syriac script
    • "syr-x-syrm" = Syriac in Melkite script [Note: This is not an ISO code but a private use code for Melkite employed by Syriaca.org until an ISO code is created]
    • "ar-Syrc" = Arabic Garshuni in unvocalized or undetermined Syriac script
    • "ar-Syrj" = Arabic Garshuni in vocalized West Syriac script
    • "ar-Syrn" = Arabic Garshuni in vocalized East Syriac script
    • "mal-Syrn" = Malayalam Garshuni in vocalized East Syriac script
  • At present, the ISO script codes do not provide a way to indicate whether a script is vocalized or unvocalized. As a general rule, contributors are encouraged to use the codes for West and East Syriac scripts only with vocalized or partially vocalized data.
  • In some cases of Syriac in transcription or Romanization, it is a matter of judgement whether the final data should be considered to be in the Syriac language or in the target language of the transcription. For example, Syriaca.org considers transcriptions that follow the system of The Gorgias Encyclopedic Dictionary of the Syriac Heritage (GEDSH) to be linguistically English, as in the case of the name "Isḥaq of Nineveh". This name is marked as "en-x-gedsh" to indicate that it is English in the GEDSH transliteration script.
  • For scripts not covered by the list above, contributors should refer to the standards directly on the Unicode Consortium website and consult with a Syriaca.org editor.
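
As a rough illustration of the "[language]-[script]" sequence and the private-use codes described above, the following Python sketch splits a tag into its parts, normalizes its casing, and applies the stated preference for "syr" over "syr-Syrc". It is a simplified reading of these guidelines and not a full BCP 47 parser; the names LanguageTag, parse_tag, normalize_tag, and is_garshuni are hypothetical, as is the assumption that any script code beginning with "Syr" counts as a Syriac script.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str] = None
    private_use: Optional[str] = None


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as "ar-Syrn" or "syr-x-syrm" into its parts."""
    parts = tag.strip().split("-")
    language = parts[0].lower()
    script = None
    private_use = None
    rest = parts[1:]
    # A four-letter subtag directly after the language is read as a script code.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        script = rest[0].capitalize()
        rest = rest[1:]
    # Everything after the singleton "x" is treated as a private-use code.
    if len(rest) > 1 and rest[0].lower() == "x":
        private_use = "-".join(rest[1:]).lower()
        rest = []
    if not language or rest:
        raise ValueError(f"Unrecognized tag: {tag!r}")
    return LanguageTag(language, script, private_use)


def normalize_tag(tag: str) -> str:
    """Return the tag with conventional casing, preferring "syr" over "syr-Syrc"."""
    parsed = parse_tag(tag)
    parts = [parsed.language]
    # "syr" and "syr-Syrc" are treated as equivalent; the shorter form is preferred.
    drop_script = (
        parsed.language == "syr"
        and parsed.script == "Syrc"
        and parsed.private_use is None
    )
    if parsed.script and not drop_script:
        parts.append(parsed.script)
    if parsed.private_use:
        parts.extend(["x", parsed.private_use])
    return "-".join(parts)


def is_garshuni(tag: str) -> bool:
    """True when a language other than Syriac is written in a Syriac script."""
    parsed = parse_tag(tag)
    # Assumption: Syriac script codes are exactly those starting with "Syr".
    return (
        parsed.script is not None
        and parsed.script.startswith("Syr")
        and parsed.language != "syr"
    )


if __name__ == "__main__":
    for example in ["syr-syrc", "syr-Syre", "ar-Syrn", "syr-x-syrm", "en-x-gedsh"]:
        print(example, normalize_tag(example), is_garshuni(example))
```

Keeping the language, script, and private-use parts separate makes it straightforward to check, for instance, that a script code never appears without a language code.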
