Vedic Extensions in OpenType

This document outlines the shaping information needed to display characters from the Unicode Vedic Extensions block, which may be used within text runs in many Indic scripts.

General information

The Vedic Extensions block encodes letters and marks that are used in a large body of ancient literature written in the Vedic Sanskrit language.

Primarily an oral language in the time period when the key literature originated, Vedic Sanskrit has no native script. Therefore, texts may be typeset in any one of the Indic scripts, using the Vedic Extensions to supplement the main script's character set.

Terminology

Individual Vedic Extension characters may be named by some combination of the Vedic text in which the mark is used, the regional or manuscript tradition involved, and a simple visual or phonetic description of the character. Some commonly used general categories are worth noting.

Udatta is the term for a high tone on a vowel.

Anudatta is the term for a low tone on a vowel.

Svarita is the term for a falling or mixed tone on a vowel.

Anusvara is the term for a nasalization sound that precedes a consonant.

Visarga is the term for a soft breathing sound that follows a vowel.

Note: In modern Indic languages, the terms anusvara and visarga often refer to diacritical marks that have the above effects on pronunciation. In the Vedic Sanskrit language, however, they are generally considered independent letters.

Glyph classification

For most codepoints, the General Category property defined in the Unicode standard is correct, but it is not sufficient to fully capture the expected shaping behavior (such as how the character is treated during glyph reordering). Therefore, Vedic Extensions glyphs must additionally be classified by how they are treated when shaping a run of text.

Vedic Extensions character table

Vedic Extension glyphs should be classified as in the following table. Codepoints with no assigned meaning are marked as unassigned in the Unicode category column.

Assigned codepoints marked with a null in the Shaping class column evoke no special behavior from the shaping engine.

The Mark-placement subclass column indicates mark-placement positioning. Assigned codepoints marked with a null in this column evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific behavior.

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
| U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
| U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
| U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
| U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
| U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
| U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
| U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
| U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
| U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
| U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
| U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
| U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
| U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
| U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
| U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
| U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
| U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
| U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
| U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
| U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
| U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
| U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
| U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
| U+1CE9 | Letter | AVAGRAHA | null | ᳩ Sign Anusvara Antargomukha |
| U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
| U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
| U+1CEC | Letter | AVAGRAHA | null | ᳬ Sign Anusvara Vamagomukha With Tail |
| U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
| U+1CEE | Letter | AVAGRAHA | null | ᳮ Sign Hexiform Long Anusvara |
| U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
| U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
| U+1CF1 | Letter | AVAGRAHA | null | ᳱ Sign Anusvara Ubhayato Mukha |
| U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
| U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
| U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
| U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
| U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
| U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
| U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
| U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
| U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
| U+1CFB | unassigned | | | |
| U+1CFC | unassigned | | | |
| U+1CFD | unassigned | | | |
| U+1CFE | unassigned | | | |
| U+1CFF | unassigned | | | |
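
The rule that the Shaping class takes precedence over the General Category can be sketched as a small lookup. The Python below is an illustrative sketch only, not any shaping engine's implementation: `SHAPING_TABLE`, `classify`, and `effective_class` are hypothetical names, only a handful of rows are transcribed from the table, and `None` stands in for the table's null.

```python
import unicodedata

# A few rows transcribed from the character table:
# codepoint -> (shaping class, mark-placement subclass); None means "null".
SHAPING_TABLE = {
    0x1CD0: ("CANTILLATION", "TOP_POSITION"),
    0x1CD3: (None, None),
    0x1CD4: ("CANTILLATION", "OVERSTRUCK"),
    0x1CE1: ("CANTILLATION", "RIGHT_POSITION"),
    0x1CE9: ("AVAGRAHA", None),
    0x1CF2: ("CONSONANT_DEAD", None),
    0x1CF5: ("CONSONANT_WITH_STACKER", None),
}


def classify(codepoint):
    """Return (shaping class, mark-placement subclass, General Category)."""
    shaping_class, subclass = SHAPING_TABLE.get(codepoint, (None, None))
    general_category = unicodedata.category(chr(codepoint))
    return shaping_class, subclass, general_category


def effective_class(codepoint):
    """Prefer the shaping class; fall back to the Unicode General Category."""
    shaping_class, _, general_category = classify(codepoint)
    return shaping_class if shaping_class is not None else general_category
```

In this sketch, a codepoint with a null Shaping class (or one missing from the partial table) simply reports its General Category, which mirrors the statement that such codepoints evoke no special behavior from the shaping engine.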

Shaping information

29 of the characters in the block are categorized as marks. 27 of these marks are subcategorized as non-spacing; the remaining two (U+1CE1 and U+1CF7) are spacing-combining.

Of the marks, 20 are classified as CANTILLATION (or tone-marker) indicators, which modify the pitch of vowels; all but one of these (U+1CE1) are non-spacing. Most of these marks are positioned above or below the main character, using GPOS mark attachment, in a position that does not interact or interfere with the main character. For semantic reasons, Unicode keeps the CANTILLATION classification separate from the TONE_MARKER classification used in some scripts; the two classifications are identical for shaping purposes.
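
As a cross-check, the mark counts can be tallied directly from the character table. This is a minimal Python sketch, assuming the sets below are a faithful transcription of the table's Unicode category and Shaping class columns; the names `NON_SPACING`, `SPACING_COMBINING`, `CANTILLATION`, and `tally` are hypothetical.

```python
# Codepoints tagged Mark [Mn] in the Unicode category column of the table.
NON_SPACING = (
    set(range(0x1CD0, 0x1CD3))      # U+1CD0..U+1CD2
    | set(range(0x1CD4, 0x1CE1))    # U+1CD4..U+1CE0
    | set(range(0x1CE2, 0x1CE9))    # U+1CE2..U+1CE8
    | {0x1CED, 0x1CF4, 0x1CF8, 0x1CF9}
)

# Codepoints tagged Mark [Mc] in the same column.
SPACING_COMBINING = {0x1CE1, 0x1CF7}

# Codepoints whose Shaping class is CANTILLATION.
CANTILLATION = (
    set(range(0x1CD0, 0x1CD3))      # U+1CD0..U+1CD2
    | set(range(0x1CD4, 0x1CE2))    # U+1CD4..U+1CE1
    | {0x1CF4, 0x1CF8, 0x1CF9}
)


def tally():
    """Count marks by Unicode subcategory and by CANTILLATION shaping class."""
    return {
        "marks": len(NON_SPACING | SPACING_COMBINING),
        "non_spacing": len(NON_SPACING),
        "spacing_combining": len(SPACING_COMBINING),
        "cantillation": len(CANTILLATION),
        "non_spacing_cantillation": len(CANTILLATION & NON_SPACING),
    }


print(tally())
```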

Some of the marks (cantillation and non-cantillation) are classified as OVERSTRUCK in the Mark-placement subclass column. This indicates that the mark is intended to be rendered on top of the preceding character. During reordering, OVERSTRUCK marks are tagged for the ordering position POS_AFTER_MAIN.
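
The tagging step for OVERSTRUCK marks might look like the following Python sketch. It is illustrative only: the `Glyph` record and the `tag_overstruck` function are hypothetical, the Devanagari base letter is just an example, and a real shaping engine assigns ordering positions to every glyph in the syllable, not only to OVERSTRUCK marks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    codepoint: int
    mark_placement: Optional[str] = None  # e.g. "OVERSTRUCK", "TOP_POSITION"
    position: Optional[str] = None        # ordering position tag, if any


def tag_overstruck(glyphs: List[Glyph]) -> List[Glyph]:
    """Tag each OVERSTRUCK mark with POS_AFTER_MAIN; leave other glyphs as-is."""
    for glyph in glyphs:
        if glyph.mark_placement == "OVERSTRUCK":
            glyph.position = "POS_AFTER_MAIN"
    return glyphs


run = [
    Glyph(0x0915),                  # Devanagari letter KA, used here as a base
    Glyph(0x1CD4, "OVERSTRUCK"),    # Tone Midline Svarita
    Glyph(0x1CD0, "TOP_POSITION"),  # Tone Karshana
]
tag_overstruck(run)
```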

Some marks are classified, for shaping purposes, as AVAGRAHA or VISARGA. This indicates that the mark behaves more like the Avagraha or Visarga character than like a diacritic.

Characters that are categorized in Unicode as letters vary with respect to whether or not they trigger special behavior in the shaping process. Those that do include letters classified as consonants (CONSONANT_DEAD or CONSONANT_WITH_STACKER), letters classified as AVAGRAHA, and one letter (U+1CFA) classified as PLACEHOLDER.