legacy grapheme clusters vs extended grapheme clusters #1

frivoal · 2015-10-20T13:11:57Z

"Grapheme cluster" is often the appropriate way to define "character" in specifications (such as CSS) that care about what readers visually identify as a character.

Maybe the spec should point that out, with a link to the relevant part of Unicode (http://unicode.org/reports/tr29/ I presume). The "Indexing strings" section already mentions this, but the "Choosing a definition of 'character'" section, where it would be particularly relevant, does not.

Also, providing a specific definition requires picking between "legacy grapheme clusters" and "extended grapheme clusters", and I am not sure how to do that. Guidance on this topic would be appreciated.

r12a · 2015-10-20T13:17:44Z

Good points, Florian. I'll look at adding that information.

We usually recommend extended grapheme clusters only.

frivoal · 2015-10-20T13:23:22Z

That's what I've typically guessed the correct answer to be, but without really knowing why. And this specification looks like a great place to enlighten people in my situation.

aphillips · 2022-04-05T17:04:45Z

Is this addressed by the introduction to section 4?


xfq · 2023-12-19T07:09:45Z

There is no mention of legacy grapheme clusters in specdev at the moment and I think this paragraph in UAX #29 answers Florian's question:

An extended grapheme cluster is the same as a legacy grapheme cluster, with the addition of some other characters. The continuing characters are extended to include all spacing combining marks, such as the spacing (but dependent) vowel signs in Indic scripts. For example, this includes U+093F ( ि ) DEVANAGARI VOWEL SIGN I. The extended grapheme clusters should be used in implementations in preference to legacy grapheme clusters, because they provide better results for Indic scripts such as Tamil or Devanagari in which editing by orthographic syllable is typically preferred. For scripts such as Thai, Lao, and certain other Southeast Asian scripts, editing by visual unit is typically preferred, so for those scripts the behavior of extended grapheme clusters is similar to (but not identical to) the behavior of legacy grapheme clusters.

IMHO this kind of detail should be mentioned by charmod, not in specdev.
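To make the difference in the quoted paragraph concrete, here is a rough Python sketch. It is not the UAX #29 algorithm: it approximates Grapheme_Extend with the general categories Mn and Me, and approximates SpacingMark with Mc, and it ignores Hangul, emoji ZWJ sequences, Prepend characters, CR LF and regional indicators. The function name `rough_clusters` is made up for this illustration.

```python
import unicodedata


def rough_clusters(text, extended=True):
    """Split text into approximate grapheme clusters.

    Assumption: nonspacing and enclosing marks (Mn, Me) stand in for
    Grapheme_Extend, and spacing combining marks (Mc) stand in for
    SpacingMark. The real properties differ in a number of characters.
    """
    continuing = {"Mn", "Me"}
    if extended:
        # Extended clusters also attach spacing combining marks.
        continuing.add("Mc")

    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in continuing:
            clusters[-1] += ch  # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# U+0915 DEVANAGARI LETTER KA followed by U+093F DEVANAGARI VOWEL SIGN I
syllable = "\u0915\u093F"
print(len(rough_clusters(syllable, extended=False)))  # 2
print(len(rough_clusters(syllable, extended=True)))   # 1
```

Under these assumptions the legacy-style split separates the vowel sign from its consonant, while the extended-style split keeps the orthographic syllable together, which is the behavior the UAX #29 paragraph recommends. For real use, a library that implements the full UAX #29 rules is the better choice.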

aphillips added a commit that referenced this issue Sep 30, 2022

Merge pull request #1 from w3c/gh-pages

9aef417

merge w3c changes to my branch
