Default_Ignorable_Code_Points should all be zero-width #118

Open
Jules-Bertholet opened this issue Feb 12, 2024 · 1 comment

Jules-Bertholet commented Feb 12, 2024

From https://www.unicode.org/faq/unsup_char.html#3:

All default-ignorable characters should be rendered as completely invisible (and non advancing, i.e. “zero width”), if not explicitly supported in rendering.

However, this library incorrectly considers some of them, for example U+3164 HANGUL FILLER, to have non-zero width.

(There is one exception, where this library is correct in assigning a non-zero width to a Default_Ignorable_Code_Point: U+115F HANGUL CHOSEONG FILLER is meant to be combined with other Hangul jamo to form a width-2 syllable block, so it should be assigned width 2 even though it has no display on its own.)
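
The rule requested above can be sketched in a few lines of Python. This is only an illustration, not this library's implementation: `DEFAULT_IGNORABLE_SAMPLE` is a small hand-copied subset of the Default_Ignorable_Code_Point property (the full list lives in Unicode's DerivedCoreProperties.txt), and `naive_width` and `proposed_width` are hypothetical helper names.

```python
import unicodedata

# Hand-copied, deliberately partial subset of Default_Ignorable_Code_Point.
# A real implementation would generate this from DerivedCoreProperties.txt.
DEFAULT_IGNORABLE_SAMPLE = (
    (0x115F, 0x1160),  # HANGUL CHOSEONG FILLER, HANGUL JUNGSEONG FILLER
    (0x200B, 0x200F),  # ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK
    (0x3164, 0x3164),  # HANGUL FILLER
    (0xFFA0, 0xFFA0),  # HALFWIDTH HANGUL FILLER
)

# The one exception described above: U+115F keeps width 2.
WIDTH_EXCEPTIONS = {0x115F: 2}


def is_default_ignorable(cp):
    """Return True if cp falls in the (partial) sample table."""
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_SAMPLE)


def naive_width(ch):
    """Hypothetical baseline: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def proposed_width(ch):
    """Width under the proposed rule: exceptions, then zero for ignorables."""
    cp = ord(ch)
    if cp in WIDTH_EXCEPTIONS:
        return WIDTH_EXCEPTIONS[cp]
    if is_default_ignorable(cp):
        return 0
    return naive_width(ch)


if __name__ == "__main__":
    for ch in ("\u3164", "\u115f", "\u200b", "a"):
        print(f"U+{ord(ch):04X}", naive_width(ch), proposed_width(ch))
```

With only the East Asian Width baseline, U+3164 comes out as width 2 because it is classified as Wide; checking the exception table first and the ignorable table second gives it width 0 while leaving U+115F at 2.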

jquast (Owner) commented Feb 15, 2024

Thanks. I think this is the same as your other issue: if I am able to treat Default_Ignorable_Code_Point values as zero width, that should also resolve U+3164 HANGUL FILLER; otherwise I can add it manually.

I agree that some jamo are meant to be combined, and this library assumes as much; see this test case:

wcwidth/tests/test_core.py

Lines 225 to 244 in 056ee4b

def test_kr_jamo():
    """
    Test basic combining of HANGUL CHOSEONG and JUNGSEONG

    Example and from Raymond Chen's blog post,
    https://devblogs.microsoft.com/oldnewthing/20201009-00/?p=104351
    """
    # This is an example where both characters are "wide" when displayed alone.
    #
    # But JUNGSEONG (vowel) is designed for combination with a CHOSEONG (consonant).
    #
    # This wcwidth library understands their width only when combination,
    # and not by independent display, like other zero-width characters that may
    # only combine with an appropriate preceding character.
    phrase = (
        u"\u1100"  # ᄀ HANGUL CHOSEONG KIYEOK (consonant)
        u"\u1161"  # ᅡ HANGUL JUNGSEONG A (vowel)
    )
    expect_length_each = (2, 0)
    expect_length_phrase = 2
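
The expected values in that excerpt can be reproduced with a standalone sketch. It assumes the common wcwidth convention that leading consonants in U+1100..U+115F occupy two cells and that the vowels and trailing consonants in U+1160..U+11FF occupy none; `jamo_width` is a hypothetical helper written for this illustration, not a function from this library.

```python
def jamo_width(ch):
    """Cell width of a Hangul Jamo code point under the assumed convention."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        # Leading consonants (choseong), including U+115F CHOSEONG FILLER,
        # reserve the full two-cell syllable block.
        return 2
    if 0x1160 <= cp <= 0x11FF:
        # Vowels (jungseong) and trailing consonants (jongseong) are drawn
        # inside the block opened by the preceding choseong.
        return 0
    raise ValueError(f"U+{cp:04X} is outside the Hangul Jamo block")


phrase = "\u1100\u1161"  # CHOSEONG KIYEOK + JUNGSEONG A
length_each = tuple(jamo_width(ch) for ch in phrase)
length_phrase = sum(length_each)
print(length_each, length_phrase)
```

Under this convention the filler U+115F followed by a vowel also sums to 2, which is why it is the one default-ignorable code point that should keep a non-zero width.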
