Enable wide Unicode support for names #24

viktor-yakubiv · 2024-01-11T16:34:39Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)
If applicable, I’ve added docs and tests

Description of changes

Enables almost full Unicode support for directive names. This is tricky to test, so I've added only Latin, Greek and Cyrillic characters. I have also tested combining accent modifiers in the middle and at the end of a directive name.

Attribute names are worth a look too, but maybe in a separate PR.
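
To illustrate why decomposed accents need their own tests, here is a minimal sketch in plain JavaScript. It does not use this package, and the sample word and variable names are made up for illustration. A precomposed letter is one code point, while its decomposed form is a base letter followed by a combining mark, so a name parser has to accept marks that follow letters.

```javascript
// Illustrative only: the sample name is hypothetical, not taken from the test suite.
const composed = 'caf\u00E9' // "café" with a precomposed é (U+00E9)
const decomposed = composed.normalize('NFD') // "cafe" followed by U+0301 (combining acute)

// Matches any combining mark (Unicode general category M).
const combiningMark = /\p{M}/u

console.log(composed.length) // 4
console.log(decomposed.length) // 5
console.log(combiningMark.test(composed)) // false
console.log(combiningMark.test(decomposed)) // true
```

Both strings render the same, but only the decomposed one contains a character that is neither a letter nor a digit.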

The PR should be ready to review. Thank you!

Closes #23

codecov-commenter · 2024-01-11T16:36:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7f23ba8) 100.00% compared to head (73eb92a) 100.00%.

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            9         9           
  Lines         1416      1439   +23     
=========================================
+ Hits          1416      1439   +23

wooorm

Some style notes :)

dev/lib/factory-name.js

wooorm · 2024-01-12T10:29:38Z

-    return self.previous === codes.dash || self.previous === codes.underscore
-      ? nok(code)
-      : ok(code)
+    return allowedEdgeCharacter(self.previous) ? ok(code) : nok(code)


shouldn’t dashes also be edge characters?

If you mean allowed edge characters, they were forbidden by the spec previously. I kept that, but I don't mind changing it.

Currently, the name can neither start nor end with any punctuation or underscore.

Is this something you suggest changing?

To refresh my memory: in name, this last part is about which characters it is possible to exit after.
That behavior at the end is very different from whether the first character is allowed to start a name.
Before, there was a very different check compared to the check in start: - and _ were allowed in names but not at the end.
Now they’re the same. I’m not sure if that’s useful? Perhaps the last line should just be return ok(code)?

I have tried return ok(code) at the end. Allowing the name to end with an underscore interferes with emphasis notation. I reverted the commit.

I think there is no point in allowing punctuation at the end but not at the start. If we are going to allow punctuation, the rules should be (almost) the same.

Possible options:

1. Leave as is.

2. Allow any punctuation at the start but forbid markdown special characters at the end. Those should include =, ~ (special in some flavours), _, *, parentheses and perhaps some more; basically any ASCII punctuation (compared to option 3, this is a blacklist below 128).

3. Allow any punctuation at the start and end if it is beyond ASCII (compared to option 2, this is a whitelist above 127).
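
The three options can be compared side by side with a rough sketch. This is not the code from dev/lib/factory-name.js: the helper names and the `edgeRules` object are hypothetical, and "punctuation" is assumed here to mean Unicode categories P and S, which may not match what the parser actually uses.

```javascript
// Assumption: "punctuation" means Unicode categories P (punctuation) and S (symbols).
const isPunctuation = (character) => /[\p{P}\p{S}]/u.test(character)
const isAscii = (character) => character.codePointAt(0) < 128
const isAsciiPunctuation = (character) =>
  isAscii(character) && isPunctuation(character)

// Hypothetical edge rules, one entry per option from the list above.
const edgeRules = {
  // Option 1: no punctuation (underscore included) at either edge.
  1: {
    start: (character) => !isPunctuation(character),
    end: (character) => !isPunctuation(character)
  },
  // Option 2: anything may start a name; ASCII punctuation may not end it.
  2: {
    start: () => true,
    end: (character) => !isAsciiPunctuation(character)
  },
  // Option 3: punctuation is fine at both edges only when it is beyond ASCII.
  3: {
    start: (character) => !isAsciiPunctuation(character),
    end: (character) => !isAsciiPunctuation(character)
  }
}

console.log(edgeRules[1].end('_')) // false
console.log(edgeRules[2].start('_')) // true
console.log(edgeRules[2].end('_')) // false
console.log(edgeRules[3].start('\u2014')) // true (U+2014 is outside ASCII)
```

In this sketch, options 2 and 3 treat the end of a name identically and differ only in whether ASCII punctuation may start one.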

test/index.js

wooorm · 2024-01-12T10:33:33Z

You could maybe test the math symbol for pi π, and an emoji, such as 🌍?

viktor-yakubiv

I have incorporated the requested changes.

viktor-yakubiv · 2024-01-16T17:42:17Z

test/index.js

+    assert.equal(micromark('::𝜋∈ℝ', options()), '') // Italic
+    assert.equal(micromark('::𝛑≈3.14', options()), '') // Bold
+    assert.equal(micromark('::𝝅∉ℚ', options()), '') // Bold italic
+    assert.equal(micromark('::𝞹≠3.14', options()), '') // Sans bold italic


@wooorm the letter pi you suggested is part of the Greek alphabet, which is already covered by the test:

should support unicode alphabets in name

In this (and other similar tests), I added characters that have the word "mathematical" in their names.
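
As a quick illustration of how these mathematical characters differ from the plain Greek letter, the following sketch uses only built-in JavaScript string methods. The variable names are made up for the example. The mathematical pi lives outside the Basic Multilingual Plane, so it takes two UTF-16 code units, yet it is still categorized as a letter.

```javascript
const mathItalicPi = '\u{1D70B}' // 𝜋 MATHEMATICAL ITALIC SMALL PI
const greekPi = '\u03C0' // π GREEK SMALL LETTER PI

console.log(mathItalicPi.length) // 2 (a surrogate pair in UTF-16)
console.log([...mathItalicPi].length) // 1 (a single code point)
console.log(/^\p{L}$/u.test(mathItalicPi)) // true (still a letter)
console.log(mathItalicPi === greekPi) // false
console.log(mathItalicPi.normalize('NFKC') === greekPi) // true
```

So a test with the Greek alphabet alone would not exercise the surrogate-pair path, which is one reason the extra cases may be worth keeping.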

wooorm · 2024-01-29T09:11:20Z

Thanks for your continued work. For the last week I’ve been wondering what to do about punctuation, and about the implications for the semver version of this package.

In the ASCII range we allow ASCII alphanumerics and -, ., _.
I think in the rest of the Unicode range we should also only allow alphanumerics.
We should be able to do that by checking classifyCharacter(x) === undefined.
I don’t see Unicode “punctuation” tested much, could you see if that changes things?

Some examples of symbols/punctuation outside of the ascii range are € and £!
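
A rough sketch of the proposed check follows. It does not import micromark: `classifyCharacterSketch` is a hypothetical stand-in for `classifyCharacter`, the group values are placeholders, and treating Unicode categories P and S as "punctuation" is an assumption that may differ from what the real function does.

```javascript
// Stand-in for `classifyCharacter`; the real function may classify some
// characters differently. The group values below are placeholders.
const whitespaceGroup = 1
const punctuationGroup = 2

function classifyCharacterSketch(code) {
  const character = String.fromCodePoint(code)
  if (/\s/u.test(character)) return whitespaceGroup
  // Assumption: categories P and S both count as punctuation.
  if (/[\p{P}\p{S}]/u.test(character)) return punctuationGroup
  return undefined
}

// Hypothetical helper: ASCII keeps the existing rule (alphanumerics, -, ., _),
// everything else is allowed only when it is unclassified.
function allowedInName(code) {
  if (code < 128) return /[\w.-]/.test(String.fromCodePoint(code))
  return classifyCharacterSketch(code) === undefined
}

const codeOf = (character) => character.codePointAt(0)

console.log(allowedInName(codeOf('\u03C0'))) // true (π)
console.log(allowedInName(codeOf('\u20AC'))) // false (€)
console.log(allowedInName(codeOf('\u00A3'))) // false (£)
console.log(allowedInName(codeOf('_'))) // true
console.log(allowedInName(codeOf('!'))) // false
```

Under this assumption an emoji such as 🌍 (category So) would also be rejected, so whether symbols count as punctuation is worth deciding explicitly.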

Enable wide Unicode support for names

b34ad7a

github-actions bot added the 👋 phase/new and 🤞 phase/open labels and removed the 👋 phase/new label on Jan 11, 2024

viktor-yakubiv marked this pull request as draft January 11, 2024 16:37

Add tests for decomposed accent modifiers

ebd55c3

viktor-yakubiv marked this pull request as ready for review January 11, 2024 18:44

wooorm reviewed Jan 12, 2024

viktor-yakubiv added 7 commits January 16, 2024 17:10

Replace em dashes with unicode chars

e03860c

Add tests for emojis and math symbols

d9b89e5

Adhere styleguide

c400dab

Allow punctuation at the end of name

f4ec634

Automatic formatting

54f041d

Revert punctuation at the end of name

4785fba

Interferes with emphasis. This reverts commit f4ec634.

Automatic format

73eb92a
