util: improve unicode support #31319

Closed

Conversation

BridgeAR
Member

The array grouping function relies on the width of the characters.
Until now the width was calculated incorrectly, because the string
length was used instead.
This improves the Unicode output by calculating the width in a
monospaced font (other fonts might differ).

I had to move some functions. Otherwise we'd have to load the utils functions by default and that did not seem necessary.
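
To illustrate the difference between string length and display width described above, here is a minimal sketch. It is not the code from this PR: the names `WIDE_RANGES`, `isWide`, and `displayWidth` are hypothetical, and the range table is a small illustrative subset of the East Asian Width data, so many wide characters (and all zero-width or combining characters) are not handled.

```javascript
// Illustrative subset of wide code point ranges (an assumption for this
// sketch; the real fallback table covers many more ranges).
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xff01, 0xff60], // Fullwidth Forms
];

// Returns true if the code point falls into one of the ranges above.
function isWide(codePoint) {
  return WIDE_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Sums the column width of a string: two columns for wide characters,
// one column for everything else.
function displayWidth(str) {
  let width = 0;
  // for...of iterates by code point, not by UTF-16 code unit.
  for (const char of str) {
    width += isWide(char.codePointAt(0)) ? 2 : 1;
  }
  return width;
}

console.log('古池'.length, displayWidth('古池'));
console.log('abc'.length, displayWidth('abc'));
```

Padding entries based on `length` would leave columns containing such characters misaligned in a terminal, which is the kind of problem the grouping change targets.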

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added readline Issues and PRs related to the built-in readline module. util Issues and PRs related to the built-in util module. labels Jan 11, 2020
@BridgeAR BridgeAR force-pushed the 2020-01-11-util-better-unicode-support branch from 81735c6 to 02515e8 Compare January 11, 2020 22:36
@BridgeAR BridgeAR force-pushed the 2020-01-11-util-better-unicode-support branch from 02515e8 to 80fe23b Compare January 12, 2020 02:30
@BridgeAR BridgeAR force-pushed the 2020-01-11-util-better-unicode-support branch from da70f98 to fc7f090 Compare January 12, 2020 23:38
@Trott
Member

Trott commented Jan 13, 2020

I'm not sure who to ping for a review. @srl295 maybe?

Member

@srl295 srl295 left a comment

LGTM. Could optimize the existing code, but generally LGTM.

const isFullWidthCodePoint = (code) => {
// Code points are partially derived from:
// http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
return code >= 0x1100 && (
Member

This might be doable as a regex, or at least compiled into one, although I don't think there's an East Asian Width property available in regex.

Member Author

This could definitely be a regular expression. I guess it's slower that way but I did not check. I'll have a look soon.

Member

ICU4C also has API to get the East Asian Width.

Member Author

Yes, we use that when Node.js is built with ICU, but this is the fallback code.
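
The regex idea from this thread could look roughly like the sketch below, which compiles a table of ranges into a single character class and compares it with plain numeric checks. This is only an assumption about how such a variant might be written, not code from the PR: `WIDE_RANGES`, `isWideByRange`, and `isWideByRegExp` are hypothetical names, and the table is a deliberately partial subset of the East Asian Width data.

```javascript
// Hypothetical, partial table of wide code point ranges (an assumption for
// illustration; the real fallback covers many more ranges).
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xff01, 0xff60], // Fullwidth Forms
];

// Variant 1: plain numeric comparisons on the code point.
const isWideByRange = (code) =>
  WIDE_RANGES.some(([lo, hi]) => code >= lo && code <= hi);

// Variant 2: compile the same table into one character class. The `u` flag
// is required so that `\u{...}` escapes are read as code points.
const escapeCodePoint = (code) => `\\u{${code.toString(16)}}`;
const wideCharClass = WIDE_RANGES
  .map(([lo, hi]) => `${escapeCodePoint(lo)}-${escapeCodePoint(hi)}`)
  .join('');
const wideRegExp = new RegExp(`^[${wideCharClass}]$`, 'u');
const isWideByRegExp = (char) => wideRegExp.test(char);

// Both variants should agree for every input.
for (const char of ['a', '한', '古', 'Ａ']) {
  console.log(char, isWideByRange(char.codePointAt(0)), isWideByRegExp(char));
}
```

Which variant is faster would need to be measured; the sketch only shows that the two can be generated from the same table.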

@BridgeAR BridgeAR force-pushed the 2020-01-11-util-better-unicode-support branch from fc7f090 to 1b9547f Compare January 17, 2020 08:51
@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 17, 2020
@Trott
Member

Trott commented Jan 18, 2020

Relevant test failures in the no-intl host on CI?

@Trott Trott removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 18, 2020
@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 20, 2020
BridgeAR added a commit that referenced this pull request Jan 22, 2020
The array grouping function relies on the width of the characters.
Until now the width was calculated incorrectly, because the string
length was used instead.
This improves the Unicode output by calculating the width in a
monospaced font (other fonts might differ).

PR-URL: #31319
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
@BridgeAR
Member Author

Landed in 8fb5fe2 🎉

@BridgeAR BridgeAR closed this Jan 22, 2020
codebytere pushed a commit that referenced this pull request Feb 17, 2020
@codebytere codebytere mentioned this pull request Feb 17, 2020
@codebytere
Member

@BridgeAR if this should go back to v12.x it'll need a manual backport, but feel free to update the label if it shouldn't land!

@targos targos removed backport-requested-v12.x author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Apr 25, 2020
targos pushed a commit to targos/node that referenced this pull request Apr 25, 2020
targos pushed a commit that referenced this pull request Apr 28, 2020
Labels
readline Issues and PRs related to the built-in readline module. util Issues and PRs related to the built-in util module.
8 participants