Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: simplify util.toUSVString #39891

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 1 addition & 2 deletions lib/internal/util.js
Expand Up @@ -57,8 +57,7 @@ const experimentalWarnings = new SafeSet();

const colorRegExp = /\u001b\[\d\d?m/g; // eslint-disable-line no-control-regex

const unpairedSurrogateRe =
/(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/;
const unpairedSurrogateRe = /\p{Surrogate}/u;
Copy link
Member

@lpinca lpinca Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slower, see #39814 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. It depends on the string you use it with, just like the current solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summarizing that discussion: the performance characteristics are different before/after this change. The trade-off is as follows:

  • For strings without surrogates (the common case?), performance is equivalent.
  • For strings with a lone trail surrogate at the start, the patch performs better.
  • For strings with a lone lead surrogate at the end, the old/current version performs better.

Copy link
Member

@lpinca lpinca Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be pedandic:

  • For strings without surrogates (the common case?), performance is equivalent.
  • For strings with a lone trail surrogate at the start, the patch performs sligtly better (~5%).
  • For strings with a lone lead surrogate at the end, the old/current version performs much better (~50%).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm good with either approach but I'm happy to keep things as they are if folks think that {Surrogate} way is too slow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you stick to the current approach, then consider renaming unpairedSurrogateRe — the current name is misleading, since it doesn’t just match unpaired surrogates. E.g. in the case of '\u{10000}\uDC00' (i.e. a surrogate pair followed by a lone surrogate), it matches '\uDC00\uDC00' (i.e. the trail surrogate of the pair + the lone surrogate) as opposed to just the lone surrogate '\uDC00' — which is fine for this specific use case, but could become problematic if someone were to re-use this pattern elsewhere. For example, in a user-land implementation:

function toUSVString(str) {
  // This works fine:
  return str.replaceAll(/\p{Surrogate}/gu, '\uFFFD');
  // …but it subtly wouldn’t work correctly with the current `unpairedSurrogateRe`.
}

function toUSVString(val) {
const str = `${val}`;
// As of V8 5.5, `str.search()` (and `unpairedSurrogateRe[@@search]()`) are
Expand Down
12 changes: 11 additions & 1 deletion test/parallel/test-util.js
Expand Up @@ -148,7 +148,17 @@ assert.strictEqual(util.isFunction(function() {}), true);
assert.strictEqual(util.isFunction(), false);
assert.strictEqual(util.isFunction('string'), false);

assert.strictEqual(util.toUSVString('string\ud801'), 'string\ufffd');
// Lead surrogates: D800..DBFF
assert.strictEqual(util.toUSVString('string\ud800'), 'string\ufffd');
assert.strictEqual(util.toUSVString('string\udabc'), 'string\ufffd');
assert.strictEqual(util.toUSVString('string\udbff'), 'string\ufffd');
// Trail surrogates: DC00..DFFF
assert.strictEqual(util.toUSVString('string\udc00'), 'string\ufffd');
assert.strictEqual(util.toUSVString('string\ude12'), 'string\ufffd');
assert.strictEqual(util.toUSVString('string\udfff'), 'string\ufffd');
// Verify surrogate pairs are unaffected.
assert.strictEqual(util.toUSVString('string\ud800\udc00'),
'string\ud800\udc00');

{
assert.strictEqual(util.types.isNativeError(new Error()), true);
Expand Down