- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 276
Escape function returns regex that is incompatible with 'u' flag #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Annex B.1.4 does not apply to Unicode RegExps per https://tc39.github.io/ecma262/#sec-regular-expressions-patterns:
AFAICT, the relevant spec bit is here: https://tc39.github.io/ecma262/#sec-forbidden-extensions
Where is |
I'm probably wrong about the section then, since that's just the section that was referenced in the commit where I thought that functionality was removed. (After a closer look, I think it was just a refactor that touched the line.) But V8 does very specifically disallow the https://chromium.googlesource.com/v8/v8.git/+/e918c4ec464456a374098049ca22eac2107f6223 specifically, the line: // With /u, no identity escapes except for syntax characters
// are allowed. Otherwise, all identity escapes are allowed.
if (unicode()) {
return ReportError(CStrVector("Invalid escape"));
} |
Maybe this is disallowed in 21.2.1 by https://tc39.github.io/ecma262/#prod-IdentityEscape ? AFAICT, the bit that would have allowed |
You’re absolutely right — https://tc39.github.io/ecma262/#prod-`IdentityEscape` disallows it. (I somehow misread BTW the change that introduced this behavior happened back when the spec was still a Word document, so there’s no matching commit 🤦♂️ Here’s some background: https://bugs.ecmascript.org/show_bug.cgi?id=3157#c9 |
@raydog What change would you propose here? I'm not sure how to best address this, since not escaping whitespace would mean the resulting string can't be safely embedded in an XRegExp regex that uses flag ( |
@slevithan Perhaps, similar to some of the other functions in the library, this function can accept a second Alternatively, a second method can be added for escaping unicode regexps... I think I like the first option more tho. |
Just an update on this ticket. This is still failing on the current Node.js LTS (14). I ran into this while converting a Perl script (which used |
@jameschensmith This is now fixed in e22a52b and will be in the next release (v5.0.0 -- no timeline yet). The fix should work in all cases (including with flags |
Changes include: * BREAKING: Handle ES2018 capture names: #247 * BREAKING: Enable `namespacing` feature by default: #316 * BREAKING: Remove Unicode Blocks addon: 4860122 * restore perf tweak that made a meaningful difference in regex construction perf tests: 5f18261 * XRegExp.exec: preserve groups obj that comes from native ES2018 named capture: c4a83e7 * Make XRegExp.exec set groups prop to undefined if there are no named captures (closes #320): 7fea476 * Support optional 'Script=' prefix (from ES2018 syntax) for Unicode script tokens (#225): bb35ead * XRegExp.matchRecursive: Add delimiter and pos info when unbalanced delimiters are found (closes #293): 9660b90 * XRegExp.escape: Escape whitespace in a way that works with ES6 flag u (fixes #197): e22a52b To generate this commit, I adapted the steps at #205 (comment) Here's a fuller list of changes that can be needed with new releases: > * Version number > * Update version number and year in headers, config files, README. > * Update version number in `XRegExp.version`. > * Publish > * Publish new git tag. E.g.: > * `git tag -a v3.1.0 -m "Release 3.1.0"`. > * `git push origin v3.1.0`. > * `npm publish`.
Changes include: * BREAKING: Handle ES2018 capture names: #247 * BREAKING: Enable `namespacing` feature by default: #316 * BREAKING: Remove Unicode Blocks addon: 4860122 * restore perf tweak that made a meaningful difference in regex construction perf tests: 5f18261 * XRegExp.exec: preserve groups obj that comes from native ES2018 named capture: c4a83e7 * Make XRegExp.exec set groups prop to undefined if there are no named captures (closes #320): 7fea476 * Support optional 'Script=' prefix (from ES2018 syntax) for Unicode script tokens (#225): bb35ead * XRegExp.matchRecursive: Add delimiter and pos info when unbalanced delimiters are found (closes #293): 9660b90 * XRegExp.escape: Escape whitespace in a way that works with ES6 flag u (fixes #197): e22a52b To generate this commit, I adapted the steps at #205 (comment) Here's a fuller list of changes that can be needed with new releases: > * Version number > * Update version number and year in headers, config files, README. > * Update version number in `XRegExp.version`. > * Publish > * Publish new git tag. E.g.: > * `git tag -a v3.1.0 -m "Release 3.1.0"`. > * `git push origin v3.1.0`. > * `npm publish`.
@raydog and @jameschensmith, I just released v5.0.0, so you should be able to try it and confirm things are now working. |
The XRegExp.escape() method documentation currently says:
However, the escape method emits a bad pattern for unicode regexps. This is because whitespace is currently being escaped (Ie:
" "
becomes"\ "
) and unicode regexps don't support that. This currently fails in Node 6 / 8, and AFAICT (since the documentation is fairly dense...) ES2015 Annex B.1.4 seems to say that the regex 'u' flag should disable the \{any character} -> [{any character}] behaviors that this whitespace escaping thing was depending on.So this code:
Will throw this error:
So either the documentation should be updated, or .escape() should stop escaping whitespace. (Maybe only escape it / stop escaping it with a flag?)
The text was updated successfully, but these errors were encountered: