- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BREAKING: Handle ES2018 capture names #247
Conversation
Note that this test also currently fails on master in Chrome 65, as shown by running |
Do you want this merged as is? |
No, as it would cause future pull requests to fail. I had made a little progress on this, but forgot to write it down. I'm trying to see what I can remember. |
Ok, so here's a basic test file you can run with e.g. const XRegExp = require('./');
const re = XRegExp('(?<日本語>)');
console.log(re) If you put a breakpoint at the end of the XRegExp constructor: Lines 629 to 634 in f6e32c6
You can see that EDIT: I dug up some more info: |
I haven't looked into this, but I'm guessing Node 10 added ES 2018's native support for named capture, and the range of chars supported in native capture names is broader than what XRegExp currently allows. There are least three potential paths forward:
I’m withholding opinion at this point on which path is best. (Possibly incomplete) steps for implementing option 3:
A (possibly incomplete) solution for implementing option 1 would be to add a new regex syntax token matching |
Matching ES2018 as closely as possible would be my preference. As a compromise, we could support
Besides, using escape sequences in identifiers is an edge case anyway. We only supported it in named capture groups for consistency. |
I like @mathiasbynens proposal! Since we're already using babel-plugin-transform-xregexp (introduced in #183), I'm thinking it should be possible to use I've pushed a commit with this change, xregexp/tools/scripts/property-regex.js Lines 7 to 20 in f6e32c6
|
A coupe notes on this if it goes forward:
|
Thanks, I decided to briefly take a look, but didn't make much progress.
Got it, this is a breaking change in the API.
Right, I think we'll get the "spelling out" for free via https://github.com/josephfrazier/babel-plugin-transform-xregexp, as described in my previous comment. |
Oh yeah, thanks for the reminder. It looks like we can support older runtimes with the |
@josephfrazier, sorry, I deleted my comment before you replied since I wanted to look more into whether it was an issue. (Anywho, note that I think you're awesome and should feel free to make tough calls on runtime/browser compatibility and whatever else you think is important.) |
Thanks, I think I've gotten the tests updated to reflect the breaking changes. Gonna consider adding a new test that uses a character not previously supported, and think about which src/ code can be deleted if we're not allowing named captures/backreferences to be numbers. |
If this change goes through, it's worth considering whether the release of it should be bundled with turning the There would be an easy workaround of running |
@josephfrazier, can this be squashed to make review easier? |
Sure, do you just want it to be one commit?
Oh, that's an interesting point. One argument is that if we're making a decently impactful breaking change anyway, may as well make another closely related one at the same time to get it over with. On the other hand, that change only takes away functionality, whereas this one adds some (using non-Roman characters in capture names). |
785221e
to
e34bf20
Compare
Oh, a fun aside: here's how big one of those 3 regexes comes out to: over 33 thousand characters long:
|
One or a few both work for me, but right now there are so many it's harder to follow.
I think we're saying the same thing, but just to be certain, it doesn't take away functionality, but rather moves named backreferences into the |
An idea, only if you're into it: This syntax change could be put behind a new installable option (maybe called |
e34bf20
to
1e000ef
Compare
Done, let me know if there's anything else I can do to make it easier to review
Oh, right, I had totally forgotten how that works. Been staring at the screen too long today. I'd be down to enable namespacing by default in the next major release, especially since it can be turned back off.
Hmm, do we know of any other breaking changes on the horizon? If not, I think I'd rather just push forward. Though, I wonder if we can do a search over xregexp's large dependents, to see how widespread the breakage would be. |
@@ -7,6 +7,7 @@ module.exports = { | |||
}, | |||
"extends": "eslint:recommended", | |||
"parserOptions": { | |||
"ecmaVersion": 9, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It allows the eslint parser to support the new syntax in src/xregexp.js, rather than showing an error like this:
/Users/josephfrazier/workspace/xregexp/src/xregexp.js
1847:6 error Parsing error: Invalid regular expression: /\(\?P?<([\p{ID_Start}$_][\p{ID_Continue}$_\u200C\u200D]*)>/: Invalid escape
Here are a few potential additional things I can think of that would be breaking changes:
But that's just a brain dump of ideas because you asked. I agree with you about not holding up this PR to get other breaking changes in, and in fact most of the above could be argued to not actually be improvements. Namespacing by default might be a different story though, since it seems useful/meaningful to make XRegExp more compatible with ES2018. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the thorough review and suggested changes! I've made them, and think it's ready for another look, whenever you find time.
I did notice there's some comments in the code that still need to be updated, will get around to that in a bit.
@@ -7,6 +7,7 @@ module.exports = { | |||
}, | |||
"extends": "eslint:recommended", | |||
"parserOptions": { | |||
"ecmaVersion": 9, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It allows the eslint parser to support the new syntax in src/xregexp.js, rather than showing an error like this:
/Users/josephfrazier/workspace/xregexp/src/xregexp.js
1847:6 error Parsing error: Invalid regular expression: /\(\?P?<([\p{ID_Start}$_][\p{ID_Continue}$_\u200C\u200D]*)>/: Invalid escape
Thanks for that brain dump of potential breaking changes, that's very informative. I might create separate issues for each of them, so they can be tracked/discussed them at a higher level. Of the changes listed, one stands out:
This one seems very in line with the changes in this PR, as well as the idea to enable |
I feel you in that bundling that with the release of this PR and making it part of a package of ES2018 named capture related changes would be the best way/timing to drop support for Pros of removal:
Cons of removal:
That said, if you want to move forward with this idea and think it's valuable, I'm happy to defer to you and would be happy to review the diffs. And if you're also worried about the backcompat issues, perhaps we could hide the old syntax behind an |
Super nice! Squashed and merged. |
(thanks for merging!) Ah, those are really great points re. why not to remove the
Now that I think about it, the abundance of official Babel plugins aimed at compiling regular expressions for older browsers shows that there's a lot of energy being devoted there. On the other hand, XRegExp allows similar transformations at runtime, rather than just during a build step, as well as providing other tools like When I consider the above, I think there's a good argument to be made that XRegExp should support new ES features, but also should not artificially limit its API/syntax for the sake of being 1-1 with the ES spec. As you explained, the As for how to move forward for the next release, I don't currently want to remove
Thanks again for working together on this change, it feels nice to get it done after 2 years! |
Following up on the [change] to use ES2018 constraints on the names of capture groups, this makes it so that XRegExp `namespacing` feature is enabled by default, so that named capture group matches will appear on the `.groups` property of the match object, rather than directly on the match object, in accordance with the ES2018 spec. While this is a breaking change, users can restore the old behavior by running: XRegExp.uninstall('namespacing') [change]: #247 (comment)
A couple of them have existing issues:
Also, there are currently 5 additional potential breaking changes as existing issues that I didn't mention above, that are tagged with |
Following up on the [change] to use ES2018 constraints on the names of capture groups, this makes it so that XRegExp `namespacing` feature is enabled by default, so that named capture group matches will appear on the `.groups` property of the match object, rather than directly on the match object, in accordance with the ES2018 spec. While this is a breaking change, users can restore the old behavior by running: XRegExp.uninstall('namespacing') [change]: #247 (comment)
Changes include: * BREAKING: Handle ES2018 capture names: #247 * BREAKING: Enable `namespacing` feature by default: #316 * BREAKING: Remove Unicode Blocks addon: 4860122 * restore perf tweak that made a meaningful difference in regex construction perf tests: 5f18261 * XRegExp.exec: preserve groups obj that comes from native ES2018 named capture: c4a83e7 * Make XRegExp.exec set groups prop to undefined if there are no named captures (closes #320): 7fea476 * Support optional 'Script=' prefix (from ES2018 syntax) for Unicode script tokens (#225): bb35ead * XRegExp.matchRecursive: Add delimiter and pos info when unbalanced delimiters are found (closes #293): 9660b90 * XRegExp.escape: Escape whitespace in a way that works with ES6 flag u (fixes #197): e22a52b To generate this commit, I adapted the steps at #205 (comment) Here's a fuller list of changes that can be needed with new releases: > * Version number > * Update version number and year in headers, config files, README. > * Update version number in `XRegExp.version`. > * Publish > * Publish new git tag. E.g.: > * `git tag -a v3.1.0 -m "Release 3.1.0"`. > * `git push origin v3.1.0`. > * `npm publish`.
Changes include: * BREAKING: Handle ES2018 capture names: #247 * BREAKING: Enable `namespacing` feature by default: #316 * BREAKING: Remove Unicode Blocks addon: 4860122 * restore perf tweak that made a meaningful difference in regex construction perf tests: 5f18261 * XRegExp.exec: preserve groups obj that comes from native ES2018 named capture: c4a83e7 * Make XRegExp.exec set groups prop to undefined if there are no named captures (closes #320): 7fea476 * Support optional 'Script=' prefix (from ES2018 syntax) for Unicode script tokens (#225): bb35ead * XRegExp.matchRecursive: Add delimiter and pos info when unbalanced delimiters are found (closes #293): 9660b90 * XRegExp.escape: Escape whitespace in a way that works with ES6 flag u (fixes #197): e22a52b To generate this commit, I adapted the steps at #205 (comment) Here's a fuller list of changes that can be needed with new releases: > * Version number > * Update version number and year in headers, config files, README. > * Update version number in `XRegExp.version`. > * Publish > * Publish new git tag. E.g.: > * `git tag -a v3.1.0 -m "Release 3.1.0"`. > * `git push origin v3.1.0`. > * `npm publish`.
See 1050835
and #241 (comment)
I think this should close #224
# TODO: