Align with native Unicode property escapes #225

Open
mathiasbynens opened this issue Feb 21, 2018 · 2 comments

@mathiasbynens
Collaborator

Other than the set of supported properties, there are some key differences between XRegExp’s handling of \p{…} and the way they work in native JS.

  • Native \p{…} doesn’t implement loose matching; only strict, case-sensitive matches for canonical property names and values (or their aliases) are accepted
  • In native \p{…}, Blocks are not supported
  • Native \p{…} doesn’t support the In prefix or any other prefix (although since XRegExp only does this for Blocks, dropping Block support already resolves this)
  • Native \p{…} supports Script_Extensions, which is generally more useful than Script
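To make the differences above concrete, here is a minimal sketch of how native property escapes behave in an engine with ES2018 support. The `compiles` helper and the variable names are only for illustration and are not part of XRegExp or of any native API. The example character U+30FC (KATAKANA-HIRAGANA PROLONGED SOUND MARK) has Script=Common but lists Hiragana and Katakana in its Script_Extensions.

```javascript
// Hypothetical helper: true if the engine accepts `source` as a pattern with flag u.
function compiles(source) {
  try {
    new RegExp(source, 'u');
    return true;
  } catch (err) {
    return false; // native engines throw a SyntaxError for unknown names
  }
}

const strictness = {
  canonicalName: compiles('\\p{Script=Greek}'),      // accepted
  alias: compiles('\\p{sc=Greek}'),                  // accepted (official alias)
  looseCase: compiles('\\p{script=greek}'),          // rejected: no loose matching
  blockWithInPrefix: compiles('\\p{InBasicLatin}'),  // rejected: no In prefix
  blockProperty: compiles('\\p{Block=Basic_Latin}'), // rejected: Blocks unsupported
};

// Script vs. Script_Extensions for a character shared between scripts.
const prolongedSoundMark = '\u30FC';
const scriptMatch = /\p{Script=Hiragana}/u.test(prolongedSoundMark);
const scxMatch = /\p{Script_Extensions=Hiragana}/u.test(prolongedSoundMark);

console.log(strictness, { scriptMatch, scxMatch });
```

In this sketch only the Script_Extensions pattern matches U+30FC, which is the kind of case where Script_Extensions tends to be the more useful property.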

Technically these are all breaking changes, but IMHO we should consider aligning with native property escapes.

@slevithan
Owner

slevithan commented Feb 21, 2018

Supporting Script_Extensions is the biggest change here. It might make sense to merely ensure that use of Script_Extensions (which is easy to identify since ES2018 requires a prefix when using them) is passed through correctly and works when in an ES2018 environment that supports it natively. That would avoid the need for adding the large associated data that is significantly redundant with the existing Script data. But then, removing Blocks would "free up" space for this. I'm definitely open to adding Script_Extensions as a new addon.
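The pass-through idea could look roughly like the sketch below. It is not XRegExp's implementation: the function names and the token regex are assumptions made for illustration, and the token check is deliberately naive (for example, it does not account for an escaped backslash preceding `p`).

```javascript
// Hypothetical feature test: does this engine support Script_Extensions natively?
function supportsNativeScriptExtensions() {
  try {
    new RegExp('\\p{scx=Latin}', 'u');
    return true;
  } catch (err) {
    return false;
  }
}

// ES2018 requires an explicit prefix for Script_Extensions, so the tokens are
// easy to spot: \p{Script_Extensions=…} or \p{scx=…} (and the \P negations).
const scxToken = /\\[pP]\{(?:Script_Extensions|scx)=[A-Za-z_]+\}/;

// Hypothetical check: does the pattern source contain a Script_Extensions token?
function usesScriptExtensions(patternSource) {
  return scxToken.test(patternSource);
}

// Hypothetical decision: leave such tokens untouched only when the engine can
// handle them itself, so no Script_Extensions data has to be bundled.
function canPassThrough(patternSource) {
  return usesScriptExtensions(patternSource) && supportsNativeScriptExtensions();
}

console.log(canPassThrough('\\p{scx=Greek}+'));
```

The trade-off in this approach is that patterns using Script_Extensions would only work in environments with native support, in exchange for not shipping data that largely overlaps with the existing Script data.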

The other changes (dropping support for loose name matching and Blocks) should be straightforward. Happy to adopt these changes and publish a new major version. Will go ahead with them if others don't get to them before me.

Aside: While we're considering breaking changes for the Unicode addons, perhaps we should move unicode-base.js into xregexp.js, to make it even easier to import the individual sets of data for scripts, general categories, and binary properties based on what you need.

@slevithan
Owner

Removed support for Unicode blocks in commit 4860122.

josephfrazier added a commit that referenced this issue Feb 7, 2021
Changes include:

* BREAKING: Handle ES2018 capture names: #247
* BREAKING: Enable `namespacing` feature by default: #316
* BREAKING: Remove Unicode Blocks addon: 4860122
* restore perf tweak that made a meaningful difference in regex construction perf tests: 5f18261
* XRegExp.exec: preserve groups obj that comes from native ES2018 named capture: c4a83e7
* Make XRegExp.exec set groups prop to undefined if there are no named captures (closes #320): 7fea476
* Support optional 'Script=' prefix (from ES2018 syntax) for Unicode script tokens (#225): bb35ead
* XRegExp.matchRecursive: Add delimiter and pos info when unbalanced delimiters are found (closes #293): 9660b90
* XRegExp.escape: Escape whitespace in a way that works with ES6 flag u (fixes #197): e22a52b

To generate this commit, I adapted the steps at #205 (comment)

Here's a fuller list of changes that can be needed with new releases:

> * Version number
>   * Update version number and year in headers, config files, README.
>   * Update version number in `XRegExp.version`.
> * Publish
>   * Publish new git tag. E.g.:
>     * `git tag -a v3.1.0 -m "Release 3.1.0"`.
>     * `git push origin v3.1.0`.
>   * `npm publish`.
josephfrazier added a commit that referenced this issue Feb 8, 2021