Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address feedback from plenary #77

Merged
merged 5 commits into from
May 13, 2024
Merged

Address feedback from plenary #77

merged 5 commits into from
May 13, 2024

Conversation

bakkot
Copy link
Collaborator

@bakkot bakkot commented Apr 9, 2024

Commits should be reviewed individually. Summary:

  • use \$ etc where possibly; this is not possible for all punctuators, but it is for some
  • also escape line terminators; this was just an oversight on my part
  • use \n etc where possible
  • if given a lone surrogate, escape it, so that if there is someday a non-BMP whitespace character, and we get x-mode RegExps, and someone writes new RegExp(RegExp.escape('\uHEAD') + RegExp.escape('\uTAIL'), 'x') where \uHEAD\uTAIL encodes that non-BMP whitespace character, the resulting RegExp will match the character instead of being interpreted as whitespace and therefore (in x-mode) ignored
  • hex-escape ASCII letters at the start of the string so that new RegExp('\c' + RegExp.escape('Z')) will either be an error (in u- or v-mode RegExps) or match the string '\\cZ' (in other RegExps); this also has the effect that the result cannot combine with a preceding \x or \u when the first character is A-F or a-f. Fixes Which leading characters should be escaped? #66.

@bakkot
Copy link
Collaborator Author

bakkot commented Apr 9, 2024

cc @erights @waldemarhorwat @gibson042 Please take a look.

@bakkot bakkot force-pushed the address-feedback branch 2 times, most recently from 7843108 to a839978 Compare April 9, 2024 05:13
spec.emu Outdated
@@ -62,7 +62,7 @@ contributors:
1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and the string in the "ControlEscape" column of the row whose "Code Point" column contains _c_.
1. Let _otherPunctuators_ be the string-concatenation of *",-=<>#&!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace| or |LineTerminator|, then
1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is in the inclusive interval from U+D800 to U+DFFF (i.e. _c_ is a leading surrogate or trailing surrogate), then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

          1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is a leading surrogate or a trailing surrogate, then

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, leading and trailing surrogates are defined to be code units, not code points. I figure it's close enough for the parenthetical, but not the actual algorithm step. I could change the parenthetical to be "c is the code point corresponding to a leading surrogate or trailing surrogate" but that's getting pretty wordy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than the oxford comma, this suggestion is a nit, so i'm ambivalent

spec.emu Show resolved Hide resolved
@erights
Copy link

erights commented Apr 9, 2024

How do I see a rendered form of this?

@bakkot
Copy link
Collaborator Author

bakkot commented Apr 9, 2024

I've put up a rendering here.

@erights
Copy link

erights commented Apr 9, 2024

Thanks. FWIW LGTM, but I delegate an approval decision to @gibson042 who understands this better than I do.

@erights
Copy link

erights commented Apr 9, 2024

Thanks for the rendering. Cut down the review effort for me by a huge factor.

Copy link
Contributor

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on LineTerminator! I still prefer the universally applicable \x…/\u… approach, but this does work (even if it could get stale).

spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
Comment on lines +61 to +62
1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then
1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as a point-in-time snapshot, and I'm not a fan because it will be weird if e.g. \@ becomes a valid escape in the future. My preference remains the universally applicable \x…/\u… approach. But that said, there is no technical issue here; merely an aesthetic one.

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence.
1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_.
1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this sentence is too long to not use a parenthetical.

Suggested change
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text, which may be used after a `\0` character escape or a |DecimalEscape| such as `\1`, and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those commas read as ungrammatical to me. In particular the second comma cuts a clause in the middle - the pattern text "maybe used after \0 [...] and still match S". Open to other rephrasing here but I don't like this particular suggestion.

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence.
1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_.
1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, TIL \c0 is a valid escape. This also prevents new RegExp("\\" + escape('n')) from combining.

spec.emu Show resolved Hide resolved
Vylpes pushed a commit to Vylpes/Droplet that referenced this pull request May 28, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [core-js](https://github.com/zloirock/core-js) | dependencies | minor | [`3.36.1` -> `3.37.1`](https://renovatebot.com/diffs/npm/core-js/3.36.1/3.37.1) |

---

### Release Notes

<details>
<summary>zloirock/core-js (core-js)</summary>

### [`v3.37.1`](https://github.com/zloirock/core-js/blob/HEAD/CHANGELOG.md#3371---20240514)

[Compare Source](zloirock/core-js@v3.37.0...v3.37.1)

-   Changes [v3.37.0...v3.37.1](zloirock/core-js@v3.37.0...v3.37.1)
-   Fixed [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse) feature detection for some specific cases
-   Compat data improvements:
    -   [`Set` methods proposal](https://github.com/tc39/proposal-set-methods) added and marked as [supported from FF 127](https://bugzilla.mozilla.org/show_bug.cgi?id=1868423)
    -   [`Symbol.dispose`](https://github.com/tc39/proposal-explicit-resource-management) added and marked as supported from V8 ~ Chromium 125
    -   [`Math.f16round` and `DataView.prototype.{ getFloat16, setFloat16 }`](https://github.com/tc39/proposal-float16array) added and marked as [supported from Deno 1.43](denoland/deno#23490)
    -   [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse) added and marked as [supported from Chromium 126](https://chromestatus.com/feature/6301071388704768)
    -   [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse) added and marked as [supported from NodeJS 22.0](nodejs/node#52280)
    -   [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse) added and marked as [supported from Deno 1.43](denoland/deno#23318)
    -   Added [Rhino 1.7.15](https://github.com/mozilla/rhino/releases/tag/Rhino1\_7\_15\_Release) compat data, many features marked as supported
    -   Added [NodeJS 22.0](https://nodejs.org/en/blog/release/v22.0.0) compat data mapping
    -   Added [Deno 1.43](https://github.com/denoland/deno/releases/tag/v1.43.0) compat data mapping
    -   Added Electron 31 compat data mapping
    -   Updated [Opera Android 82](https://forums.opera.com/topic/71513/opera-for-android-82) compat data mapping
    -   Added Samsung Internet 26 compat data mapping
    -   Added Oculus Quest Browser 33 compat data mapping

### [`v3.37.0`](https://github.com/zloirock/core-js/blob/HEAD/CHANGELOG.md#3370---20240417)

[Compare Source](zloirock/core-js@v3.36.1...v3.37.0)

-   Changes [v3.36.1...v3.37.0](zloirock/core-js@v3.36.1...v3.37.0)
-   [New `Set` methods proposal](https://github.com/tc39/proposal-set-methods):
    -   Built-ins:
        -   `Set.prototype.intersection`
        -   `Set.prototype.union`
        -   `Set.prototype.difference`
        -   `Set.prototype.symmetricDifference`
        -   `Set.prototype.isSubsetOf`
        -   `Set.prototype.isSupersetOf`
        -   `Set.prototype.isDisjointFrom`
    -   Moved to stable ES, [April 2024 TC39 meeting](tc39/proposals@bda5a6b)
    -   Added `es.` namespace modules, `/es/` and `/stable/` namespaces entries
-   [Explicit Resource Management stage 3 proposal](https://github.com/tc39/proposal-explicit-resource-management):
    -   Some minor updates like [explicit-resource-management/217](tc39/proposal-explicit-resource-management#217)
-   Added [`Math.sumPrecise` stage 2.7 proposal](https://github.com/tc39/proposal-math-sum/):
    -   Built-ins:
        -   `Math.sumPrecise`
-   [`Promise.try` proposal](https://github.com/tc39/proposal-promise-try):
    -   Built-ins:
        -   `Promise.try`
    -   Added optional arguments support, [promise-try/16](tc39/proposal-promise-try#16)
    -   Moved to stage 2.7, [April 2024 TC39 meeting](tc39/proposals@301fc9c)
-   [`RegExp.escape` stage 2 proposal](https://github.com/tc39/proposal-regex-escaping):
    -   Moved to hex-escape semantics, [regex-escaping/67](tc39/proposal-regex-escaping#67)
        -   It's not the final change of the way of escaping, waiting for [regex-escaping/77](tc39/proposal-regex-escaping#77) soon
-   [Pattern matching stage 1 proposal](https://github.com/tc39/proposal-pattern-matching):
    -   Built-ins:
        -   `Symbol.customMatcher`
    -   Once again, [the used well-known symbol was renamed](tc39/proposal-pattern-matching#295)
    -   Added new entries for that
-   Added [Extractors stage 1 proposal](https://github.com/tc39/proposal-extractors):
    -   Built-ins:
        -   `Symbol.customMatcher`
    -   Since the `Symbol.customMatcher` well-known symbol from the pattern matching proposal is also used in the exactors proposal, added an entry also for this proposal
-   Added [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse), [url/825](whatwg/url#825)
-   Engines bugs fixes:
    -   Added a fix of [Safari `{ Object, Map }.groupBy` bug that does not support iterable primitives](https://bugs.webkit.org/show_bug.cgi?id=271524)
    -   Added a fix of [Safari bug with double call of constructor in `Array.fromAsync`](https://bugs.webkit.org/show_bug.cgi?id=271703)
-   Compat data improvements:
    -   [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse) added and marked as supported [from FF 126](https://bugzilla.mozilla.org/show_bug.cgi?id=1887611)
    -   [`URL.parse`](https://url.spec.whatwg.org/#dom-url-parse) added and marked as supported [from Bun 1.1.4](oven-sh/bun#10129)
    -   [`URL.canParse`](https://url.spec.whatwg.org/#dom-url-canparse) fixed and marked as supported [from Bun 1.1.0](oven-sh/bun#9710)
    -   [New `Set` methods](https://github.com/tc39/proposal-set-methods) fixed in JavaScriptCore and marked as supported from Bun 1.1.1
    -   Added Opera Android 82 compat data mapping

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4wLjAiLCJ1cGRhdGVkSW5WZXIiOiIzNy4wLjAiLCJ0YXJnZXRCcmFuY2giOiJkZXZlbG9wIn0=-->

Reviewed-on: https://git.vylpes.xyz/RabbitLabs/Droplet/pulls/306
Co-authored-by: Renovate Bot <renovate@vylpes.com>
Co-committed-by: Renovate Bot <renovate@vylpes.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Which leading characters should be escaped?
5 participants