This repository has been archived by the owner on Feb 16, 2024. It is now read-only.

What do [^(ab)] and [^(a)] mean? #45

Closed
nicolo-ribaudo opened this issue Sep 8, 2021 · 5 comments

@nicolo-ribaudo
Member

nicolo-ribaudo commented Sep 8, 2021

I'm working on transpiler support for this proposal in regexpu-core/Babel, and I have a question about negated classes with strings. I already read #7 and I feel like it should answer my question, but I don't understand exactly how it should work.

  • What should [^(ab)] be compiled to? Is it something like (?!ab)[^]? Or just [^]?
  • What about [^(a)]? Is it the same as [^a], or is it just [^]?
@RunDevelopment

> What should [^(ab)] be compiled to?

This will be a syntax error. Solution 1.5.b in #7 essentially says: "A character class can only be negated if it is guaranteed to not contain strings."

> What about [^(a)]?

This is an interesting question that I don't think has been answered yet. Basically, are we allowed to assume that [a] == [(a)]?

If we are allowed to assume that every single-character string is treated as a single character, then [^(a)] will be equal to [^a].

@nicolo-ribaudo
Member Author

Thanks for the clarification 👍

Personally, I would be more surprised if [^(a)] behaved differently from [^(ab)] than if it behaved differently from [^a]. However, throwing for [^(a)] might also pose the question of what happens for [^()]: should it also throw, since it uses the syntax for nested strings (even if it doesn't actually introduce any string)?

@RunDevelopment

RunDevelopment commented Sep 8, 2021

> might also pose the question of what happens for [^()]: should it also throw, since it uses the syntax for nested strings (even if it doesn't actually introduce any string)?

[()] does add the empty string (e.g. [(ab)()] == (?:ab|)). Since the empty string isn't a single-character string, I would expect it to throw.
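
To make the equivalence above concrete, here is a minimal sketch using only plain regex syntax that engines already support. It assumes, as stated in this comment, that [(ab)()] behaves like (?:ab|); the variable name is only illustrative.

```javascript
// Plain-regex stand-in for the proposal's [(ab)()], per the equivalence above.
const abOrEmpty = /^(?:ab|)$/;

console.log(abOrEmpty.test("ab")); // true
console.log(abOrEmpty.test(""));   // true: the empty alternative matches
console.log(abOrEmpty.test("a"));  // false

// Unanchored, the empty alternative matches at every position,
// which is why such a set is not a set of single characters.
console.log("xyz".replace(/(?:ab|)/g, "-")); // "-x-y-z-"
```

Because the set can match zero characters, it has no obvious complement in terms of single characters, which is consistent with expecting [^()] to throw.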


Also, I just read through the current spec draft. The relevant section is "22.2.1.7 Static Semantics: MaybeStrings". The NonEmptyClassString case states that single-character class strings are not MaybeStrings. So [^(a)] should be allowed and is the same as [^a], if I read the spec correctly.

@markusicu
Collaborator

Thanks Michael, spot on :-)

Yes,

  • complement of a set with an empty string and/or with multi-character strings is a SyntaxError
  • single code points in string literals devolve into, well, single code points
  • [()] is a set with the empty string
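
The three rules above can be sketched as a small transpiler-style helper. This is only an illustration under stated assumptions: the function names are hypothetical (they are not regexpu-core's API or the spec's algorithm), the input is a flat list of class strings, every code point is emitted as a \u{...} escape so the output is meant for a u-flag pattern, and longer strings are ordered first so that a prefix does not shadow a longer string.

```javascript
// Hypothetical helpers for illustration; not regexpu-core's actual API.

// Length in code points, not UTF-16 code units.
function codePointLength(str) {
  return Array.from(str).length;
}

// Rough analogue of MaybeStrings for a flat list of class strings:
// true if any entry is the empty string or has more than one code point.
function mayContainStrings(classStrings) {
  return classStrings.some((s) => codePointLength(s) !== 1);
}

// Emit each code point as \u{HEX}, which is valid in u-mode patterns
// both inside and outside a character class.
function escapeString(str) {
  return Array.from(
    str,
    (ch) => "\\u{" + ch.codePointAt(0).toString(16).toUpperCase() + "}"
  ).join("");
}

function compileClassStrings(classStrings, negated) {
  if (negated) {
    if (mayContainStrings(classStrings)) {
      throw new SyntaxError("Negated class may contain strings");
    }
    // Single code points devolve into ordinary class members.
    return "[^" + classStrings.map(escapeString).join("") + "]";
  }
  if (classStrings.length === 0) {
    return "[]"; // empty set: matches nothing
  }
  // Assumption: try longer strings first.
  const sorted = [...classStrings].sort(
    (a, b) => codePointLength(b) - codePointLength(a)
  );
  return "(?:" + sorted.map(escapeString).join("|") + ")";
}

console.log(compileClassStrings(["a"], true));       // [^\u{61}]
console.log(compileClassStrings(["ab", ""], false)); // (?:\u{61}\u{62}|)
```

Under these assumptions, calling compileClassStrings(["ab"], true) or compileClassStrings([""], true) throws a SyntaxError, matching the first and third bullets.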

Consider that people are often unsure whether some "thing" is encoded using a single code point or multiple. If they are not sure, they can use the string literal syntax.

[(👩🏿‍✈️|🚲|🇧🇪)]+ will work as expected, while [👩🏿‍✈️🚲🇧🇪]+ matches lots of unexpected strings, including "🇧🇧" and "🇪🇪". Only one of these emoji is a single code point.
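
The difference can be reproduced today with the u flag. In the sketch below, a plain alternation stands in for the proposal's string literal syntax (the draft [(...)] form is not assumed to be supported by any engine), and the variable names are only illustrative.

```javascript
const pilot = "\u{1F469}\u{1F3FF}\u{200D}\u{2708}\u{FE0F}"; // 👩🏿‍✈️
const bike = "\u{1F6B2}";                                   // 🚲
const belgium = "\u{1F1E7}\u{1F1EA}";                       // 🇧🇪

// Code points per emoji: only the bicycle is a single code point.
console.log([pilot, bike, belgium].map((s) => Array.from(s).length)); // [ 5, 1, 2 ]

// Character class: every code point becomes its own class member.
const asClass = new RegExp("^[" + pilot + bike + belgium + "]+$", "u");

// Alternation: each emoji is matched as a whole string.
const asStrings = new RegExp(
  "^(?:" + pilot + "|" + bike + "|" + belgium + ")+$",
  "u"
);

const bb = "\u{1F1E7}\u{1F1E7}"; // 🇧🇧
const ee = "\u{1F1EA}\u{1F1EA}"; // 🇪🇪

console.log(asClass.test(bb), asClass.test(ee));     // true true
console.log(asStrings.test(bb), asStrings.test(ee)); // false false
console.log(asStrings.test(belgium + bike));         // true
```

The class accepts 🇧🇧 and 🇪🇪 because the two regional indicator code points that make up 🇧🇪 are matched independently of each other.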

@nicolo-ribaudo
Member Author

nicolo-ribaudo commented Sep 9, 2021

Thanks everyone! I'm closing this since my questions have been answered.

If anyone is interested in watching the progress in the transpiler implementation, you can check mathiasbynens/regexpu-core#51.
