Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New rule proposal: no-useless-backreference #12673

Closed
mdjermanovic opened this issue Dec 16, 2019 · 3 comments
Closed

New rule proposal: no-useless-backreference #12673

mdjermanovic opened this issue Dec 16, 2019 · 3 comments
Assignees
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion feature This change adds a new feature to ESLint rule Relates to ESLint's core rules

Comments

@mdjermanovic
Copy link
Member

mdjermanovic commented Dec 16, 2019

Please describe what the rule should do:

Disallow useless backreferences in regular expressions - syntactically valid backreferences that always match zero-length and at the same time can match only zero-length, meaning that they are effectively ignored and present an almost certain error in a regular expression. In particular:

  • Reference to a group that appears later in the regular expression. JS doesn't support forward references, but it also doesn't disallow backreferences that appear as forward references - they are treated as valid, but silently ignored.
  • Reference to a group from within the group (nested reference). Same as above.
  • Reference to a group that is in a negative lookaround from outside of that negative lookaround (spec 21.2.2.8.2 - Note 3).

Applies to both numbered and named backreferences.

What category of rule is this? (place an "X" next to just one item)

[X] Warns about a potential error (problem)

Provide 2-3 code examples that this rule will warn about:

// same as /^(a)$/
/^\1(a)$/.test("a") // true
/^\1(a)$/.test("aa") // false

// same as /^(a)$/
/^(a\1)$/.test("a") // true
/^(a\1)$/.test("aa") // false

/^(?:a|b)(c|d)e\2(f|g)$/

/a(?!(b|c){2}).([a-z])d\1/
/(?<!(b|c))a(d|e)f\1/

Why should this rule be included in ESLint (instead of a plugin)?

It prevents very possible errors in the code:

  • User may be unaware that JS doesn't support forward and nested references, because many languages do support them. Even worse, there are no parsing or runtime errors from the engine.
  • Unintended reference - reference to a wrong group as a consequence of another error (e.g., shifted index, wrong parentheses etc.).

Note: this rule targets only backreferences that are parsed as such. It does not aim to disallow "look-like backreferences":

  • Numbered reference to a group that doesn't exist. \1...\7 should be already a parsing error, or (by Anex B) an octal escape. This should be probably an enhancement for the no-octal-escape rule, which currently targets only strings. \8 or \9 should be a parsing error but it's usually treated as just useless identity escape.
  • \1...\9 in character classes. Similar to above, this is either a parsing error, octal escape or useless identity escape.
  • Named reference to a group that doesn't exist. This is either a parsing error (when there is at least one named group or the u flag) or just a sequence of characters (for backwards compatibility).

Are you willing to submit a pull request to implement this rule?

Yes.

@mdjermanovic mdjermanovic added rule Relates to ESLint's core rules feature This change adds a new feature to ESLint evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Dec 16, 2019
@mdjermanovic mdjermanovic self-assigned this Dec 16, 2019
@mdjermanovic
Copy link
Member Author

There is another case that also seems easy to check with regexpp and can be included in this rule from the start - when a backreference and its group have Alternative ancestors that are siblings (which would be children of their lowest common ancestor).

That would mean that the backreference should match what's captured in a non-participating group, which JS treats as zero-length instead of failure and again effectively ignores the backreference.

// same as /^(?:(a)|b)$/
/^(?:(a)|\1b)$/.test("a"); // true
/^(?:(a)|\1b)$/.test("b"); // true!
/^(?:(a)|\1b)$/.test("ab"); // false
/^(?:(a)|\1b)$/.test("c"); // false

@mysticatea mysticatea added accepted There is consensus among the team that this change meets the criteria for inclusion and removed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Dec 17, 2019
@mdjermanovic
Copy link
Member Author

I'm working on this.

@iugo
Copy link

iugo commented Mar 10, 2020

@eslint-deprecated eslint-deprecated bot locked and limited conversation to collaborators Jul 9, 2020
@eslint-deprecated eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Jul 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion feature This change adds a new feature to ESLint rule Relates to ESLint's core rules
Projects
None yet
Development

No branches or pull requests

3 participants