Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rules that "eslint-plugin-clean-regex" has #91

Closed
31 tasks done
ota-meshi opened this issue Apr 2, 2021 · 12 comments
Closed
31 tasks done

Add rules that "eslint-plugin-clean-regex" has #91

ota-meshi opened this issue Apr 2, 2021 · 12 comments
Milestone

Comments

@ota-meshi
Copy link
Owner

ota-meshi commented Apr 2, 2021

Check each rule that eslint-plugin-clean-regex has, add the missing rule, and if eslint-plugin-regexp already has a similar rule, check for missing validation is needed.
Also, if we keep a note of the rule correspondence, it may be a document when creating a migration tool.

If we want to discuss one rule, we can also open a new issue.

Problems

Suggestions

@RunDevelopment
Copy link
Collaborator

Note: I just found a bug in no-trivially-nested-lookaround (RunDevelopment/eslint-plugin-clean-regex#30). Whoever brings the rule over should fix the bug.

@RunDevelopment
Copy link
Collaborator

Regarding clean-regex/no-unnecessary-quantifier:
The rule is mostly covered by no-useless-exactly-quantifier but also does some other stuff that should probably be included in confusing-quantifier. I will need to think about that for a bit.

@ota-meshi
Copy link
Owner Author

I think we possible to mark no-useless-exactly-quantifier as deprecated and add new rule 😉

@RunDevelopment
Copy link
Collaborator

RunDevelopment commented Apr 21, 2021

Regarding clean-regex/disjoint-alternatives:
Bringing this rule over will be tricky. It overlaps with no-dupe-disjunctions but it's not better than no-dupe-disjunctions. disjoint-alternatives is implemented using NFAs. This means that it can find non-trivial overlaps between alternatives that no-dupe-disjunctions can't find but it has 2 downside:

  1. NFAs are expensive. While NFA operations are fairly optimized, they are not free.
  2. Not all JS regexes can be converted into NFAs. I.e. disjoint-alternatives can't detect that ^a$ and ^a$ are the same. Assertions and backreferences can't be represented in NFAs so regexes containing them are difficult to deal with.

A hybrid approach using the current implementations of no-dupe-disjunctions and disjoint-alternatives has to be implemented.

Also want to talk about prefixes. no-dupe-disjunctions detects mistakes like a|abc (I do not understand why disallowNeverMatch is disabled by default). This is very useful but it's probably not complete (I say "probably" because the detection is so good that I couldn't find a simple example where it fails). Using NFAs, we can further improve the detection but it will be computationally expensive.

If it's ok with you @ota-meshi, then I'll bring over disjoint-alternatives some time this week

@ota-meshi
Copy link
Owner Author

A hybrid approach using the current implementations of no-dupe-disjunctions and disjoint-alternatives has to be implemented.

I think this idea is good!

(I do not understand why disallowNeverMatch is disabled by default)

Since disallowNeverMatch verifies patterns one by one, some patterns take a lot of time, so I turned off the default. The default behavior is that some patterns can be compared as strings, so I think it's fast in most cases.

This is very useful but it's probably not complete

I think so, too. As far as I know, I haven't verified the \p{} and i flag (This can be improved using regexp-ast-analysis).
Other than that, it should work as far as I can think of, but I'm still not as familiar with regexp as you 😅.

@RunDevelopment
Copy link
Collaborator

[...] some patterns take a lot of time, so I turned off the default. The default behavior is that some patterns can be compared as strings, so I think it's fast in most cases.

I ran the rule on Prism's 2k regexes this is the result:

config time
'regexp/no-dupe-disjunctions': 'off' 2.9s
'regexp/no-dupe-disjunctions': 'warn' 3.7s
'regexp/no-dupe-disjunctions': ['warn', { disallowNeverMatch: true }] 5.4s

And then there's disjoint-alternatives:

config time
'clean-regex/disjoint-alternatives': 'off' 2.9s
'clean-regex/disjoint-alternatives': 'warn' 5.6s

Since disjoint-alternatives doesn't even implement a disallowNeverMatch: true option, the final hybrid approach will be even slower.

That being said, I don't think that this is a huge problem. Prism has literally thousands of complex regexes, the average code base will not see a noticeable performance impact. ESLint also provides options to cache results and to only run on changed files.

We should also consider that no-dupe-disjunctions and disjoint-alternatives find real bugs that are hard to notice. These rules provide real value.

I agree that users must have the option to turn off advanced analysis if performance was a problem. But we shouldn't hide some of our best features by default.

Other than that, it should work as far as I can think of [...].

Implement disallowNeverMatch for two alternatives A|B, we need to find out whether B is a subset of A[^]* (= A followed by any string).

Don't get me wrong, the isCovered function is really good but there will be some complex cases that it simply can't detect without doing a proper subset check.
(That being said, the proper subset check is only possible in O(2^n). So we also won't be able to implement the subset check perfectly.)

@RunDevelopment
Copy link
Collaborator

Regarding clean-regex/no-constant-capturing-group:
With #188, we support the default configuration of clean-regex/no-constant-capturing-group. I think we don't need to bring over the rest.

There are legitimate use cases for capturing groups that only match a single string so I don't want to just disallow them.

@ota-meshi
Copy link
Owner Author

@RunDevelopment
I'm glad that most of the rule migrations are done 😆

The clean-regex/identity-escape rule is similar to the regexp/no-useless-escape rule, but the clean-regex/identity-escape rule seems to give user the flexibility to change the checks.

Do you think we can merge the features of the clean-regex/identity-escape rule by adding an option to the regexp/no-useless-escape rule? Or do you think we need to add new rule?

@RunDevelopment
Copy link
Collaborator

I'm not sure about clean-regex/identity-escape.

The problem is that the rule can also add useless escaping (i.e. you can configure it to escape all ! characters) which conflicts with ESLint's no-useless-escape rule and our regexp/no-useless-escape rule.

The reason I added clean-regex/identity-escape is that I needed something like regexp/no-useless-escape but I also wanted something that enforces the use of \{, \}, and \] outside of character classes. We don't need an extra rule just for these 3 characters and merging this into regexp/no-useless-escape is weird.

Maybe we could use the fact that Unicode parsing mode basically has the rules for escaping that I want? We could add a new rule regexp/strict that makes sure any regex without the u flag is also syntactically valid with the u flag. This rule would have to add/remove escaping in exactly the way that I want in order to achieve its goal. regexp/strict will slightly overlap with regexp/no-useless-escape but not completely (e.g. regexp/strict will allow /[\.]/).

With regexp/strict, we could make my taste for escaping into something that is actually useful. This also gets rid of conflicts with the no-useless-escape rules.

(It even gets rid of the overly complex rule option of clean-regex/identity-escape 😅 )

@ota-meshi
Copy link
Owner Author

The regexp/strict rule sounds good to me!

Maybe we can use RegExpValidator's strict option.
https://github.com/mysticatea/regexpp/blob/master/src/validator.ts#L127

@RunDevelopment
Copy link
Collaborator

Maybe we can use RegExpValidator's strict option.

That's where I got the idea for the name from :)

@RunDevelopment
Copy link
Collaborator

With regexp/strict now being merged, I think we are done.

Thank you very much for your help @ota-meshi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants