Add rules that "eslint-plugin-clean-regex" has #91

ota-meshi · 2021-04-02T01:32:23Z

Check each rule that eslint-plugin-clean-regex has, add the missing rule, and if eslint-plugin-regexp already has a similar rule, check for missing validation is needed.
Also, if we keep a note of the rule correspondence, it may be a document when creating a migration tool.

If we want to discuss one rule, we can also open a new issue.

Problems

Suggestions

The text was updated successfully, but these errors were encountered:

RunDevelopment · 2021-04-12T12:34:56Z

Note: I just found a bug in no-trivially-nested-lookaround (RunDevelopment/eslint-plugin-clean-regex#30). Whoever brings the rule over should fix the bug.

RunDevelopment · 2021-04-16T20:49:43Z

Regarding clean-regex/no-unnecessary-quantifier:
The rule is mostly covered by no-useless-exactly-quantifier but also does some other stuff that should probably be included in confusing-quantifier. I will need to think about that for a bit.

ota-meshi · 2021-04-17T08:39:21Z

I think we possible to mark no-useless-exactly-quantifier as deprecated and add new rule 😉

RunDevelopment · 2021-04-21T11:22:01Z

Regarding clean-regex/disjoint-alternatives:
Bringing this rule over will be tricky. It overlaps with no-dupe-disjunctions but it's not better than no-dupe-disjunctions. disjoint-alternatives is implemented using NFAs. This means that it can find non-trivial overlaps between alternatives that no-dupe-disjunctions can't find but it has 2 downside:

NFAs are expensive. While NFA operations are fairly optimized, they are not free.
Not all JS regexes can be converted into NFAs. I.e. disjoint-alternatives can't detect that ^a$ and ^a$ are the same. Assertions and backreferences can't be represented in NFAs so regexes containing them are difficult to deal with.

A hybrid approach using the current implementations of no-dupe-disjunctions and disjoint-alternatives has to be implemented.

Also want to talk about prefixes. no-dupe-disjunctions detects mistakes like a|abc (I do not understand why disallowNeverMatch is disabled by default). This is very useful but it's probably not complete (I say "probably" because the detection is so good that I couldn't find a simple example where it fails). Using NFAs, we can further improve the detection but it will be computationally expensive.

If it's ok with you @ota-meshi, then I'll bring over disjoint-alternatives some time this week

ota-meshi · 2021-04-21T22:25:14Z

A hybrid approach using the current implementations of no-dupe-disjunctions and disjoint-alternatives has to be implemented.

I think this idea is good!

(I do not understand why disallowNeverMatch is disabled by default)

Since disallowNeverMatch verifies patterns one by one, some patterns take a lot of time, so I turned off the default. The default behavior is that some patterns can be compared as strings, so I think it's fast in most cases.

This is very useful but it's probably not complete

I think so, too. As far as I know, I haven't verified the \p{} and i flag (This can be improved using regexp-ast-analysis).
Other than that, it should work as far as I can think of, but I'm still not as familiar with regexp as you 😅.

RunDevelopment · 2021-04-22T10:02:34Z

[...] some patterns take a lot of time, so I turned off the default. The default behavior is that some patterns can be compared as strings, so I think it's fast in most cases.

I ran the rule on Prism's 2k regexes this is the result:

config	time
`'regexp/no-dupe-disjunctions': 'off'`	2.9s
`'regexp/no-dupe-disjunctions': 'warn'`	3.7s
`'regexp/no-dupe-disjunctions': ['warn', { disallowNeverMatch: true }]`	5.4s

And then there's disjoint-alternatives:

config	time
`'clean-regex/disjoint-alternatives': 'off'`	2.9s
`'clean-regex/disjoint-alternatives': 'warn'`	5.6s

Since disjoint-alternatives doesn't even implement a disallowNeverMatch: true option, the final hybrid approach will be even slower.

That being said, I don't think that this is a huge problem. Prism has literally thousands of complex regexes, the average code base will not see a noticeable performance impact. ESLint also provides options to cache results and to only run on changed files.

We should also consider that no-dupe-disjunctions and disjoint-alternatives find real bugs that are hard to notice. These rules provide real value.

I agree that users must have the option to turn off advanced analysis if performance was a problem. But we shouldn't hide some of our best features by default.

Other than that, it should work as far as I can think of [...].

Implement disallowNeverMatch for two alternatives A|B, we need to find out whether B is a subset of A[^]* (= A followed by any string).

Don't get me wrong, the isCovered function is really good but there will be some complex cases that it simply can't detect without doing a proper subset check.
(That being said, the proper subset check is only possible in O(2^n). So we also won't be able to implement the subset check perfectly.)

RunDevelopment · 2021-04-29T14:26:13Z

Regarding clean-regex/no-constant-capturing-group:
With #188, we support the default configuration of clean-regex/no-constant-capturing-group. I think we don't need to bring over the rest.

There are legitimate use cases for capturing groups that only match a single string so I don't want to just disallow them.

ota-meshi · 2021-06-01T01:27:04Z

@RunDevelopment
I'm glad that most of the rule migrations are done 😆

The clean-regex/identity-escape rule is similar to the regexp/no-useless-escape rule, but the clean-regex/identity-escape rule seems to give user the flexibility to change the checks.

Do you think we can merge the features of the clean-regex/identity-escape rule by adding an option to the regexp/no-useless-escape rule? Or do you think we need to add new rule?

RunDevelopment · 2021-06-01T15:31:32Z

I'm not sure about clean-regex/identity-escape.

The problem is that the rule can also add useless escaping (i.e. you can configure it to escape all ! characters) which conflicts with ESLint's no-useless-escape rule and our regexp/no-useless-escape rule.

The reason I added clean-regex/identity-escape is that I needed something like regexp/no-useless-escape but I also wanted something that enforces the use of \{, \}, and \] outside of character classes. We don't need an extra rule just for these 3 characters and merging this into regexp/no-useless-escape is weird.

Maybe we could use the fact that Unicode parsing mode basically has the rules for escaping that I want? We could add a new rule regexp/strict that makes sure any regex without the u flag is also syntactically valid with the u flag. This rule would have to add/remove escaping in exactly the way that I want in order to achieve its goal. regexp/strict will slightly overlap with regexp/no-useless-escape but not completely (e.g. regexp/strict will allow /[\.]/).

With regexp/strict, we could make my taste for escaping into something that is actually useful. This also gets rid of conflicts with the no-useless-escape rules.

(It even gets rid of the overly complex rule option of clean-regex/identity-escape 😅 )

ota-meshi · 2021-06-02T05:07:55Z

The regexp/strict rule sounds good to me!

Maybe we can use RegExpValidator's strict option.
https://github.com/mysticatea/regexpp/blob/master/src/validator.ts#L127

RunDevelopment · 2021-06-02T09:06:30Z

Maybe we can use RegExpValidator's strict option.

That's where I got the idea for the name from :)

RunDevelopment · 2021-06-04T10:28:59Z

With regexp/strict now being merged, I think we are done.

Thank you very much for your help @ota-meshi!

ota-meshi mentioned this issue Apr 9, 2021

Add regexp/confusing-quantifier rule #93

Merged

RunDevelopment mentioned this issue Apr 14, 2021

Added regexp/no-trivially-nested-assertion rule #120

Merged

RunDevelopment added this to the v1.0 milestone Apr 15, 2021

RunDevelopment mentioned this issue May 5, 2021

Add no-useless-quantifier rule #197

Merged

ota-meshi mentioned this issue Jun 2, 2021

Add regexp/strict rule #220

Merged

RunDevelopment closed this as completed Jun 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rules that "eslint-plugin-clean-regex" has #91

Add rules that "eslint-plugin-clean-regex" has #91

ota-meshi commented Apr 2, 2021 •

edited by RunDevelopment

RunDevelopment commented Apr 12, 2021

RunDevelopment commented Apr 16, 2021

ota-meshi commented Apr 17, 2021

RunDevelopment commented Apr 21, 2021 •

edited

ota-meshi commented Apr 21, 2021

RunDevelopment commented Apr 22, 2021

RunDevelopment commented Apr 29, 2021

ota-meshi commented Jun 1, 2021

RunDevelopment commented Jun 1, 2021

ota-meshi commented Jun 2, 2021

RunDevelopment commented Jun 2, 2021

RunDevelopment commented Jun 4, 2021

Add rules that "eslint-plugin-clean-regex" has #91

Add rules that "eslint-plugin-clean-regex" has #91

Comments

ota-meshi commented Apr 2, 2021 • edited by RunDevelopment

Problems

Suggestions

RunDevelopment commented Apr 12, 2021

RunDevelopment commented Apr 16, 2021

ota-meshi commented Apr 17, 2021

RunDevelopment commented Apr 21, 2021 • edited

ota-meshi commented Apr 21, 2021

RunDevelopment commented Apr 22, 2021

RunDevelopment commented Apr 29, 2021

ota-meshi commented Jun 1, 2021

RunDevelopment commented Jun 1, 2021

ota-meshi commented Jun 2, 2021

RunDevelopment commented Jun 2, 2021

RunDevelopment commented Jun 4, 2021

ota-meshi commented Apr 2, 2021 •

edited by RunDevelopment

RunDevelopment commented Apr 21, 2021 •

edited