Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Fixer for missing unicode flag in no-misleading-character-class #15279

Closed
wants to merge 7 commits into from

Conversation

mathiasvr
Copy link
Contributor

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[x] Add autofixing to a rule
[ ] Add a CLI option
[ ] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)

Added autofixing for missing unicode "u" flag in case regex contains surrogate pair characters.

Is there anything you'd like reviewers to focus on?

@eslint-github-bot eslint-github-bot bot added the triage An ESLint team member will look at this issue soon label Nov 9, 2021
@eslint-github-bot
Copy link

Hi @mathiasvr!, thanks for the Pull Request

The pull request title isn't properly formatted. We ask that you update the message to match this format, as we use it to generate changelogs and automate releases.

  • The length of the commit message must be less than or equal to 72

Read more about contributing to ESLint here

@mathiasvr mathiasvr changed the title feat: Add fixer for missing unicode flag in no-misleading-character-class rule feat: Fixer for missing unicode flag in no-misleading-character-class Nov 12, 2021
@nzakas
Copy link
Member

nzakas commented Nov 20, 2021

Thanks for the PR. I’m not sure this fix is safe. I’m general, we don’t do fixes that change how code works because it can be difficult for end users to identify if it creates a problem. In this case, I fear that the change may fall into that category.

@eslint/eslint-tsc what do others think?

@btmills
Copy link
Member

btmills commented Nov 20, 2021

I agree. People often run autofix on save or as a commit hook, so we want to be extra careful about fixers. This seems like a good fit for a suggested fix.

@mathiasvr
Copy link
Contributor Author

Auto fixing code as a commit hook sounds like a terrible idea since you generally want your commits to be in the state you committed them.

@nzakas
Copy link
Member

nzakas commented Nov 25, 2021

Good point @btmills. @mathiasvr can you change this to a suggestion?

@mathiasvr
Copy link
Contributor Author

mathiasvr commented Nov 25, 2021

@nzakas Do I just change the type to suggestion?

Edit: Ah, I think I understand, change the fix to suggest.

@nzakas
Copy link
Member

nzakas commented Nov 25, 2021

Exactly, this: https://eslint.org/docs/developer-guide/working-with-rules#providing-suggestions

@mathiasvr
Copy link
Contributor Author

Okay I changed it to a suggestion. Just wondering, why would you enable this rule and not want an auto fix but only a suggestion?

Copy link
Member

@btmills btmills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good! Can you add tests that verify the suggestion output is correct? For example:

{
    code: "var r = /[👍]/",
    errors: [{
        messageId: "surrogatePairWithoutUFlag",
        suggestions: [{ messageId: "suggestUnicodeFlag", output: "var r = /[👍]/u" }]
    }]
},

why would you enable this rule and not want an auto fix but only a suggestion?

For cases like this that would alter behavior, we want to bring them to the developer's attention as a suggestion instead of invisibly making the change in autofix, even if we're pretty sure the change is the one they want.

lib/rules/no-misleading-character-class.js Outdated Show resolved Hide resolved
Copy link
Member

@btmills btmills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I like how you also added tests where there was already a non-u flag to make sure adding the Unicode flag didn’t clobber the existing flag(s).

Copy link
Member

@nzakas nzakas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@mdjermanovic mdjermanovic added evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion feature This change adds a new feature to ESLint rule Relates to ESLint's core rules and removed triage An ESLint team member will look at this issue soon labels Dec 2, 2021
Copy link
Member

@mdjermanovic mdjermanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two concerns about this change.

First, adding the u flag can produce syntactically invalid regular expressions. For example, this is valid without the flag:

var r = /[👍]\a/;

When user applies the suggestion, it will cause a parsing error because \a isn't allowed with the u flag:

var r = /[👍]\a/u; // Parsing error: Invalid regular expression: /[👍]\a/: Invalid escape

Parsing errors will be noticeable right away, but in the case of RegExp constructors, users might become aware of the errors caused by suggestions from this rule only when they run the code.

var r = new RegExp("[👍]\\a", "u"); // no parsing errors, but this will throw in runtime

Second, suggestions are allowed to change the behavior of the code, but that should generally be limited to the reported problem. Consider the following example:

// K is U+212A

var regex = /^(?:[👍]|\W!)$/i;

console.log(
    regex.test("👍") // false
);

console.log(
    regex.test("K!") // true
);

Adding the u flag fixes the problem with [👍], but since the flag applies to the whole regular expression, it also changes the behavior of \W, which isn't related to the reported problem:

// K is U+212A

var regex = /^(?:[👍]|\W!)$/iu;

console.log(
    regex.test("👍") // true
);

console.log(
    regex.test("K!") // false!
);

For that reason, I think that suggestions that add the u flag would be more suitable for the require-unicode-regexp rule, as proposed in #15089.

@mathiasvr
Copy link
Contributor Author

mathiasvr commented Dec 2, 2021

@mdjermanovic I don't see a problem with allowing a suggestion to add the flag as changes might be necessary in either case. If wanting to allow this behaviour the rule should not be enabled. If not having suggestion for this rule, I think the require-unicode-regexp rule should also not have it since it will require manual fixes much more often.

@mdjermanovic
Copy link
Member

I don't see a problem with allowing a suggestion to add the flag as changes might be necessary in either case. If wanting to allow this behaviour the rule should not be enabled.

My concern about this suggestion is its scope, because it adds a flag that applies to the whole regular expression, not just to the reported character class. The flag correctly fixes a specific problem reported by this rule, but it can also have side effects on the rest of the pattern, so it could introduce new problems that the user may not be aware of.

I think the require-unicode-regexp rule should also not have it since it will require manual fixes much more often.

That's a good point, maybe we shouldn't provide suggestions in that rule either, except when we're sure that the suggested fix doesn't require any further manual fixes (i.e., the behavior with and without the flag is same).

@nzakas
Copy link
Member

nzakas commented Dec 3, 2021

My concern about this suggestion is its scope, because it adds a flag that applies to the whole regular expression, not just to the reported character class. The flag correctly fixes a specific problem reported by this rule, but it can also have side effects on the rest of the pattern, so it could introduce new problems that the user may not be aware of.

I’m not sure this is significant enough to change the path forward here. Suggestions were created specifically for situations where the fixes will potentially cause a change in behavior. In this case, we can’t limit the scope of the change, but it seems like the suggestion is still valid.

@mdjermanovic
Copy link
Member

Suggestions were created specifically for situations where the fixes will potentially cause a change in behavior. In this case, we can’t limit the scope of the change, but it seems like the suggestion is still valid.

This fix is way beyond the scope of the reported problem. The suggestion message should at least indicate that the fix may have side effects on behavior that is unrelated to the reported problem, and also that it can affect the validity of the regular expression (assuming that we're not checking if the regex will be valid with the u flag, as currently implemented).

@nzakas
Copy link
Member

nzakas commented Dec 15, 2021

It seems like we aren’t quite connecting on this. All suggestions have possible side effects, so it doesn’t make sense to mention that in a suggestion.

It sounds like you might be saying that the fix could create an invalid regular expression? Can you give an example of that so we can discuss?

@btmills you also approved this PR. What do you think?

@mdjermanovic
Copy link
Member

It seems like we aren’t quite connecting on this. All suggestions have possible side effects, so it doesn’t make sense to mention that in a suggestion.

By side effects, I mean changes in behavior not related to the reported problem. I wouldn't expect that from a suggestion.

Consider this example, user wants a regex that matches "👍" or any string with length=3.

const regex = /^(?:[👍]|.{3})$/;

regex.test("👍"); // false. This is a bug, the problem is [👍]

regex.test("abc"); // true

regex.test("👶👶👶"); // false

After the suggestion is applied:

const regex = /^(?:[👍]|.{3})$/u;

regex.test("👍"); // true, the bug is fixed

regex.test("abc"); // true

regex.test("👶👶👶"); // true! This is a side effect, .{3} now works differently

It sounds like you might be saying that the fix could create an invalid regular expression? Can you give an example of that so we can discuss?

I left one example in #15279 (review), here's a similar one:

/[👍]\R/; // syntactically valid

/[👍]\R/u; // syntax error

Additionally, the second version could become valid in the future (proposal-regexp-r-escape), but with completely different semantics than the original one.

@mathiasvr
Copy link
Contributor Author

mathiasvr commented Dec 15, 2021

@mdjermanovic I think this is why we now do it as a suggestion and not an automatic fix, so the side effects doesn't happen without explicitly applying it.

If a user enable this rule they should also want all the side effects as well, since they will ultimately have to disable it if not, no matter if it is auto-fixable, suggestion or just a warning.

@mdjermanovic
Copy link
Member

If a user enable this rule they should also want all the side effects as well, since they will ultimately have to disable it if not, no matter if it is auto-fixable, suggestion or just a warning.

Not necessarily. There are other ways to fix the problem, for example [👍abc] -> (?:👍|[abc]).

I think this is why we now do it as a suggestion and not an automatic fix, so the side effects doesn't happen without explicitly applying it

Yes, but there's no notice about side effects. In the /^(?:[👍]|.{3})$/ example, the reported problem is that [👍] doesn't match with "👍". After the suggestion is applied, matching "👍" is the expected change in the behavior (not a side effect). All other changes, like matching 6 string characters with .{3}, may not be expected and the user may not be aware of them (those are side effects).

@mathiasvr
Copy link
Contributor Author

The problem of not matching the emoji cannot be fixed without the u flag right? We agree that the suggestion of adding that flag may not be enough to fix the whole regex, but how will you avoid a lint error without disabling the rule or adding the flag?
I mean if the user wants the rule enabled they have to fix the regex somehow right?

@nzakas nzakas added the tsc agenda This issue will be discussed by ESLint's TSC at the next meeting label Dec 16, 2021
@nzakas
Copy link
Member

nzakas commented Dec 16, 2021

I don’t see the introduction of side effects as a problem here. As already discussed, suggestions are allowed to have side effects and it’s up to the end user to verify that their code still works. Autofixes cannot have side effects.

If there is a more narrow fix to suggest, we should also suggest that. Part of the benefit of suggestions is that we can specify multiple options.

We definitely shouldn’t suggest something that is syntactically invalid. I was under the impression that the earlier comment had already been addressed. If that’s not the case, we should fix that.

@nzakas
Copy link
Member

nzakas commented Dec 16, 2021

TSC Summary: This PR seeks to add suggestions to no-misleading-character-class, specifically adding the “u” flag if it would result in a valid regular expression. Both nzakas and btmills have approved the PR, mdjermanovic has concerns about the side effects of the suggestions. We do allow side effects in suggestions in core rules.

TSC Question: Do we want to change our expectations around suggestions? (https://eslint.org/docs/developer-guide/working-with-rules#providing-suggestions) Can we merge this?

@mdjermanovic
Copy link
Member

The problem of not matching the emoji cannot be fixed without the u flag right?

It can be fixed by moving the character out of the class.

const regex = /^(?:👍|.{3})$/;

regex.test("👍"); // true, the bug is fixed

regex.test("abc"); // true

regex.test("👶👶👶"); // still false

This fix changes behavior related to the reported problem, but without side effects on things that are not considered problems by this rule.

There is a similar plugin rule regexp/no-misleading-unicode-character and its opt-in autofix works this way, here are test cases.

@mdjermanovic mdjermanovic added accepted There is consensus among the team that this change meets the criteria for inclusion enhancement This change enhances an existing feature of ESLint and removed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion tsc agenda This issue will be discussed by ESLint's TSC at the next meeting labels Dec 20, 2021
@mdjermanovic
Copy link
Member

Per 16-December-2021 TSC Meeting, this PR is accepted. The conclusion is that suggestions are allowed to change the behavior of the code even when the said behavior isn't directly related to the reported problem, and that there is no need to add a note about possible side effects in the suggestion message. On the other hand, suggestions that produce syntactically invalid code should be avoided.

Copy link
Member

@mdjermanovic mdjermanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the previous comment, we should check if the regex would be syntactically valid with the u flag.

For example, rule shouldn't provide the u flag suggestion for the following regex literal:

/^[👍]\a$/;

For that purpose, we can use regexpp.RegExpValidator. Here's an example in prefer-regex-literals rule.

lib/rules/no-misleading-character-class.js Outdated Show resolved Hide resolved
Comment on lines +205 to +207
if (node.arguments.length === 1) {
return fixer.insertTextAfterRange(patternNode.range, ', "u"');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First argument may be parenthesised:

new RegExp(("[👍]")); // regex is /[👍]/

In that case, the suggested fix wouldn't work as intended:

new RegExp(("[👍]", "u")); // regex is /u/

@mdjermanovic
Copy link
Member

If there is a more narrow fix to suggest, we should also suggest that. Part of the benefit of suggestions is that we can specify multiple options.

This was also discussed in the TSC meeting, and we agreed that it isn't necessary for this PR. It's fine to implement only the u flag suggestion for the start, and (maybe) consider adding other suggestions later.

@nzakas
Copy link
Member

nzakas commented Feb 4, 2022

@mathiasvr are you still working on this?

Co-authored-by: Milos Djermanovic <milos.djermanovic@gmail.com>
@mathiasvr
Copy link
Contributor Author

@nzakas Sorry, I don't have time to look into finishing the requested changes at the moment.

@mdjermanovic
Copy link
Member

Continued in #15867

@eslint-github-bot eslint-github-bot bot locked and limited conversation to collaborators Nov 8, 2022
@eslint-github-bot eslint-github-bot bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Nov 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion enhancement This change enhances an existing feature of ESLint feature This change adds a new feature to ESLint rule Relates to ESLint's core rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants