Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow disabling of regexp character sorting #468

Merged
merged 12 commits into from Jan 26, 2020
Merged

Allow disabling of regexp character sorting #468

merged 12 commits into from Jan 26, 2020

Conversation

voxpelli
Copy link
Contributor

@voxpelli voxpelli commented Dec 3, 2019

Fixes #453

Improved by DmitrySoshnikov/regexp-tree#200

Does this look like a good addition?

docs/rules/regex-shorthand.md Outdated Show resolved Hide resolved
Type: `boolean`\
Default: `false`

Disables optimizations that affects the sorting of characters.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description here needs to be clearer about what exactly will be sorted. "characters" is too ambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sindresorhus There are two options as I see it:

  1. Get into the specifics of regexp-tree and be up front about the disabling of charClassClassrangesMerge, which the sorting is a needed side-effect of. This is probably too implementation specific and doesn't convey the purpose clearly.
  2. Clearly convey the intended effect by naming it sortCharacters/sortCharacterClasses (which the description of maybe could be spelled out in a somewhat more specific manner, but I have a hard time pinpointing that, maybe due to me not being the best at Regexp terminology).

I lean towards the second and I get the impression that you do as well?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this. Yes, 2. Could you also add an example? Not everyone knows what character classes are.

rules/regex-shorthand.js Outdated Show resolved Hide resolved
@fisker
Copy link
Collaborator

fisker commented Dec 3, 2019

Does charClassClassrangesMerge can actually disable sorting?
If it does, I suggestion use charClassClassrangesMerge directly.

Also we probably want use object like this

const defaultReplacements = {

so options will like

{
  charClassClassrangesMerge: false,
  charSurrogatePairToSingleUnicode: true,
}

@fisker
Copy link
Collaborator

fisker commented Dec 3, 2019

@voxpelli great work, indeed this prevent sorting

@fisker
Copy link
Collaborator

fisker commented Dec 3, 2019

@voxpelli would you mind if I refactor the option logic? we don't need that blacklist logic.

@voxpelli
Copy link
Contributor Author

voxpelli commented Dec 3, 2019

@fisker Because of reasons outlined in #468 (comment) I think the logic around regexp-tree should be kept as an internal implementation of this rule and not be exposed to the outside world.

That way this rule can more easily be adapted to future changes in regexp-tree and/or to future alternative modules.

Therefore I think that all we should do for now is to convey to regexp-tree that we want this very rule to sometimes be disabled, as made possible through DmitrySoshnikov/regexp-tree#200

Reason why I've implemented the blacklist feature myself here is to make this feature be easily upgraded once that PR has been merged and to have the intention in this code to be clear,

I do not like the idea of:

{
  charClassClassrangesMerge: false,
  charSurrogatePairToSingleUnicode: true,
}

That exposes way too much about the internal implementation and pushes too many decisions onto the user rather than abstract them in a way that makes sense from the purpose of this rule (which has a different target audience than the one of regexp-tree).

@fisker
Copy link
Collaborator

fisker commented Dec 3, 2019

@voxpelli Your comment make sense, but I still think let users have all the options is good.

Also this charClassClassrangesMerge not only prevent sorting, also disable ranges from merging as you can see

/[GgHhIiå.Z:a-f"0-8%A*ä]/

/["%*.0-8:AG-IZa-iäå]/

the a-fghi is merged to a-i, we disable charClassClassrangesMerge also disable merging, people may ask why

@voxpelli
Copy link
Contributor Author

voxpelli commented Dec 3, 2019

@fisker The fact that it disables is a reason why I want to expose the disabling of sorting, as that enables this rule to evolve to separate those additional fixes from the actual sorting, which a raw option of charClassClassrangesMerge totally dodges and instead downstreams to regexp-tree, where it may not belong as clearly.

Such a shortcoming can be mentioned in the documentation of a sortCharacterClasses option and a future version of this module can then fix it, when regexp-tree has evolved to such a degree that we can actually fix it.

@fisker
Copy link
Collaborator

fisker commented Dec 3, 2019

I have a regex /Error:?\s+([^\r\n]+)/i, yesterday when I update unicorn, I got it fixed to /error:?\s+([^\n\r]+)/i, in fact, my test string is Error:...., I put i there just in case I may change the message in feature for safty, if I can set charCaseInsensitiveLowerCaseTransform, I may already change my default, because I prefer not touching it.

I guess not only me, people may also need this. If it happends, we'll need to add another option again. Why not make it easier

@voxpelli
Copy link
Contributor Author

voxpelli commented Dec 3, 2019

I think we disagree on what makes it easier 🙂

I think exposing of internal implementation details makes the API more complicated while you consider it to make it easier.

I guess we can agree to disagree and have someone else, like @sindresorhus, chime in with his views?

@fisker
Copy link
Collaborator

fisker commented Dec 3, 2019

//cc @futpib @MrHen may also interested

@fisker fisker mentioned this pull request Dec 4, 2019
@sindresorhus
Copy link
Owner

I agree with @voxpelli, we shouldn't expose the internals of regexp-tree.

@sindresorhus
Copy link
Owner

I put i there just in case I may change the message in feature for safty

I would argue this is actually an argument for not exposing all the options. The rule is actually improving your code and you're trying to fight it. By having the i in there, even though it's not used, you're actually making your regex harder to read. I think your "just in case" need is better solved by an explanatory code comment.

I guess not only me, people may also need this. If it happends, we'll need to add another option again.

If there's a valid use-case, we can add a well-documented option for it. Exposing everything just leads to users having to waste too much time reading docs and configuring, and we would have to pin the dependency as every patch release of regexp-tree is a potential breaking change for us.

@sindresorhus
Copy link
Owner

@voxpelli The update is out: https://github.com/DmitrySoshnikov/regexp-tree/commits/master Can you update to use the blacklist?

@fisker
Copy link
Collaborator

fisker commented Dec 11, 2019

@sindresorhus @voxpelli I did the request changes

@voxpelli
Copy link
Contributor Author

Thanks @fisker! I was out of office for a few days, so unfortunately didn't have time to fix it :)

@voxpelli
Copy link
Contributor Author

@sindresorhus Is this ready to go you think?

@sindresorhus
Copy link
Owner

@voxpelli Did you see #468 (comment)?

@sindresorhus
Copy link
Owner

@voxpelli Bump :)

@voxpelli
Copy link
Contributor Author

Done @sindresorhus :) Sorry for me taking so long, I appreciate your bumps to remind me 👍

@voxpelli
Copy link
Contributor Author

Also rebased it 👍

@sindresorhus sindresorhus merged commit fb0268b into sindresorhus:master Jan 26, 2020
@sindresorhus
Copy link
Owner

Thank you 👍

@voxpelli voxpelli deleted the feature/enable-disabling-of-regexp-sorting branch February 1, 2020 15:13
@JounQin
Copy link
Contributor

JounQin commented Feb 11, 2020

@sindresorhus Should its default be false due to #472, #477, #499?

@sindresorhus
Copy link
Owner

I'm hoping that the author of regexp will fix those extra issues soon. If not, we could do this, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(v14): Make character class sorting in regex-shorthand optional or at least documented
4 participants