`prevent-abbreviations` - non-ASCII characters ignored in filenames #2308

MichaelBlm · 2024-04-04T17:07:01Z

Fixes #2292

MichaelBlm · 2024-04-04T17:10:22Z

Here is an explanation of the updated regular expression:

`(?=\P{Ll})`: A positive lookahead assertion. It matches a position where the next character is not a Unicode lowercase letter. `\P{Ll}` is a Unicode property escape that negates the `Ll` (lowercase letter) category.
`(?<=\P{L})`: A positive lookbehind assertion. It matches a position where the previous character is not a Unicode letter of any kind (uppercase, lowercase, or caseless). `\P{L}` negates the `L` (letter) category.

Both property escapes rely on the `u` flag added to the regex.
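To make the difference concrete, here is a minimal sketch that runs the old and new patterns from the diff side by side. The `splitWords` helper and the sample names are hypothetical and only mirror the `name.split(...).filter(Boolean)` line; they are not part of the rule's source.

```javascript
// Old and new word-splitting patterns, copied from the diff in this PR.
const asciiSplit = /(?=[^a-z])|(?<=[^A-Za-z])/;
const unicodeSplit = /(?=\P{Ll})|(?<=\P{L})/u;

// Hypothetical helper mirroring `name.split(...).filter(Boolean)`.
const splitWords = (name, pattern) => name.split(pattern).filter(Boolean);

// "résumé" written with escapes so the é is the precomposed U+00E9.
const accented = 'r\u00E9sum\u00E9';

// Plain camelCase splits the same way with both patterns.
console.log(splitWords('fooBar', asciiSplit)); // [ 'foo', 'Bar' ]
console.log(splitWords('fooBar', unicodeSplit)); // [ 'foo', 'Bar' ]

// 'é' is outside [a-z], so the ASCII pattern cuts the word around it.
console.log(splitWords(accented, asciiSplit)); // [ 'r', 'é', 'sum', 'é' ]

// 'é' is a lowercase letter (Ll), so the Unicode pattern keeps the word whole.
console.log(splitWords(accented, unicodeSplit)); // [ 'résumé' ]
```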

fisker · 2024-04-04T17:24:08Z

The solution looks tricky to me. I'd prefer to just ignore names that include non-ASCII characters.

sindresorhus · 2024-04-04T17:54:56Z

rules/prevent-abbreviations.js

@@ -116,7 +116,7 @@ const getNameReplacements = (name, options, limit = 3) => {
 	}

 	// Split words
-	const words = name.split(/(?=[^a-z])|(?<=[^A-Za-z])/).filter(Boolean);
+	const words = name.split(/(?=\P{Ll})|(?<=\P{L})/u).filter(Boolean);


sindresorhus: Use the verbose names (`Lowercase_Letter`) for readability.

MichaelBlm: Updated to include the verbose names.

fisker: We should make a rule for it 😄

fisker: ota-meshi/eslint-plugin-regexp#720

MichaelBlm: I'll work on it actually.

fisker: Note: the proposal is on another repo.

sindresorhus · 2024-04-04T17:55:15Z

@fisker Tricky how? The solution looks fine to me.

MichaelBlm · 2024-04-04T18:40:41Z

My explanation of the regex wasn't that clear. `\P{Letter}` works like `[^a-zA-Z]`, except that non-ASCII letters also count as letters.

fisker · 2024-04-04T19:14:07Z

> @fisker Tricky how? The solution looks fine to me.

In other languages, Latin letters can be combined with other characters to form a word, so those non-letter characters are not a word boundary.

For example, in Chinese, "B超" and "T恤" are words. It's similar in Japanese, I think.
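As a rough illustration of this concern, the sketch below applies the Unicode-aware split pattern (written with the verbose property names) to the two example words. The `splitWords` helper is hypothetical. Whether the resulting pieces ever trigger a report depends on the rule's replacement lookup, which is not shown here.

```javascript
// Hypothetical helper using the Unicode-aware pattern with verbose property names.
const splitWords = name =>
	name.split(/(?=\P{Lowercase_Letter})|(?<=\P{Letter})/u).filter(Boolean);

// Han characters are letters, but not lowercase letters,
// so the lookahead still cuts right before them.
console.log(splitWords('B超')); // [ 'B', '超' ]
console.log(splitWords('T恤')); // [ 'T', '恤' ]
```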

MichaelBlm · 2024-04-04T20:39:30Z

> > @fisker Tricky how? The solution looks fine to me.
>
> In other languages, Latin letters can be combined with other characters to form a word, so those non-letter characters are not a word boundary.
>
> For example, in Chinese, "B超" and "T恤" are words. It's similar in Japanese, I think.

Can you please give me a valid/invalid test case so I can check whether this fix already accounts for that?

fisker · 2024-04-06T09:05:44Z

I don't really have a case now.

fregante · 2024-04-06T16:31:13Z

@fisker you're saying that 龖.js will be counted as "one-letter long", but in Chinese it's a fully-formed "word", not an abbreviation. Hence "ignore if non-ascii characters included"

fisker · 2024-04-06T17:43:26Z

I don't think Chinese characters are "letters", but I could be wrong.

fregante · 2024-04-07T04:00:50Z

That's right, that's why I quoted "one-letter word". The rule counts characters as "letters", but not all writing systems have letters.

https://en.wikipedia.org/wiki/Writing_system#Basic_terminology

The rule is based on the idea that no concept can be defined in 2 characters because in most languages words are longer than that.

fregante · 2024-04-07T04:08:21Z

Probably not helpful here, but: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter

MichaelBlm · 2024-04-07T04:28:53Z

I'll explore a solution using the segmenter. Looks promising
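For reference, a minimal sketch of what word segmentation with `Intl.Segmenter` could look like. The `segmentWords` helper and its default locale are assumptions made for illustration, not code from this PR. The exact boundaries come from the runtime's ICU data, so no output is shown.

```javascript
// Hypothetical helper: split a name into word segments with Intl.Segmenter.
// The default locale 'zh' is an arbitrary choice for this illustration.
const segmentWords = (name, locale = 'zh') => {
	const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
	// Each entry keeps the segment text and whether the segmenter considers it word-like.
	return Array.from(segmenter.segment(name), ({segment, isWordLike}) => ({segment, isWordLike}));
};

console.log(segmentWords('B超'));
console.log(segmentWords('fooBar', 'en'));
```

The segments always concatenate back to the input string. Note that the segmenter looks for natural-language word boundaries, which is a different goal from the camelCase splitting the current regex performs.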

MichaelBlm · 2024-04-09T20:56:40Z

I added this case to the valid tests and it passes with the current implementation. Is that the correct behavior?
{ code: 'foo();', filename: '龖.js', },

@fisker


sindresorhus reviewed Apr 4, 2024


fix: updated regex to allow for unicode in filenames (sindresorhus#2292)


411f6e3

MichaelBlm force-pushed the prevent-abbreviations-non-ascii-characters-ignored branch from 6295fde to 411f6e3 Compare April 4, 2024 18:38

fisker approved these changes Apr 6, 2024


sindresorhus merged commit 28762c8 into sindresorhus:main Apr 6, 2024
18 checks passed

This was referenced May 31, 2024

[Snyk] Upgrade eslint-plugin-unicorn from 52.0.0 to 53.0.0 lwojcik/blizzapi#369

Merged

[Snyk] Upgrade eslint-plugin-unicorn from 52.0.0 to 53.0.0 lwojcik/starcraft2-api#310

Merged

This was referenced Jul 6, 2024

[Snyk] Upgrade eslint-plugin-unicorn from 53.0.0 to 54.0.0 lwojcik/starcraft2-api#311

Merged

[Snyk] Upgrade eslint-plugin-unicorn from 53.0.0 to 54.0.0 lwojcik/blizzapi#370

Merged

snyk-io bot mentioned this pull request Aug 12, 2024

[Snyk] Upgrade eslint-plugin-unicorn from 52.0.0 to 54.0.0 Playgirlkaybraz11/eslint#10

Open
