correctly handle empty strings in charmaps #174

iliazeus · 2023-03-21T10:07:55Z

Resolves #172

Resolves #95

Trott · 2023-03-21T19:15:49Z

This introduces a new way to remove characters, but there is already the remove option.

slugify('cat', {remove: /a/ig}) // 'ct'

Not necessarily saying we shouldn't do this, but we might want to pause before introducing multiple ways to do the same thing. If this is a more ergonomic way, consider deprecating remove? If it's a less ergonomic way, don't implement it?

iliazeus · 2023-03-22T03:04:15Z

@Trott they're not the same semantically.

The "mapping into empty string" is related to the transliteration rules. It is a way to specify in the locale and/or charmap that we don't want a specific letter to be translated into anything. This is already in the charmap for ь and Ь, for example; this should also be in the Russian locale for ъ and Ъ, and there are probably more cases.

The remove argument is a way for the user to specify that they don't want specific characters in the final result. It does not depend on locale and charmap, and is usually used for things like punctuation. The user will probably not be aware of all the transliteration rules, and they won't put things like ь in there when specifying a custom regex.
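To make the distinction concrete, here is a minimal, self-contained sketch of the two stages. It is a toy model, not slugify's actual source: `toyCharmap`, `toySlug`, and the sample word are hypothetical, and only the soft-sign entry mirrors what is described above for charmap.json.

```javascript
// Toy model, NOT slugify's real code: names and data are illustrative assumptions.
const toyCharmap = new Map([
  ['ь', ''],  // transliteration rule: the soft sign maps to nothing
  ['д', 'd'],
  ['е', 'e'],
  ['н', 'n'],
]);

function toySlug(input, { remove } = {}) {
  // Stage 1: transliterate each character via the charmap (language-dependent).
  const transliterated = Array.from(input)
    .map((ch) => (toyCharmap.has(ch) ? toyCharmap.get(ch) : ch))
    .join('');
  // Stage 2: apply the user's `remove` regex, if any (language-independent).
  return remove ? transliterated.replace(remove, '') : transliterated;
}

console.log(toySlug('день!'));                   // 'den!'
console.log(toySlug('день!', { remove: /!/g })); // 'den'
```

In this model the soft sign disappears because of the transliteration rule, whether or not the caller passes `remove`; the regex only decides what happens to the exclamation mark.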

Trott · 2023-03-22T15:44:50Z

Thanks for the explanation. That makes sense and I understand now.

Trott · 2023-03-22T15:49:02Z

test/slugify.js

+    delete require.cache[require.resolve('../')]
+    slugify = require('../')


Not relevant to this PR, but slug has a .reset() method to make this more ergonomic. Maybe slugify could do something similar.

Trott · 2023-03-22T15:52:32Z

This is only useful for the twenty-six letters of the Latin alphabet, right? For everything else, it already works as expected? Or am I wrong about that? (EDIT: Maybe it's useful for certain common symbols too?)

iliazeus · 2023-03-23T03:44:48Z

@Trott my own use case was actually similar to the one described in #172: the Cyrillic characters ь and Ь are (correctly) mapped to empty strings in charmap.json. The same should be done for ъ and Ъ when the Russian locale is added (right now they use the Bulgarian transliterations in charmap.json). The thing is, these are proper letters, but since (in Russian, at least) they have no sound of their own, they are usually just omitted when transliterating (in Bulgarian, however, Ъ is not omitted!). This feature is useful for these kinds of letters. I don't know enough non-Latin, non-Cyrillic scripts to offer other examples, though :)

The test uses Latin characters for the sake of simplicity.

Trott · 2023-03-23T04:55:37Z

I'm sorry if I'm being foolish here and missing something obvious, but doesn't that already work without the change in this pull request?

const slugify = require('slugify');

const str = 'ъaъ';

console.log(slugify(str)); // 'uau'

slugify.extend({ 'ъ': ''})
console.log(slugify(str)) // 'a'

Maybe all that needs to happen is to add a Russian locale to config/locales.json?

iliazeus · 2023-03-23T05:24:15Z

@Trott sorry for being so unclear :)

The tricky part is: in your case, ъ is removed not by the extended charmap, but by the default remove regexp, which matches [^\w] (among others). Here is a quick test with a dummy remove:

> var slugify = require('slugify')
undefined
> slugify('ъяъ', { remove: /[]/ })
'uyau'
> slugify.extend({ 'ъ': '' })
undefined
> slugify('ъяъ', { remove: /[]/ })
'ъyaъ'

The order of steps for the current behavior is:

1. ъ is mapped by the charmap to ''
2. the or-expressions on line 36 turn it back into 'ъ', since '' || 'ъ' === 'ъ'
3. it is then tested against remove

If someone who is not aware of the language-specific rules specifies a custom remove, then mapping to an empty string stops working. It only appeared to work because the default remove happened to strip the character. This PR fixes that.
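The steps above can be reproduced in isolation. This is a simplified model rather than the library's actual code: `lookupWithOr`, `lookupWithUndefinedCheck`, and `transliterate` are hypothetical names, and the second lookup only illustrates one way to distinguish "mapped to an empty string" from "not mapped at all".

```javascript
// Simplified model of the lookup step; all function names are hypothetical.
const charMap = { 'ъ': '', 'я': 'ya' };

// Mirrors the described behavior: '' is falsy, so `||` falls through to `ch`.
function lookupWithOr(ch) {
  return charMap[ch] || ch;
}

// One possible fix (an assumption, not necessarily what this PR does):
// fall back to the original character only when there is no entry at all.
function lookupWithUndefinedCheck(ch) {
  const mapped = charMap[ch];
  return mapped === undefined ? ch : mapped;
}

// Map each character, then apply `remove` to the mapped result.
function transliterate(input, lookup, remove) {
  return Array.from(input)
    .map((ch) => lookup(ch).replace(remove, ''))
    .join('');
}

const neverMatches = /[]/g; // empty character class: matches nothing

console.log(transliterate('ъяъ', lookupWithOr, neverMatches));             // 'ъyaъ'
console.log(transliterate('ъяъ', lookupWithUndefinedCheck, neverMatches)); // 'ya'
console.log(transliterate('ъяъ', lookupWithOr, /[^\w]/g));                 // 'ya'
```

The last call shows why the problem stayed hidden: with a `remove` that matches non-word characters, the un-mapped ъ is stripped anyway, so the result looks correct.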

iliazeus · 2023-03-23T05:27:32Z

@Trott we could instead just document that the remove regexp must contain the [^\w] class, but this would still be counter-intuitive, since remove looks like it's mostly meant for punctuation and other non-letter, non-locale-dependent characters. The documented example does not actually match [^\w] :)
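A quick check illustrates why relying on remove is fragile. JavaScript's `\w` matches only ASCII letters, digits, and underscore, so `[^\w]` happens to match Cyrillic letters, while a punctuation-only regex does not. The `punctuationOnly` pattern below is an illustrative stand-in for a typical user-supplied remove, not a quote from the documentation.

```javascript
const nonWord = /[^\w]/;                 // the class mentioned above
const punctuationOnly = /[*+~.()'"!:@]/; // hypothetical custom `remove`

console.log(nonWord.test('ъ'));         // true
console.log(punctuationOnly.test('ъ')); // false
console.log(nonWord.test('a'));         // false
```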

iliazeus · 2023-03-23T05:34:29Z

I've updated the test to be a more relevant example of bugged behavior.

Trott · 2023-03-24T03:46:04Z

I think I see now. FWIW, the slug module already works this way.

const slug = require("slug");
const slugify = require("slugify");

const str = "ъяъ";

console.log(slug(str)); // 'uyau'
console.log(slugify(str)); // 'uyau'

slug.extend({ ъ: "" });
slugify.extend({ ъ: "" });

console.log(slug(str)); // 'ya'
console.log(slugify(str)); // 'ya'

console.log(slug("ъяъ", { remove: /[]/g })); // 'ya'
console.log(slugify("ъяъ", { remove: /[]/g })); // 'ъyaъ'

simov · 2023-03-26T12:34:28Z

Published in v1.6.6

@Trott I also added this 3f0b3f5

Trott reviewed Mar 22, 2023

correctly handle empty strings in charmaps

a8b2b1a

Trott approved these changes Mar 24, 2023

simov merged commit bbc56e2 into simov:master Mar 26, 2023
6 checks passed
