Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow unicode (Fixes #279) #539

Open
wants to merge 45 commits into
base: master
Choose a base branch
from

Conversation

iansw246
Copy link
Contributor

@iansw246 iansw246 commented Jul 18, 2023

Fixes #279 and other issues relating to invalid characters in identifiers. It will cover the functionality of #538

This allows any type name that is valid in ES5 by removing invalid characters. Valid characters were determined according to https://mathiasbynens.be/notes/javascript-identifiers.

It uses lodash.template, regenerate, and unicode-11.0.0 to generate the regex in scripts/generate-identifier-regex.ts

It uses code from this gist https://gist.github.com/mathiasbynens/6334847

There is no given license, so I'm not sure if we can use that.

If we can target ES6 or newer, this code can be significantly simplified with Unicode-aware regex.

iansw246 and others added 29 commits July 13, 2023 20:36
…th unicode characters, temporarily commented out gothic due to unresolved compatibility issue, updated snapshots
@iansw246 iansw246 changed the title Allow unicode Allow unicode (Fixes #279) Jul 18, 2023
@iansw246 iansw246 marked this pull request as ready for review July 21, 2023 21:08
Copy link
Owner

@bcherny bcherny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, and sorry for the delay!

I'm supportive of this change, but would love to see a simpler implementation. Can we just use https://www.npmjs.com/package/identifierfy (or a similar external library) instead?

cc @mathiasbynens, in case you have a recommendation for the right NPM module to use for coercing strings to valid JS/TS identifiers.

@iansw246
Copy link
Contributor Author

iansw246 commented Aug 30, 2023

Would the Unicode-aware regex in the latest commit work? That doesn't require any other packages nor the regex-generation scripts.

@iansw246
Copy link
Contributor Author

iansw246 commented Aug 30, 2023

I tried using identifierfy but faced some issues when toSafeString was called on the output of toSafeString. Also, it removes more than the minimum in some cases. Ultimately, using identifierfy should work with a few changes to calls of toSafeString

@bcherny
Copy link
Owner

bcherny commented Sep 23, 2023

I tried using identifierfy but faced some issues when toSafeString was called on the output of toSafeString. Also, it removes more than the minimum in some cases. Ultimately, using identifierfy should work with a few changes to calls of toSafeString

Great -- want to try doing this?

@iansw246
Copy link
Contributor Author

I've gotten Identifierfy to work

I've had to remove several calls to toSafeString() to prevent double conversion. This should be OK since the parser converts standaloneNames with generateName()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow unicode characters in type name
4 participants