Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Unicode in pcal.trans #911

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ahelwer
Copy link
Contributor

@ahelwer ahelwer commented Apr 15, 2024

These changes build on top of #909. They make the PlusCal translator correctly parse Unicode input. However, it will still always output generated TLA+ code in ASCII.

For testing, all PlusCal modules in the tlaplus/examples repo are converted to Unicode then translated.

Closes #928

@ahelwer ahelwer force-pushed the pcal-unicode branch 5 times, most recently from 215d8b0 to 00aaf05 Compare April 16, 2024 23:19
@ahelwer
Copy link
Contributor Author

ahelwer commented Apr 16, 2024

@lemmy this workflow run is in an infinite loop btw; cancel if you have time! https://github.com/tlaplus/tlaplus/actions/runs/8713343718

@ahelwer
Copy link
Contributor Author

ahelwer commented Apr 17, 2024

@lemmy you canceled the wrong run!

@ahelwer ahelwer force-pushed the pcal-unicode branch 2 times, most recently from 2bf6b9a to 1fe58f2 Compare April 17, 2024 04:54
@ahelwer ahelwer marked this pull request as ready for review April 17, 2024 05:17
@ahelwer ahelwer force-pushed the pcal-unicode branch 3 times, most recently from 1b2829b to 6a79c3f Compare April 18, 2024 01:03
Signed-off-by: Andrew Helwer <2n8rn1w1f@mozmail.com>
Comment on lines +846 to +847
|| PeekAtAlgToken(1).equals("\\in")
|| PeekAtAlgToken(1).equals("∈"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and in a few other places) it looks like we're duplicating information from ConfigConstants.OperatorSynonyms. Is it at all practical to deduplicate that information? E.g. this check would be something like Operators.getOperator(PeekAtAlgToken(1)).equals(Operators.getOperator("\\in")), which covers both the \in and cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I have that desire as well, a sort of central repository of all information about operators. I think it's viable but I would want test coverage to be better (currently considering how to improve test coverage of the pluscal code generally).

@Hi-Angel
Copy link
Contributor

Presumably, you can add a Fixes: https://github.com/tlaplus/tlaplus/issues/928 line

@Hi-Angel
Copy link
Contributor

@Calvin-L it seems all discussions are resolved and it's been approved. Can this perhaps be merged?

@Calvin-L
Copy link
Collaborator

Calvin-L commented Jun 7, 2024

@lemmy I'd like to get this PR merged; it looks like this is starting to affect users who are excited about unicode support. Are you planning to review this one or should I go ahead and merge it?

@lemmy
Copy link
Member

lemmy commented Jun 7, 2024

@lemmy I'd like to get this PR merged; it looks like this is starting to affect users who are excited about unicode support. Are you planning to review this one or should I go ahead and merge it?

I left two review comments above.

@Calvin-L
Copy link
Collaborator

They may have been lost; I don't see them :(

@@ -3445,7 +3447,7 @@ public static void GobbleThis(String str) throws ParseAlgorithmException
* error. The case is still tested for and a different error *
* message produced that might be more helpful. *
********************************************************************/
if ( str.equals(";") )
if ( str[0].equals(";") )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The varargs and the comment do not indicate that the order of elements in str matters, but with str[0] the order does matter.

@@ -1440,7 +1442,7 @@ public static AST.SingleAssign GetSingleAssign() throws ParseAlgorithmException
* PeekAtAlgToken(1), so LAT[0] contains the next token. *
******************************************************************/
result.lhs = GetLhs() ;
GobbleThis(":=") ;
GobbleThis(":=", "≔") ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me here on Github, ≔ only becomes discernible at zoom levels above 150%. How is ≔ rendered in relation to the other symbols in the different TLA+ IDEs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is ≔ rendered in relation to the other symbols in the different TLA+ IDEs?

Well, here's what it looks like in Emacs

image

For me here on Github, ≔ only becomes discernible at zoom levels above 150%.

I think it depends. I can clearly see right here in your comment the symbol without having to zoom in.

As a matter of fact, my native language has a symbol such as ё which you'd probably consider even worse. But I assure you we have no problem with that symbol 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

@lemmy lemmy Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to know that Emacs uses a sensible symbol. However, since most TLA+ users use the Toolbox or the VSCode extension, this is where we need to ensure it renders correctly/decently out of the box.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since most TLA+ users use the Toolbox or the VSCode extension

Yeah, I confirm the character looks terrible in VScode and reordering fonts to render emoji with a different one simply doesn't work for some reason.

Anyway, it is a font problem.

this is where we need to ensure it renders correctly/decently out of the box

It's impossible. I mean, tla+toolbox kind of can try to set up fonts in such a way so this emoji renders correctly, but in the end you can't guarantee the font won't be missing on the end-user's system. Not to mention, for external editors such as VSCode you can't do anything at all.

Copy link
Member

@lemmy lemmy Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I confirm the character looks terrible in VScode and reordering fonts to render emoji with a different one simply doesn't work for some reason.

Anyway, it is a font problem.

this is where we need to ensure it renders correctly/decently out of the box

It's impossible. I mean, tla+toolbox kind of can try to set up fonts in such a way so this emoji renders correctly, but in the end you can't guarantee the font won't be missing on the end-user's system. Not to mention, for external editors such as VSCode you can't do anything at all.

Primarily, it is our problem if the user experience is poor. Declaring it impossible to improve is not acceptable. If a good user experience cannot be provided out of the box, we need to at least provide documentation to guide users on how to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion in the monthly community meeting this morning I have opened a discussion in the google group about this: https://groups.google.com/g/tlaplus/c/P-GLlN07ARo/m/I5hGjR6AFAAJ

@lemmy
Copy link
Member

lemmy commented Jun 10, 2024

They may have been lost; I don't see them :(

I guess I just learned that "Pending" doesn't mean that a review comment is pending resolution, but that it is invisible to everybody but me. %-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Unicode operators break PlusCal
5 participants