Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display a clear error on empty character literals '' #10197

Merged
merged 1 commit into from Feb 5, 2021

Conversation

gasche
Copy link
Member

@gasche gasche commented Feb 4, 2021

Before:

# '';;
Error: Syntax error

After:

# '';;
Error: The empty character literal '' is invalid. We expect a character
       between single quotes (possibly a space ' ') or a single quote for
       type variables 'a.

Before, this input would get correctly lexed into QUOTE QUOTE and then
fail in the parser with a generic "Syntax error" message. (Even if we
had better error messages in the parser, here the parser-level error
would not be very illuminating as ' is never the start of a valid
expression or structure item).

Fixes #10196.

Copy link
Member

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems obviously fine - possible alternate message

Comment on lines 308 to 310
"The empty character literal '' is invalid. \
We expect a character between single quotes (possibly a space ' ') \
or a single quote for type variables 'a."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over here, using "we" when you are singular is generally reserved for the monarch 🙂 However, I think it's more consistent with the other messages to describe the properties of what is expected:

Suggested change
"The empty character literal '' is invalid. \
We expect a character between single quotes (possibly a space ' ') \
or a single quote for type variables 'a."
"Character expected between two single quotes (' ', for example). \
The empty character literal '' is invalid. \
A single quote is used for type variables 'a."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unconvinced by your proposition: I have doubts that "Character expected ... ." is easier to read than "We expect a character ... .", and your message splits the allowed forms in two halves, one before and one after the diagnostic, I find this confusing.

If the We is really a problem (we, the authors of the compiler), I can change to "Invalid empty character literal ''.", but I would rather keep this structure -- or something else. But then there are many occurrences of "we" in the compiler text, mostly in comments or in the manual, but also a handful in error messages. (git grep ' [Ww]e ' | wc -l returns 2440 on my copy of trunk.)

Copy link
Contributor

@dbuenzli dbuenzli Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the We is really a problem (we, the authors of the compiler),

Personally except for the notable exception of did you mean-like error messages, I dislike the anthropomorphisation of computer program error messages.

Also there are already errors patterns in the character space with which consistency could be maintained e.g. this use of "illegal" vs "invalid".

# "bl\a";;
Warning 14: illegal backslash escape in string.
- : string = "bl\\a"

Here would be my proposal:

illegal empty character literal ''. Did you mean ' ' ? 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of "we" in the manual or comments is entirely distinct from messages displayed by the tools. We may write the compiler, but on encountering an error the message comes from the compiler, not its authors. There may be other error messages elsewhere in the compiler which do this, but the other lexer error messages right around this one do not.

@dbuenzli's suggestion looks much better to me.

Copy link
Member Author

@gasche gasche Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I went with

Error: Illegal empty character literal; did you mean ' ' or a type variable 'a ?

Copy link
Member

@Octachron Octachron Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the hint part more distinct:

Error: Illegal empty character literal
Hint: Did you mean ' ' or a type variable 'a ?

(With the hope of one day using pretty printer for hints).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error: Illegal empty character literal; did you mean ' ' or a type variable 'a ?

Sorry for the extreme hci bikesheding @gasche.

Since the original report is about the difficulty to distinguish spaces with certain fonts I think it would be good to write down the empty '' as was in your your original message and in my proposal.

That way a direct, local, comparison between '' and ' ' can be made which should easily show the difference; the context switch to the actual code may make you loose the distinction.

Also I notice the existing format for reporting Did you mean is slightly different:

# let fab x = x;;
val fab : 'a -> 'a = <fun>
# fib;;
Error: Unbound value fib
Hint: Did you mean fab?

Not that I'm necessarily in favor of that one. However I'm slightly averse to have the did you mean after a semicolon, rather than a ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I don't mind the bikeshedding, especially given that you all had good reasons to suggest changes. I will follow the standard "did you mean" format suggested by @Octachron, but include the '' back. (I agree it's interesting to show it.)

@alainfrisch
Copy link
Contributor

alainfrisch commented Feb 4, 2021

Another confusing error is when we compile e.g. let c = 'é' with source code encoded in utf-8; I wonder if we could to something about it as well. We must be careful not to reject type 'foo' bar = ' (* wait for it... *) foo', though, which is, if not really ok, perfectly valid today.

@gasche
Copy link
Member Author

gasche commented Feb 4, 2021

We could have a regexp for just some sequences that are single-symbol-wide and likely culprits of this mistakes, but doing anything in the lexer would still prevent people from naming their type variable é' if they want to. I don't think we can do much without horrible hacks that are not worth it. (The best change would be to change the syntax of type ('a, 'b) list = ... into type list a b = ....)

@alainfrisch
Copy link
Contributor

but doing anything in the lexer would still prevent people from naming their type variable é' if they want to

'é' is neither a valid type variable nor a valid character literal when the source code is encoded in utf-8. And when the source is encoded in latin-1, it's parsed as a character literal; you need to add some whitespace or comment between the quote and the name ' é'. So when seeing 'é' (utf-8) in source code, it's likely because of some recoding of a source code previously in latin-1 (or just someone not knowing enough about OCaml characters vs Unicode); in both case, one can safely assume the intent was to write a character literal, and complain with a proper error message. Anyway, this is really not the topic of this PR...

@gasche gasche force-pushed the error-empty-character-literal branch 4 times, most recently from 908c5f2 to 2b7916a Compare February 4, 2021 20:03
Copy link
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message looks good to me and recognizing '' as an invalid character literal (or type variable name) seems sensible.

| Empty_character_literal ->
Location.errorf ~loc
"Illegal empty character literal ''\n\
Hint: Did you mean ' ' or a type variable 'a?"
Copy link
Member

@Octachron Octachron Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking: the hint could be a suberror message:

Location.error ~loc ~sub:[Location.msg "Hint: Did you mean ' ' or a type variable 'a?"]
  "Illegal empty character literal ''"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gasche gasche force-pushed the error-empty-character-literal branch from 2b7916a to 49cd6c4 Compare February 5, 2021 10:48
@gasche
Copy link
Member Author

gasche commented Feb 5, 2021

I changed the error message slightly:

  # '';;
  Error: Illegal empty character literal ''
    Hint: Did you mean a space character literal ' ' or a type variable 'a?

@gasche gasche force-pushed the error-empty-character-literal branch from 49cd6c4 to 1e712d7 Compare February 5, 2021 10:50
@gasche
Copy link
Member Author

gasche commented Feb 5, 2021

Shedding my own bike:

# '';;
Error: Illegal empty character literal ''
  Hint: Did you mean ' ' (a space character literal) or 'a (a type variable)?

@Octachron
Copy link
Member

Octachron commented Feb 5, 2021

I liked the original error message better.
If you think that one need to be more explicit the second version is alright.
However, I am really not fond of the last version: parentheses only add visual noise in such a short message.

@gasche
Copy link
Member Author

gasche commented Feb 5, 2021

One thing I don't like about the short version is 'a?, having a punctuation mark immediately after code (that is not clearly delimited visually).

Before:
> # '';;
> Error: Syntax error

After:
> # '';;
> Error: Illegal empty character literal ''
>   Hint: Did you mean ' ' or a type variable 'a?

Before, this input would get correctly lexed into QUOTE QUOTE and then
fail in the parser with a generic "Syntax error" message. (Even if we
had better error messages in the parser, here the parser-level error
would not be very illuminating as ' is never the start of a valid
expression or structure item).

Fixes ocaml#10196.
@gasche gasche force-pushed the error-empty-character-literal branch from 1e712d7 to 966d10c Compare February 5, 2021 11:12
@gasche
Copy link
Member Author

gasche commented Feb 5, 2021

(Still went back to the short version. Maybe it's allright now?)

@Octachron
Copy link
Member

My approval still stands for this version. I sympathize with the punctuation collision issue, but I don't see a way to circumvent it (without an uniform code quoting standard in error messages).

@gasche gasche merged commit 243e39f into ocaml:trunk Feb 5, 2021
@gasche
Copy link
Member Author

gasche commented Feb 5, 2021

Merged, then. Thanks to everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could the error message when passing an empty char be better?
5 participants