New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to Menhir's simplified error handling strategy #10095
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a very sensible change to me, because it simplifies the codebase.
(Note: I don't believe that being able to use the code backend in the future is a plus for the OCaml compiler. I tried using the code backend before introducing the incremental loop in the compiler, but the produced parser was very large and much too slow to compile (40s on my machine). The inconvenience to all contributors to the compiler codebase made it a no-go, no matter the (moderate) runtime gains.)
I would be curious to have @let-def's opinion on this PR. In particular, would removing the incremental loop make his own experiments more cumbersome?
This is indeed a sensible change. The new loop is a clear improvement, the new strategy that does not discard stack frames on failure is the right behavior for producing error messages. This change does not affect my experiments. The simpler code just make things easier to reason about. |
@fpottier this is good to go, but could you please add a Changes entry? |
Done. Thanks for your reviews! |
This involves removing the custom parsing loop in parsing/parse.ml and instead using the parsing loop provided by MenhirLib. We use Menhir's so-called "simplified" error-handling strategy (which is new as of 20201216). This strategy differs from the "legacy" strategy in two ways: * It does not read one token too far (so the custom code in parsing/parse.ml can be removed). * If the current state cannot shift or reduce the error token, then it just dies, whereas the legacy strategy would pop an item off the stack and continue. The second point impacts some of the syntax error messages produced by the parser. Popping items off the stack meant forgetting part of the input that had just been read, and would lead the parser to produce messages that did not make sense to the user, because there was no way for the user to tell that the message was relative to an earlier point in the input. Here is an example: ``` $ more foo.ml (2 + ) $ ocamlc -c foo.ml File "foo.ml", line 1, characters 5-6: 1 | (2 + ) ^ Error: Syntax error: ')' expected File "foo.ml", line 1, characters 0-1: 1 | (2 + ) ^ This '(' might be unmatched ``` With the simplified parsing strategy, a simple "Syntax error" message is printed. This is arguably better: it may seem less informative, but it is actually less wrong and less confusing. It is a better basis for producing better syntax error messages in the future.
This does not change the behavior of the parser in any way. This allows us to stop using the incremental API and revert to using the traditional (monolithic) API. Thus, we save a few lines of uninteresting code and are able to remove the function [wrap_menhir]. This change could make it easier to switch between Menhir's code back-end and table back-end, since they both support the monolithic API. (However, at the time of writing, the code back-end does not yet support [--strategy simplified].)
61c9560
to
f3cf6be
Compare
This involves removing the custom parsing loop in
parsing/parse.ml
and instead using the entry point provided by Menhir's "monolithic" API. We use Menhir's "simplified" error-handling strategy (which is new as of 20201216). This strategy differs from the "legacy" strategy in two ways:It does not read one token too far (so the custom code in
parsing/parse.ml
can be removed).If the current state cannot shift or reduce the
error
token, then the parser just dies, whereas with the legacy strategy, the parser would pop an item off the stack and continue.The second point impacts some of the syntax error messages produced by the parser. Popping items off the stack meant forgetting part of the input that had just been read, and would lead the parser to produce messages that did not make sense to the user, because there was no way for the user to tell that the message was relative to an earlier point in the input. Here is an example:
With the simplified strategy, a simple
Syntax error
message is printed. This is arguably better: it may seem less informative, but it is actually less wrong and less confusing. It is a better basis for producing better syntax error messages in the future.The first commit is the most important one, and contains the change in behavior. The second commit performs further simplifications of the code but causes no change in behavior.