Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Menhir's simplified error handling strategy #10095

Merged
merged 3 commits into from Jan 6, 2021

Conversation

fpottier
Copy link
Contributor

This involves removing the custom parsing loop in parsing/parse.ml and instead using the entry point provided by Menhir's "monolithic" API. We use Menhir's "simplified" error-handling strategy (which is new as of 20201216). This strategy differs from the "legacy" strategy in two ways:

  • It does not read one token too far (so the custom code in parsing/parse.ml can be removed).

  • If the current state cannot shift or reduce the error token, then the parser just dies, whereas with the legacy strategy, the parser would pop an item off the stack and continue.

The second point impacts some of the syntax error messages produced by the parser. Popping items off the stack meant forgetting part of the input that had just been read, and would lead the parser to produce messages that did not make sense to the user, because there was no way for the user to tell that the message was relative to an earlier point in the input. Here is an example:

$ more foo.ml
(2 + )
$ ocamlc -c foo.ml
File "foo.ml", line 1, characters 5-6:
1 | (2 + )
         ^
Error: Syntax error: ')' expected
File "foo.ml", line 1, characters 0-1:
1 | (2 + )
    ^
  This '(' might be unmatched

With the simplified strategy, a simple Syntax error message is printed. This is arguably better: it may seem less informative, but it is actually less wrong and less confusing. It is a better basis for producing better syntax error messages in the future.

The first commit is the most important one, and contains the change in behavior. The second commit performs further simplifications of the code but causes no change in behavior.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very sensible change to me, because it simplifies the codebase.

(Note: I don't believe that being able to use the code backend in the future is a plus for the OCaml compiler. I tried using the code backend before introducing the incremental loop in the compiler, but the produced parser was very large and much too slow to compile (40s on my machine). The inconvenience to all contributors to the compiler codebase made it a no-go, no matter the (moderate) runtime gains.)

I would be curious to have @let-def's opinion on this PR. In particular, would removing the incremental loop make his own experiments more cumbersome?

@let-def
Copy link
Contributor

let-def commented Jan 5, 2021

This is indeed a sensible change. The new loop is a clear improvement, the new strategy that does not discard stack frames on failure is the right behavior for producing error messages.

This change does not affect my experiments. The simpler code just make things easier to reason about.

@gasche
Copy link
Member

gasche commented Jan 5, 2021

@fpottier this is good to go, but could you please add a Changes entry?

@fpottier
Copy link
Contributor Author

fpottier commented Jan 5, 2021

Done. Thanks for your reviews!

@gasche gasche added the merge-me label Jan 5, 2021
This involves removing the custom parsing loop in parsing/parse.ml
and instead using the parsing loop provided by MenhirLib. We use
Menhir's so-called "simplified" error-handling strategy (which is
new as of 20201216). This strategy differs from the "legacy"
strategy in two ways:

* It does not read one token too far (so the custom code in
  parsing/parse.ml can be removed).

* If the current state cannot shift or reduce the error token,
  then it just dies, whereas the legacy strategy would pop an
  item off the stack and continue.

The second point impacts some of the syntax error messages produced
by the parser. Popping items off the stack meant forgetting part of
the input that had just been read, and would lead the parser to
produce messages that did not make sense to the user, because there
was no way for the user to tell that the message was relative to an
earlier point in the input. Here is an example:

```
$ more foo.ml
(2 + )
$ ocamlc -c foo.ml
File "foo.ml", line 1, characters 5-6:
1 | (2 + )
         ^
Error: Syntax error: ')' expected
File "foo.ml", line 1, characters 0-1:
1 | (2 + )
    ^
  This '(' might be unmatched
```

With the simplified parsing strategy, a simple "Syntax error" message
is printed. This is arguably better: it may seem less informative,
but it is actually less wrong and less confusing. It is a better basis
for producing better syntax error messages in the future.
This does not change the behavior of the parser in any way.

This allows us to stop using the incremental API and revert
to using the traditional (monolithic) API.

Thus, we save a few lines of uninteresting code and are able
to remove the function [wrap_menhir].

This change could make it easier to switch between Menhir's
code back-end and table back-end, since they both support
the monolithic API. (However, at the time of writing, the
code back-end does not yet support [--strategy simplified].)
@gasche gasche merged commit 451b27d into ocaml:trunk Jan 6, 2021
gpetiot added a commit to gpetiot/ocamlformat that referenced this pull request Jun 24, 2021
gpetiot added a commit to gpetiot/ocamlformat that referenced this pull request Jul 6, 2021
gpetiot added a commit to gpetiot/ocamlformat that referenced this pull request Jul 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants