Switch to Menhir's simplified error handling strategy #10095

fpottier · 2020-12-21T20:47:29Z

This involves removing the custom parsing loop in parsing/parse.ml and instead using the entry point provided by Menhir's "monolithic" API. We use Menhir's "simplified" error-handling strategy (which is new as of 20201216). This strategy differs from the "legacy" strategy in two ways:

It does not read one token too far (so the custom code in parsing/parse.ml can be removed).
If the current state cannot shift or reduce the error token, then the parser just dies, whereas with the legacy strategy, the parser would pop an item off the stack and continue.

The second point impacts some of the syntax error messages produced by the parser. Popping items off the stack meant forgetting part of the input that had just been read, and would lead the parser to produce messages that did not make sense to the user, because there was no way for the user to tell that the message was relative to an earlier point in the input. Here is an example:

$ more foo.ml
(2 + )
$ ocamlc -c foo.ml
File "foo.ml", line 1, characters 5-6:
1 | (2 + )
         ^
Error: Syntax error: ')' expected
File "foo.ml", line 1, characters 0-1:
1 | (2 + )
    ^
  This '(' might be unmatched

With the simplified strategy, a simple Syntax error message is printed. This is arguably better: it may seem less informative, but it is actually less wrong and less confusing. It is a better basis for producing better syntax error messages in the future.

The first commit is the most important one, and contains the change in behavior. The second commit performs further simplifications of the code but causes no change in behavior.

gasche

This looks like a very sensible change to me, because it simplifies the codebase.

(Note: I don't believe that being able to use the code backend in the future is a plus for the OCaml compiler. I tried using the code backend before introducing the incremental loop in the compiler, but the produced parser was very large and much too slow to compile (40s on my machine). The inconvenience to all contributors to the compiler codebase made it a no-go, no matter the (moderate) runtime gains.)

I would be curious to have @let-def's opinion on this PR. In particular, would removing the incremental loop make his own experiments more cumbersome?

let-def · 2021-01-05T16:13:23Z

This is indeed a sensible change. The new loop is a clear improvement, the new strategy that does not discard stack frames on failure is the right behavior for producing error messages.

This change does not affect my experiments. The simpler code just make things easier to reason about.

gasche · 2021-01-05T16:29:08Z

@fpottier this is good to go, but could you please add a Changes entry?

fpottier · 2021-01-05T16:39:02Z

Done. Thanks for your reviews!

This involves removing the custom parsing loop in parsing/parse.ml and instead using the parsing loop provided by MenhirLib. We use Menhir's so-called "simplified" error-handling strategy (which is new as of 20201216). This strategy differs from the "legacy" strategy in two ways: * It does not read one token too far (so the custom code in parsing/parse.ml can be removed). * If the current state cannot shift or reduce the error token, then it just dies, whereas the legacy strategy would pop an item off the stack and continue. The second point impacts some of the syntax error messages produced by the parser. Popping items off the stack meant forgetting part of the input that had just been read, and would lead the parser to produce messages that did not make sense to the user, because there was no way for the user to tell that the message was relative to an earlier point in the input. Here is an example: ``` $ more foo.ml (2 + ) $ ocamlc -c foo.ml File "foo.ml", line 1, characters 5-6: 1 | (2 + ) ^ Error: Syntax error: ')' expected File "foo.ml", line 1, characters 0-1: 1 | (2 + ) ^ This '(' might be unmatched ``` With the simplified parsing strategy, a simple "Syntax error" message is printed. This is arguably better: it may seem less informative, but it is actually less wrong and less confusing. It is a better basis for producing better syntax error messages in the future.

This does not change the behavior of the parser in any way. This allows us to stop using the incremental API and revert to using the traditional (monolithic) API. Thus, we save a few lines of uninteresting code and are able to remove the function [wrap_menhir]. This change could make it easier to switch between Menhir's code back-end and table back-end, since they both support the monolithic API. (However, at the time of writing, the code back-end does not yet support [--strategy simplified].)

gasche reviewed Dec 22, 2020

View reviewed changes

let-def approved these changes Jan 5, 2021

View reviewed changes

gasche added the merge-me label Jan 5, 2021

fpottier added 3 commits January 6, 2021 16:14

Add a Changes entry.

f3cf6be

gasche force-pushed the simplified-error-handling branch from 61c9560 to f3cf6be Compare January 6, 2021 15:15

gasche merged commit 451b27d into ocaml:trunk Jan 6, 2021

gpetiot added a commit to gpetiot/ocamlformat that referenced this pull request Jun 24, 2021

update tests, the new output is an improvement: ocaml/ocaml#10095

0ab64d1

gpetiot added a commit to gpetiot/ocamlformat that referenced this pull request Jul 6, 2021

update tests, the new output is an improvement: ocaml/ocaml#10095

63c8ff7

gpetiot added a commit to gpetiot/ocamlformat that referenced this pull request Jul 6, 2021

update tests, the new output is an improvement: ocaml/ocaml#10095

0e6708e

gpetiot mentioned this pull request Jul 6, 2021

Implement OCaml 4.13 features ocaml-ppx/ocamlformat#1680

Merged

5 tasks

copy mentioned this pull request Dec 6, 2021

Bad error syntax on tuple with trailing comma #7858

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to Menhir's simplified error handling strategy #10095

Switch to Menhir's simplified error handling strategy #10095

fpottier commented Dec 21, 2020

gasche left a comment

let-def commented Jan 5, 2021

gasche commented Jan 5, 2021

fpottier commented Jan 5, 2021

Switch to Menhir's simplified error handling strategy #10095

Switch to Menhir's simplified error handling strategy #10095

Conversation

fpottier commented Dec 21, 2020

gasche left a comment

Choose a reason for hiding this comment

let-def commented Jan 5, 2021

gasche commented Jan 5, 2021

fpottier commented Jan 5, 2021