Add one new test file containing 1086 tests of the syntax error messages (plus tooling) #10086
Conversation
Just thinking out loud here, but would it make sense to combine all these tests into a single (or a few) expect tests? I am a bit afraid of the time these tests will take to run under Windows (where process creation is slower).
(We have already discussed this in an email thread with @let-def and @trefis.)
Two broad remarks:
- I am worried about the generated tests taking too long to run; how much time does it currently add to the testsuite?
- Have you considered generating all tests in one file, using the expect-test machinery, in the hope that it would reduce the testing time? (There is more IO plumbing, but only one instance of the compiler process is run.) Another option would be to build a tester from compiler-libs, to just call the parser.
Currently the test data seems fairly impractical in various ways (testing time, number of generated files etc.). It looks more like something that should remain on the side for now, outside the compiler codebase.
(We have recently discussed having a notion of "slow tests" that would be run on an irregular basis. We don't have anything like that yet, and "testsuite runtime" is critical for some compiler-development feedback loops.)
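The compiler-libs option mentioned above could look roughly as follows. This is a sketch, not the PR's actual code; the exact `Location` API varies between compiler versions, and the error-rendering details here are assumptions. The idea is to parse each erroneous sentence in-process, so that no compiler process is forked per test:

```ocaml
(* Sketch: parse a sentence with compiler-libs and render the syntax
   error message, without forking a compiler process per test.
   Build against compiler-libs, e.g.
   ocamlfind ocamlopt -package compiler-libs.common -linkpkg test.ml *)

let syntax_error_message (src : string) : string option =
  let lexbuf = Lexing.from_string src in
  Location.init lexbuf "<test>";
  match Parse.implementation lexbuf with
  | _ -> None                        (* parsed fine: no syntax error *)
  | exception exn ->
      (* Render the message the compiler would normally print. *)
      begin match Location.error_of_exn exn with
      | Some (`Ok report) ->
          Some (Format.asprintf "%a" Location.print_report report)
      | _ -> Some (Printexc.to_string exn)
      end

let () =
  match syntax_error_message "1 + ;;" with
  | Some msg -> print_string msg
  | None -> print_endline "no syntax error"
```

A driver along these lines would only pay the cost of one process for the whole batch of 1077 sentences.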
(force-pushed from 4acc56a to 7da7acb)
Maybe it would, but I am not familiar with this machinery, which is why I did not think about it. Where can I read about it?

Running the 1077 tests currently takes about 40 seconds on my machine. That said, the task is embarrassingly parallel.

Regarding the fact that there is a large number of small test files, I do not see why that might be a problem. On the contrary, it could make it easier to see what has changed when the test files are re-generated.
For general Windows use, I'm somewhat more concerned by the 200% increase in the number of files. Is it possible to still have 1077 separate test cases, but use the expect machinery, so that there is just one test file?
As far as I can see, this is ineffective, because the granularity is a single directory, and all 1077 test files reside in the same directory. One should perhaps modify the testsuite driver.
Is there documentation for the expect-test machinery?
For "expect" tests, I don't know where the documentation is, but you can find many examples in the test suite, e.g. https://github.com/ocaml/ocaml/blob/trunk/testsuite/tests/typing-deprecated/deprecated.ml . If I understand correctly, you can start with empty expected outputs and have them filled in automatically.

Like @dra27, I fear that so many small files and so many small tests to run one after the other will be a big performance issue, especially for Windows. A single "expect" file with 1077 tests should be processed efficiently, however, especially if all tests are syntax errors and therefore don't undergo typing and compilation.
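For reference, an expect test in the OCaml testsuite is a single .ml file that interleaves phrases with their recorded output. A minimal sketch (the `[%%expect]` block can start out empty and be filled in by the promotion machinery; the exact TEST header syntax depends on the testsuite version):

```ocaml
(* TEST
   * expect
*)

1 + ;;
[%%expect{|
|}];;
```

Each phrase is evaluated by a toplevel-like driver, and the text between `{|` and `|}` is compared against (or promoted to) the observed output.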
To belabor the point: First, ocamltest will be started N=1077 times, then will fork N invocations (or perhaps 2N or 4N) of the OCaml compiler(s), creating N directories and mN intermediate files for some value of m. That's a lot of system activity, and under Windows + Cygwin such activity is expensive. Second, any git activity will create or access N files, and again this is not cheap on some of the platforms we have to use.
I get your points. That said, as far as I can tell from looking at the expect-test machinery, it is not directly usable here; one may need to rewrite or adapt it. Also, there is a need to be able to indicate, for each chunk, whether it should be parsed as an implementation or as an interface. Or, one might be better off writing another script, analogous to the expect machinery but specialized for this purpose.
Yes, writing your own tool is what I suggested with the compiler-libs option above.
I would find it natural to follow the toplevel convention of terminating phrases with ;;.

Note: you can embed interfaces into implementations with module type S = sig ... end.
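For instance, an interface fragment such as a value declaration can be submitted to the toplevel by wrapping it in a module type (the names S and f are illustrative):

```ocaml
(* The interface fragment "val f : int -> int" embedded in an
   implementation, via a module type: *)
module type S = sig
  val f : int -> int
end;;
```

This way, both implementation chunks and interface chunks can live in a single .ml file.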
Another option would be to keep the scripts roughly as they are now but not commit the 2*1077 files to the repository and not make them part of the test suite. The scripts would be run only on demand and would allow someone who is working on the parser to check how the compiler's syntax error messages have evolved between commit A and commit B.
I played a bit with the test files and the standard toplevel. Line numbers restart at 1 for each phrase, which is good. It might be possible to get meaningful diffs between this output and a reference file.
Concerning interfaces, we can wrap them in module type S = sig ... end, as suggested above.
Indeed! I get pretty good output with line directives and the toplevel:

#0 "first example"
1 + ;;
#0 "second example"
(2 ,);;
We could even embed the example in the line directive:

#0 "1 +"
1 + ;;
#0 "(2, )"
(2 ,);;

Edit: thinking about it, getting rid of the line directives is better:

1 + ;;
(2 ,);;
Thanks for your suggestions. Regarding interfaces, wrapping them in module type S = sig ... end should work.
Now we would be curious to hear about the checking time if all tests are in a single file passed to the toplevel.
About 0.3 s on my machine.
This seems to say something about the overhead imposed by ocamltest.
(As everyone pointed out already) just launching 1K compiler processes is already expensive by itself.
(force-pushed from 7da7acb to d4dd6e6)
(force-pushed from d4dd6e6 to f45bdcd)
I have uploaded a new version of the PR. Abusing the REPL is a bit of a hack, but (mostly) works. The performance problems should now be avoided. Let me know what you think...
Technically, the PR looks good and runs fast :-). I'm not 100% sure I understand what is being tested, however. Which mistakes is this test going to catch, exactly?
As of now, it doesn't catch any mistakes; it just reflects the current state of the syntax error messages that the parser can produce. In the future, when someone changes the parser, this test will show immediately how the error messages are (intentionally or unintentionally) affected. For instance, I have another PR in preparation where I change the error handling strategy, and the existence of this machinery allows me to quickly assess its effect.
FTR: I'm in favor of this kind of test, as long as it is not too costly (which it does not seem to be with this new implementation).
Our integration tests complain about procedural things that should be fixed: the long lines in the generated test file, for example.
I would be in favor of merging the PR, once the procedural issues are fixed. It is not very exciting yet, but there are no downsides either (thanks to the various changes that went in since submission), and if it can motivate @fpottier to keep working on error messages it can have a large impact.
The former is a subset of the latter, and suffices when running Menhir to perform an analysis of the grammar. This allows [make interpret-menhir] to be used even if ocamlrun and ocamlc have not been built yet.
…e-aliases. The flag --require-aliases makes sure that the property that every token has an alias will be preserved in the future. This requires Menhir 20201214.
This rule runs Menhir's reachability analysis, which produces a list of all states where a syntax error can be detected (and a corresponding list of erroneous sentences). This data is stored in parsing/parser.auto.messages. All text between BEGIN AVOID and END AVOID is removed from the grammar before the analysis is run. This can be used to filter out productions and declarations that the analysis should ignore.
This rule turns the error sentences stored in parsing/parser.auto.messages into concrete .ml files, which can be used as tests. One file per sentence is created. The file name is derived from the sentence. The test files are placed in the directory testsuite/tests/generated-parse-errors.
This prevents [make list-errors] from generating sentences that exploit this production. Indeed, there is a risk of generating sentences that cause syntax errors, due to the auxiliary function [addlb], which rejects certain puns.
…s.ml. This file was produced by [make generate-parse-errors]. This file contains: 1072 sentences whose start symbol is implementation. 5 sentences whose start symbol is use_file. 9 sentences whose start symbol is toplevel_phrase. The parser's output can be described as follows: 1086 syntax errors reported. 721 syntax errors without explanation. 365 syntax errors with an indication of what was expected. 307 syntax errors with an indication of an unmatched delimiter.
(force-pushed from b0a4993 to f016aff)
Thanks Gabriel, I have fixed the issues that you raised.
The CI check is failing again.
Sorry, should be fixed now. |
CI is happy, and there's agreement that this new test is useful, so let's merge.
For the record: Menhir versions 20201214 and 20201216 have landed on OPAM.
Add one new test file containing 1086 tests of the syntax error messages (plus tooling) (ocaml#10086)

* Improve [make clean-menhir] to remove parser.{automaton,conflicts}.
* Distinguish MENHIRBASICFLAGS and MENHIRFLAGS. The former is a subset of the latter, and suffices when running Menhir to perform an analysis of the grammar. This allows [make interpret-menhir] to be used even if ocamlrun and ocamlc have not been built yet.
* Define an alias (i.e., concrete syntax) for every token. Add --require-aliases. The flag --require-aliases makes sure that the property that every token has an alias will be preserved in the future. This requires Menhir 20201214.
* Add [make list-parse-errors]. This rule runs Menhir's reachability analysis, which produces a list of all states where a syntax error can be detected (and a corresponding list of erroneous sentences). This data is stored in parsing/parser.auto.messages. All text between BEGIN AVOID and END AVOID is removed from the grammar before the analysis is run. This can be used to filter out productions and declarations that the analysis should ignore.
* Add [make generate-parse-errors]. This rule turns the error sentences stored in parsing/parser.auto.messages into concrete .ml files, which can be used as tests. One file per sentence is created. The file name is derived from the sentence. The test files are placed in the directory testsuite/tests/generated-parse-errors.
* Mark the three productions that use [not_expecting] with [AVOID].
* Mark the production that allows puns with [AVOID]. This prevents [make list-parse-errors] from generating sentences that exploit this production. Indeed, there is a risk of generating sentences that cause syntax errors, due to the auxiliary function [addlb], which rejects certain puns.
* Mark some of the start symbols with [AVOID].
* Add one new test file in testsuite/tests/generated-parse-errors/errors.ml. This file was produced by [make generate-parse-errors]. It contains 1072 sentences whose start symbol is implementation, 5 sentences whose start symbol is use_file, and 9 sentences whose start symbol is toplevel_phrase. The parser's output can be described as follows: 1086 syntax errors reported; 721 syntax errors without explanation; 365 syntax errors with an indication of what was expected; 307 syntax errors with an indication of an unmatched delimiter.
This pull request proposes the following changes:

* Modify parser.mly so that every token carries an alias (an example of its concrete syntax). Pass --require-aliases to Menhir so that Menhir enforces this property in the future.
* Extend Makefile.menhir with new commands, make list-errors and make generate-parse-errors.
* Indicate in parser.mly that the parts of the grammar comprised between BEGIN AVOID and END AVOID are ignored by make generate-parse-errors. This is required, e.g., to avoid exercising certain productions whose semantic actions cause an error.
* Use make generate-parse-errors to generate one new test file that contains 1086 sentences that cause a syntax error. Executing this new test is very fast.

This gives us an overview of the syntax error messages that the parser can produce, and should help us avoid regressions if/when we change the error reporting mechanisms in the future.

This requires Menhir 20201214, which is in the process of being released (waiting for opam).