Add one new test file containing 1086 tests of the syntax error messages (plus tooling) #10086

Merged
merged 14 commits into ocaml:trunk from parse-errors-generated on Dec 21, 2020

Conversation

@fpottier (Contributor) commented Dec 14, 2020

This pull request proposes the following changes:

  • Update parser.mly so that every token carries an alias (an example of its concrete syntax); a sketch of what this looks like follows this list.
  • Pass the flag --require-aliases to Menhir so that Menhir enforces this property in the future.
  • Update Makefile.menhir with new commands, make list-errors and make generate-parse-errors.
  • Adopt the convention in parser.mly that the parts of the grammar between BEGIN AVOID and END AVOID are ignored by make generate-parse-errors. This is required, e.g., to avoid exercising certain productions whose semantic actions raise an error.
  • Use make generate-parse-errors to generate one new test file that contains 1086 sentences that cause a syntax error. Executing this new test is very fast. This gives us an overview of the syntax error messages that the parser can produce, and should help us avoid regressions if/when we change the error reporting mechanisms in the future.
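
For illustration, a minimal sketch of what such a token alias declaration looks like in parser.mly (Menhir syntax; the token names below are real tokens of the OCaml grammar, and the "UIdent" alias matches the toplevel output shown later in this thread):

%token AMPERAMPER "&&"
%token BEGIN "begin"
%token <string> UIDENT "UIdent"  /* a token with a payload carries an example value */

With --require-aliases, Menhir refuses grammars in which some token lacks an alias.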

This requires Menhir 20201214, which is in the process of being released (waiting for opam).

@nojb (Contributor) commented Dec 14, 2020

  • Use make generate-parse-errors to generate 1077 test cases that cause a syntax error. It takes about 30 seconds to execute these 1077 tests. This gives us an overview of the syntax error messages that the compiler can produce, and will help us avoid regressions if/when we change the error reporting mechanisms in the future.

Just thinking out loud here, but would it make sense to combine all these tests into a single (or a few) expect tests? I am a bit afraid of the time these tests will take to run under Windows (where process creation is slower).

@gasche (Member) left a comment

(We have already discussed this in an email thread with @let-def and @trefis.)

Two broad remarks:

  • I am worried about the generated tests taking too long to run: how much time do they currently add to the testsuite?
  • Have you considered generating all tests in one file, using the expect-test machinery, in the hope that it would reduce the testing time? (There is more IO plumbing, but only one instance of the compiler process is run.) Another option would be to build a tester from compiler-libs, to just call the parser.

Currently, the test data seems fairly impractical in various ways (testing time, number of generated files, etc.). It looks more like something that should remain on the side for now, outside the compiler codebase.

(We have discussed recently having a notion of "slow tests" that would be run on an irregular basis. We don't have anything like that, and "testsuite runtime" is critical for some compiler-development feedback loops.)

(Resolved review threads on parsing/parser.mly and Makefile.menhir.)
@fpottier force-pushed the parse-errors-generated branch 3 times, most recently from 4acc56a to 7da7acb on December 14, 2020
@fpottier (Contributor Author) commented Dec 14, 2020

Would it make sense to combine all these tests into a single (or a few) expect tests?

Maybe it would, but I am not familiar with this machinery, which is why I did not think about it. Where can I read about it? If the tests were combined into a single expect test, would it be possible to run them in parallel?

Running the 1077 tests currently takes about 40 seconds on my machine. That said, the task is embarrassingly parallel; I am not sure why ocamltest does not parallelize the tests. It sounds as if that should be fixed.

Regarding the fact that there is a large number of small test files, I do not see why that might be a problem? On the contrary, it could make it easier to see what has changed when the test files are re-generated.

@dra27 (Member) commented Dec 14, 2020

We have make -C testsuite parallel (which can be made to work on Windows as well), but that doesn't solve the problem for CI, where the workers only have two cores (that said, we already force dependency generation much higher than -j2 on those workers).

For general Windows use, I'm somewhat more concerned by the 200% increase in the number of files. Is it possible to still have 1077 separate test cases, but use the expect machinery so that there is just the .ml and not the .reference file as well?

@fpottier (Contributor Author)

We have make -C testsuite parallel

As far as I can see, this is ineffective, because the granularity is a single directory, and all 1077 test files reside in the same directory. One should perhaps modify make exec-ocamltest to use a parallel loop instead of a sequential loop.

For general Windows use, I'm somewhat more concerned by the 200% increase in the number of files. Is it possible to still have 1077 separate test cases, but use the expect machinery so that there is just the .ml and not the .reference file as well?

Is there documentation for the expect machinery? The file testsuite/HACKING.md says nothing about it. I produced my test cases by mimicking the existing tests that I could find.

@xavierleroy (Contributor)

For "expect" tests, I don't know where the documentation is, but you can find many examples in the test suite, e.g. https://github.com/ocaml/ocaml/blob/trunk/testsuite/tests/typing-deprecated/deprecated.ml . If I understand correctly, you can start with empty [%%expect{| |}] attributes, and the first run of the test will create a .corrected file with the actual output.

Like @dra27 I fear that so many small files and so many small tests to run one after the other will be a big performance issue, especially for Windows. A single "expect" file with 1077 tests should be processed efficiently, however, especially if all tests are syntax errors and therefore don't undergo typing and compilation.
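
For reference, a minimal sketch of the shape of such an expect test (hypothetical content; as described above, the [%%expect{| |}] block can start empty, and the first run produces a .corrected file with the actual output filled in). The (* TEST ... *) header is the testsuite convention for selecting the expect runner:

(* TEST
 * expect
*)

let x : int = "hello"
[%%expect{|
|}]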

@xavierleroy (Contributor)

To belabor the point:

Regarding the fact that there is a large number of small test files, I do not see why that might be a problem?

First, ocamltest will be started N=1077 times, then will fork N invocations (or perhaps 2N or 4N) of the OCaml compiler(s), creating N directories and mN intermediate files for some value of m. That's a lot of system activity, and under Windows + Cygwin such activity is expensive.

Second, any git activity will create or access N files, and again this is not cheap on some of the platforms we have to use.

@fpottier (Contributor Author)

I get your points.

That said, as far as I can tell, based on looking at the file testsuite/tools/expect_test.ml, an expect test is executed by parsing the entire file in one go (using the parser's use_file entry point) and only then splitting the file into phrases and executing each phrase (while handling runtime errors). Syntax errors are not handled at all. Thus, if there is even just one syntax error in the file, the entire test will fail abruptly.

One may need to rewrite testsuite/tools/expect_test.ml to first split the file using a purely textual method, then invoke the parser separately on each chunk, while handling syntax errors.

Also, there is a need to be able to indicate, for each chunk, whether it should be parsed as an implementation or as an interface.

Or, one might be better off writing another script, which would be analogous to expect_test.ml, but would be tailored to handling syntax errors.
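
For concreteness, a rough sketch (not the actual expect_test.ml) of the core of such a tailored script, using compiler-libs; the purely textual splitting into chunks is assumed to have happened beforehand:

let test_chunk (chunk : string) =
  let lexbuf = Lexing.from_string chunk in
  match Parse.implementation lexbuf with
  | _ast ->
      (* unexpected for these tests: the sentence parsed successfully *)
      print_endline "accepted"
  | exception exn ->
      (* syntax errors (Syntaxerr.Error) are reported with their location *)
      Location.report_exception Format.std_formatter exn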

@gasche (Member) commented Dec 14, 2020

Yes, writing your own tool is what I suggested with

Another option would be to build a tester from compiler-libs, to just call the parser.

I would find it natural to follow the toplevel convention of terminating phrases with ;;, using the lexer to skip all remaining tokens until ;; after each parse attempt. (I believe your tool guarantees that a valid parser will always fail on the last token, but the tests generated now may be parsed differently in the future, and the testing tool should support that.)
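
A minimal sketch of this token-skipping step, using compiler-libs (Lexer.token and the Parser token constructors; a real tool would also call Lexer.init once per input):

let rec skip_to_semisemi lexbuf =
  match Lexer.token lexbuf with
  | Parser.SEMISEMI | Parser.EOF -> ()   (* stop at ";;" or end of input *)
  | _ -> skip_to_semisemi lexbuf         (* discard any other token *)
  | exception Lexer.Error _ -> skip_to_semisemi lexbuf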

Note:

Also, there is a need to be able to indicate, for each chunk, whether it should be parsed as an implementation or as an interface.

You can embed interfaces into implementation with module type M = sig ... end, so that the tool only sees structure items.

@fpottier (Contributor Author)

Another option would be to keep the scripts roughly as they are now but not commit the 2*1077 files to the repository and not make them part of the test suite. The scripts would be run only on demand and would allow someone who is working on the parser to check how the compiler's syntax error messages have evolved between commit A and commit B.

@xavierleroy (Contributor)

I played a bit with the test files and the standard ocaml toplevel:

(for i in *.ml; do cat $i; echo ';;'; done) > impl
TERM=dumb ocaml < impl | less

The output looks like this:

# * * * * *     Line 7, characters 12-19:
7 | begin % and virtual
                ^^^^^^^
Error: Syntax error
# * * * * *     Line 7, characters 13-17:
7 | begin UIdent with
                 ^^^^
Error: Syntax error: 'end' expected
Line 7, characters 0-5:
7 | begin UIdent with
    ^^^^^
  This 'begin' might be unmatched

Line numbers restart at 1 for each phrase, which is good. It might be possible to get meaningful diffs between this output and a reference file.

@xavierleroy (Contributor)

Concerning interfaces, we can wrap them in module type S = sig ... end, but the error messages may not be those that @fpottier wants to see:

#   * * * * *     Line 8, characters 15-17:
8 | class [@ and ] {<
                   ^^
Error: Syntax error: 'end' expected
Line 1, characters 16-19:
1 | module type S = sig
                    ^^^
  This 'sig' might be unmatched

@gasche (Member) commented Dec 15, 2020

Indeed! I get pretty good output with line directives and the -noprompt option.

#0 "first example"
1 + ;;
#0 "second example"
(2 ,);;

Observed output

$ TERM=dumb ocaml -noprompt < test.ml
        OCaml version 4.10.0

File "first example", line 1, characters 4-6:
Error: Syntax error
File "second example", line 1, characters 4-5:
Error: Syntax error: ')' expected
File "second example", line 1, characters 0-1:
  This '(' might be unmatched

@gasche (Member) commented Dec 15, 2020

We could even embed the example in the line directive...

#0 "1 +"
1 + ;;
#0 "(2, )"
(2 ,);;
$ TERM=dumb ocaml -noprompt < test.ml
        OCaml version 4.10.0

File "1 +", line 1, characters 4-6:
Error: Syntax error
File "(2, )", line 1, characters 4-5:
Error: Syntax error: ')' expected
File "(2, )", line 1, characters 0-1:
  This '(' might be unmatched

Edit: thinking about it, getting rid of the line directives is better.

1 + ;;
(2 ,);;
$ TERM=dumb ocaml -noprompt < test.ml
        OCaml version 4.10.0

Line 1, characters 4-6:
1 | 1 + ;;
        ^^
Error: Syntax error
Line 1, characters 4-5:
1 | (2 ,);;
        ^
Error: Syntax error: ')' expected
Line 1, characters 0-1:
1 | (2 ,);;
    ^
  This '(' might be unmatched

@fpottier (Contributor Author)

Thanks for your suggestions. Regarding interfaces, wrapping them in module type S = sig ... end seems acceptable. If I remove the entry point interface from the grammar, I get 1075 reachable states instead of 1077, so we apparently do not lose much.

@gasche (Member) commented Dec 15, 2020

Now I would be curious to hear about the checking time if all tests are in a single file passed to the ocaml toplevel.

@xavierleroy (Contributor)

Now I would be curious to hear about the checking time if all tests are in a single file passed to the ocaml toplevel.

About 0.3 s on my machine.

@fpottier (Contributor Author)

About 0.3 s on my machine.

This seems to say something about the overhead imposed by ocamltest, but probably I don't know what I am talking about...

@gasche (Member) commented Dec 15, 2020

(As everyone pointed out already) just launching 1K ocaml or ocamlc invocations to read 1K files and produce 1K files will be slow, more or less so depending on the operating system. It's not practical to generate a thousand test files, so if we believe that the tests could be valuable to have in the compiler, you need to find another way, such as the ocaml approach just suggested.

@fpottier changed the title from "Add 1077 test cases for syntax errors (plus tooling)" to "Add one new test file containing 1086 tests of the syntax error messages (plus tooling)" on Dec 15, 2020
@fpottier (Contributor Author)

I have uploaded a new version of the PR. Abusing the REPL is a bit of a hack, but (mostly) works. The performance problems should now be avoided. Let me know what you think...

@xavierleroy (Contributor)

Technically, the PR looks good and runs fast :-). I'm not 100% sure I understand what is being tested, however. Which mistakes is this test going to catch, exactly?

@fpottier (Contributor Author)

As of now, it doesn't catch any mistakes; it just reflects the current state of the syntax error messages that the parser can produce. In the future, when someone changes the parser, this test will show immediately how the error messages are (intentionally or unintentionally) affected. For instance, I have another PR in preparation where I change the error handling strategy, and the existence of this machinery allows me to quickly assess its effect.

@trefis (Contributor) commented Dec 16, 2020

FTR: I'm in favor of this kind of test, if it is not too costly (which it does not seem to be with this new implementation).

@gasche (Member) commented Dec 17, 2020

Our integration tests complain about procedural things that should be fixed:

  • a Changes entry is missing
  • check-typo detects that many lines are too long (bad!):
Checking 339fd1e4e17c63c118daba16a4b120042fd242d1: Makefile.menhir
./Makefile.menhir:203.81: [long-line] line is over 80 columns
./Makefile.menhir:277.81: [long-line] line is over 80 columns
./Makefile.menhir:292.81: [long-line] line is over 80 columns
./Makefile.menhir:293.81: [long-line] line is over 80 columns
./Makefile.menhir:294.81: [long-line] line is over 80 columns
./Makefile.menhir:294.133: [very-long-line] line is over 132 columns
./Makefile.menhir:295.81: [long-line] line is over 80 columns
./Makefile.menhir:295.133: [very-long-line] line is over 132 columns

Checking 339fd1e4e17c63c118daba16a4b120042fd242d1: parsing/parser.mly
./parsing/parser.mly:722.81: [long-line] line is over 80 columns
./parsing/parser.mly:746.81: [long-line] line is over 80 columns

Checking 339fd1e4e17c63c118daba16a4b120042fd242d1: testsuite/tests/generated-parse-errors/errors.ml
./testsuite/tests/generated-parse-errors/errors.ml:916.133: [very-long-line] line is over 132 columns
./testsuite/tests/generated-parse-errors/errors.ml:919.133: [very-long-line] line is over 132 columns
./testsuite/tests/generated-parse-errors/errors.ml:988.133: [very-long-line] line is over 132 columns
./testsuite/tests/generated-parse-errors/errors.ml:1087.133: [very-long-line] line is over 132 columns
./testsuite/tests/generated-parse-errors/errors.ml:1690.133: [very-long-line] line is over 132 columns

The long lines in Makefile.menhir and parser.mly should be broken. For the generated errors, I suppose that the long lines come from generated text (and not human input; there is no human input in the file for now), and that you may not have a line-wrapping approach for those. It is possible to opt out of long-line checking for a specific file; see the line about testsuite/tests/asmgen/immediates.cmm in the .gitattributes file at the root of the repository.

@gasche (Member) commented Dec 17, 2020

I would be in favor of merging the PR once the procedural issues are fixed. It is not very exciting yet, but there are no downsides either (thanks to the various changes that went in since submission), and if it can motivate @fpottier to keep working on error messages, it can have a large impact.

@fpottier (Contributor Author)

Thanks Gabriel, I have fixed the issues that you raised.

@gasche (Member) commented Dec 17, 2020

The check-typo test is still failing due to very long lines in the test file, and your rebased commits do not include any .gitattributes change; is this intended?

@fpottier (Contributor Author)

Sorry, should be fixed now.

@xavierleroy (Contributor) left a comment

CI is happy, and there's agreement that this new test is useful, so let's merge.

@xavierleroy (Contributor)

For the record: Menhir versions 20201214 and 20201216 have landed on OPAM.

@xavierleroy merged commit b71521b into ocaml:trunk on Dec 21, 2020
@fpottier deleted the parse-errors-generated branch on December 21, 2020
dbuenzli pushed a commit to dbuenzli/ocaml that referenced this pull request on Mar 25, 2021:
Add one new test file containing 1086 tests of the syntax error messages (plus tooling) (ocaml#10086)

* Improve [make clean-menhir] to remove parser.{automaton,conflicts}.

* Distinguish MENHIRBASICFLAGS and MENHIRFLAGS.

The former is a subset of the latter, and suffices when running Menhir
to perform an analysis of the grammar.

This allows [make interpret-menhir] to be used even if ocamlrun and ocamlc
have not been built yet.

* Define an alias (i.e., concrete syntax) for every token. Add --require-aliases.

The flag --require-aliases makes sure that the property that every token
has an alias will be preserved in the future.

This requires Menhir 20201214.

* Add [make list-parse-errors].

This rule runs Menhir's reachability analysis, which produces a list of all
states where a syntax error can be detected (and a corresponding list of
erroneous sentences). This data is stored in parsing/parser.auto.messages.

All text between BEGIN AVOID and END AVOID is removed from the grammar before
the analysis is run. This can be used to filter out productions and
declarations that the analysis should ignore.

* Add [make generate-parse-errors].

This rule turns the error sentences stored in parsing/parser.auto.messages
into concrete .ml files, which can be used as tests. One file per sentence is
created. The file name is derived from the sentence. The test files are placed
in the directory testsuite/tests/generated-parse-errors.

* Mark the three productions that use [not_expecting] with [AVOID].

* Mark the production that allows puns with [AVOID].

This prevents [make list-parse-errors] from generating sentences that exploit
this production. Indeed, there is a risk of generating sentences that
cause syntax errors, due to the auxiliary function [addlb], which rejects
certain puns.

* Mark some of the start symbols with [AVOID].

* Add one new test file in testsuite/tests/generated-parse-errors/errors.ml.

This file was produced by [make generate-parse-errors].

This file contains:
    1072 sentences whose start symbol is implementation.
       5 sentences whose start symbol is use_file.
       9 sentences whose start symbol is toplevel_phrase.

The parser's output can be described as follows:
    1086 syntax errors reported.
     721 syntax errors without explanation.
     365 syntax errors with an indication of what was expected.
     307 syntax errors with an indication of an unmatched delimiter.