Add one new test file containing 1086 tests of the syntax error messages (plus tooling) #10086
Conversation
Just thinking out loud here, but would it make sense to combine all these tests into a single (or a few) expect tests? I am a bit afraid of the time these tests will take to run under Windows (where process creation is slower).
(We have already discussed this in an email thread with @let-def and @trefis.)
Two broad remarks:
- I am worried about the generated tests taking too long to run; how much time does it currently add to the testsuite?
- Have you considered generating all tests in one file, using the expect-test machinery, in the hope that it would reduce the testing time? (There is more IO plumbing, but only one instance of the compiler process is run.) Another option would be to build a tester from compiler-libs, to just call the parser.
Currently the test data seems fairly impractical in various ways (testing time, number of generated files etc.). It looks more like something that should remain on the side for now, outside the compiler codebase.
(We have recently discussed having a notion of "slow tests" that would be run on an irregular basis. We don't have anything like that yet, and "testsuite runtime" is critical for some compiler-development feedback loops.)
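The compiler-libs option mentioned above could look roughly as follows. This is a sketch, not the PR's actual code; the exact `Location` API varies between compiler versions, and the error-rendering details here are assumptions. The idea is to parse each erroneous sentence in-process, so that no compiler process is forked per test:

```ocaml
(* Sketch: parse a sentence with compiler-libs and render the syntax
   error message, without forking a compiler process per test.
   Build against compiler-libs, e.g.
   ocamlfind ocamlopt -package compiler-libs.common -linkpkg test.ml *)

let syntax_error_message (src : string) : string option =
  let lexbuf = Lexing.from_string src in
  Location.init lexbuf "<test>";
  match Parse.implementation lexbuf with
  | _ -> None                        (* parsed fine: no syntax error *)
  | exception exn ->
      (* Render the message the compiler would normally print. *)
      begin match Location.error_of_exn exn with
      | Some (`Ok report) ->
          Some (Format.asprintf "%a" Location.print_report report)
      | _ -> Some (Printexc.to_string exn)
      end

let () =
  match syntax_error_message "1 + ;;" with
  | Some msg -> print_string msg
  | None -> print_endline "no syntax error"
```

A driver along these lines would only pay the cost of one process for the whole batch of 1077 sentences.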
(force-pushed from 4acc56a to 7da7acb)
Maybe it would, but I am not familiar with this machinery, which is why I did not think about it. Where can I read about it?

Running the 1077 tests currently takes about 40 seconds on my machine. That said, the task is embarrassingly parallel.

Regarding the fact that there is a large number of small test files, I do not see why that might be a problem. On the contrary, it could make it easier to see what has changed when the test files are re-generated.
For general Windows use, I'm somewhat more concerned by the 200% increase in the number of files. Is it possible to still have 1077 separate test cases, but use the expect machinery, so that there is just one test file?
As far as I can see, this is ineffective, because the granularity is a single directory, and all 1077 test files reside in the same directory. One should perhaps modify the testsuite driver.
Is there documentation for the expect-test machinery?
For "expect" tests, I don't know where the documentation is, but you can find many examples in the test suite, e.g. https://github.com/ocaml/ocaml/blob/trunk/testsuite/tests/typing-deprecated/deprecated.ml . If I understand correctly, you can start with empty expected outputs and have them filled in automatically.

Like @dra27, I fear that so many small files and so many small tests to run one after the other will be a big performance issue, especially for Windows. A single "expect" file with 1077 tests should be processed efficiently, however, especially if all tests are syntax errors and therefore don't undergo typing and compilation.
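For reference, an expect test in the OCaml testsuite is a single .ml file that interleaves phrases with their recorded output. A minimal sketch (the `[%%expect]` block can start out empty and be filled in by the promotion machinery; the exact TEST header syntax depends on the testsuite version):

```ocaml
(* TEST
   * expect
*)

1 + ;;
[%%expect{|
|}];;
```

Each phrase is evaluated by a toplevel-like driver, and the text between `{|` and `|}` is compared against (or promoted to) the observed output.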
To belabor the point: First, ocamltest will be started N=1077 times, then will fork N invocations (or perhaps 2N or 4N) of the OCaml compiler(s), creating N directories and mN intermediate files for some value of m. That's a lot of system activity, and under Windows + Cygwin such activity is expensive. Second, any git activity will create or access N files, and again this is not cheap on some of the platforms we have to use.
I get your points. That said, as far as I can tell from looking at the expect-test machinery, it is not directly usable here; one may need to rewrite or adapt it. Also, there is a need to be able to indicate, for each chunk, whether it should be parsed as an implementation or as an interface. Or, one might be better off writing another script, analogous to the expect machinery but specialized for this purpose.
Yes, writing your own tool is what I suggested with the compiler-libs option above.
I would find it natural to follow the toplevel convention of terminating phrases with ;;.

Note: you can embed interfaces into implementations with module type S = sig ... end.
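For instance, an interface fragment such as a value declaration can be submitted to the toplevel by wrapping it in a module type (the names S and f are illustrative):

```ocaml
(* The interface fragment "val f : int -> int" embedded in an
   implementation, via a module type: *)
module type S = sig
  val f : int -> int
end;;
```

This way, both implementation chunks and interface chunks can live in a single .ml file.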
Another option would be to keep the scripts roughly as they are now but not commit the 2*1077 files to the repository and not make them part of the test suite. The scripts would be run only on demand and would allow someone who is working on the parser to check how the compiler's syntax error messages have evolved between commit A and commit B.
I played a bit with the test files and the standard toplevel. Line numbers restart at 1 for each phrase, which is good. It might be possible to get meaningful diffs between this output and a reference file.
Concerning interfaces, we can wrap them in module type S = sig ... end, as suggested above.
Indeed! I get pretty good output with line directives and the toplevel:

#0 "first example"
1 + ;;
#0 "second example"
(2 ,);;
We could even embed the example in the line directive:

#0 "1 +"
1 + ;;
#0 "(2, )"
(2 ,);;

Edit: thinking about it, getting rid of the line directives is better:

1 + ;;
(2 ,);;
Thanks for your suggestions. Regarding interfaces, wrapping them in module type S = sig ... end should work.
Now we would be curious to hear about the checking time if all tests are in a single file passed to the toplevel.
About 0.3 s on my machine.
This seems to say something about the overhead imposed by ocamltest.
(As everyone pointed out already) just launching 1K compiler processes is already expensive by itself.
(force-pushed from 7da7acb to d4dd6e6)
(force-pushed from d4dd6e6 to f45bdcd)
I have uploaded a new version of the PR. Abusing the REPL is a bit of a hack, but (mostly) works. The performance problems should now be avoided. Let me know what you think...
Technically, the PR looks good and runs fast :-). I'm not 100% sure I understand what is being tested, however. Which mistakes is this test going to catch, exactly?
As of now, it doesn't catch any mistakes; it just reflects the current state of the syntax error messages that the parser can produce. In the future, when someone changes the parser, this test will show immediately how the error messages are (intentionally or unintentionally) affected. For instance, I have another PR in preparation where I change the error handling strategy, and the existence of this machinery allows me to quickly assess its effect.
FTR: I'm in favor of this kind of test, as long as it is not too costly (which it does not seem to be with this new implementation).
Our integration tests complain about procedural things that should be fixed: the long lines in the generated test file, for example.
I would be in favor of merging the PR, once the procedural issues are fixed. It is not very exciting yet, but there are no downsides either (thanks to the various changes that went in since submission), and if it can motivate @fpottier to keep working on error messages it can have a large impact.
The former is a subset of the latter, and suffices when running Menhir to perform an analysis of the grammar. This allows [make interpret-menhir] to be used even if ocamlrun and ocamlc have not been built yet.
…e-aliases. The flag --require-aliases makes sure that the property that every token has an alias will be preserved in the future. This requires Menhir 20201214.
This rule runs Menhir's reachability analysis, which produces a list of all states where a syntax error can be detected (and a corresponding list of erroneous sentences). This data is stored in parsing/parser.auto.messages. All text between BEGIN AVOID and END AVOID is removed from the grammar before the analysis is run. This can be used to filter out productions and declarations that the analysis should ignore.
This rule turns the error sentences stored in parsing/parser.auto.messages into concrete .ml files, which can be used as tests. One file per sentence is created. The file name is derived from the sentence. The test files are placed in the directory testsuite/tests/generated-parse-errors.
This prevents [make list-errors] from generating sentences that exploit this production. Indeed, there is a risk of generating sentences that cause syntax errors, due to the auxiliary function [addlb], which rejects certain puns.
…s.ml. This file was produced by [make generate-parse-errors]. This file contains: 1072 sentences whose start symbol is implementation. 5 sentences whose start symbol is use_file. 9 sentences whose start symbol is toplevel_phrase. The parser's output can be described as follows: 1086 syntax errors reported. 721 syntax errors without explanation. 365 syntax errors with an indication of what was expected. 307 syntax errors with an indication of an unmatched delimiter.
(force-pushed from b0a4993 to f016aff)
Thanks Gabriel, I have fixed the issues that you raised.
The CI check is failing again.
Sorry, should be fixed now. |
CI is happy, and there's agreement that this new test is useful, so let's merge.
For the record: Menhir versions 20201214 and 20201216 have landed on OPAM.
Add one new test file containing 1086 tests of the syntax error messages (plus tooling) (ocaml#10086)

* Improve [make clean-menhir] to remove parser.{automaton,conflicts}.
* Distinguish MENHIRBASICFLAGS and MENHIRFLAGS. The former is a subset of the latter, and suffices when running Menhir to perform an analysis of the grammar. This allows [make interpret-menhir] to be used even if ocamlrun and ocamlc have not been built yet.
* Define an alias (i.e., concrete syntax) for every token. Add --require-aliases. The flag --require-aliases makes sure that the property that every token has an alias will be preserved in the future. This requires Menhir 20201214.
* Add [make list-parse-errors]. This rule runs Menhir's reachability analysis, which produces a list of all states where a syntax error can be detected (and a corresponding list of erroneous sentences). This data is stored in parsing/parser.auto.messages. All text between BEGIN AVOID and END AVOID is removed from the grammar before the analysis is run. This can be used to filter out productions and declarations that the analysis should ignore.
* Add [make generate-parse-errors]. This rule turns the error sentences stored in parsing/parser.auto.messages into concrete .ml files, which can be used as tests. One file per sentence is created. The file name is derived from the sentence. The test files are placed in the directory testsuite/tests/generated-parse-errors.
* Mark the three productions that use [not_expecting] with [AVOID].
* Mark the production that allows puns with [AVOID]. This prevents [make list-parse-errors] from generating sentences that exploit this production. Indeed, there is a risk of generating sentences that cause syntax errors, due to the auxiliary function [addlb], which rejects certain puns.
* Mark some of the start symbols with [AVOID].
* Add one new test file in testsuite/tests/generated-parse-errors/errors.ml. This file was produced by [make generate-parse-errors]. It contains 1072 sentences whose start symbol is implementation, 5 sentences whose start symbol is use_file, and 9 sentences whose start symbol is toplevel_phrase. The parser's output can be described as follows: 1086 syntax errors reported; 721 syntax errors without explanation; 365 syntax errors with an indication of what was expected; 307 syntax errors with an indication of an unmatched delimiter.
This pull request proposes the following changes:

* Modify parser.mly so that every token carries an alias (an example of its concrete syntax). Pass --require-aliases to Menhir so that Menhir enforces this property in the future.
* Extend Makefile.menhir with new commands, make list-errors and make generate-parse-errors.
* Indicate in parser.mly that the parts of the grammar comprised between BEGIN AVOID and END AVOID are ignored by make generate-parse-errors. This is required, e.g., to avoid exercising certain productions whose semantic actions cause an error.
* Use make generate-parse-errors to generate one new test file that contains 1086 sentences that cause a syntax error. Executing this new test is very fast.

This gives us an overview of the syntax error messages that the parser can produce, and should help us avoid regressions if/when we change the error reporting mechanisms in the future.

This requires Menhir 20201214, which is in the process of being released (waiting for opam).