Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find right end-to-end testing level #679

Open
lsf37 opened this issue Dec 11, 2019 · 2 comments
Open

find right end-to-end testing level #679

lsf37 opened this issue Dec 11, 2019 · 2 comments
Labels
testing Adding tests or test infrastructure.

Comments

@lsf37
Copy link
Member

lsf37 commented Dec 11, 2019

(sorry for the wall of text, somewhat subtle, TLDR at bottom)

@regisd has made a valiant effort porting the old regression test suite into something more modern, and it's time to think about removing the old ones (as proposed in #678).

From the recent experience with refactoring macros for char classes (#654), it looks like we're close to be able to do that, but not quite there yet.

The main observation is that I had the bazel test suite passing quite some time before the old one (for the same tests), and was catching a number of bugs in the old setup that I would have missed in the new one. This was unfortunately mainly based on the thing we wanted to get rid of -- the comparison of string output.

Basically, what happens is that esp the --dump output gives information about the char classes and NFA/DFA structure that is very hard to test for by behaviour of the next stage down (the lexer itself). In the old setup, this information is too fine-grained -- it is very sensitive to any change in JFlex at all, including ones that are just messages only and don't change behaviour. In the new setup, the information is too coarse-grained. For instance, I missed a case in the refactor where just the wrong char classes were generated for some of the operators, but wrong subtly enough that everything works fine for many inputs. It's hard to test these by scanner behaviour, because you'd have to check the entire set of possible input characters (positively and negatively), and you'd have to do it in the right sequence (depending on what the lexer spec does).

It might be possible to check whether the char class partitions in particular are the same as a given "good" partition (esp now that final char classes don't depend on order of occurrence of things in the spec any more). We'd need to be able to get information out of the JFlex run, instead of only out of the scanner/runtime engine. It might even be useful for JFlex to offer an API for that. Or, even though it might be slightly distasteful, we could make an option that only reports this info on output, without the additional chatter around it, which would make it a lot more reliable to test against.

For NFA/DFA structure, it is a bit harder. We don't want to test that for all specs we have in the test set, but we have a few that are specifically designed to inspect manually if something changes. Usually (not always) if that happens there is a bug. The main output relevant for tests is the final minimised DFA, because that is unique (up to renaming of states) for a given set of regular expressions, and changes in that would indicate a reasonably major behaviour change in JFlex. These are pretty much impossible to test by behaviour of the runtime scanner, because you don't know beforehand which inputs will distinguish two DFAs (without looking at their structure). There's always a finite input that can distinguish two DFAs if they are different, but I don't think you can always tell in finite time if two DFAs are equivalent by inputs only (if you've tested to input length 100, length 101 might be the one that finds a problem etc). So we could do something similar as with the char classes and provide either an API or a string output for just that information on the JFlex side.

We don't need to do the DFA for many test cases, so it'd be Ok for this to be more verbose to write, but we should do it for some (basically the ones we do --dump for in the old framework), so that we trigger for bugs in the generation engine itself. It changes rarely, but even an otherwise innocent refactor could do something we hadn't thought of.

For most "normal" test cases, I just look at the number of states in the minimised DFA -- if it is not equal to before, the DFAs are not equal, and if they are equal, usually it's fine (that's why many of the old test don't have --dump, but not -q either). This is probably something we can check easily in the test case, and should, if the old test doesn't use -q.

Ok, so TLDR, we need mechanisms for:

  • comparing generated char classes against given expected ones
  • comparing full generated minimised DFA against given expected ones
  • as a more lenient option, compare number of states in minimised DFA against given expected

I'm unsure about warnings. It'd be good to check that expected warnings are
given, at least by default. Do we also need to check that no more warnings
than expected are produced?

@lsf37 lsf37 added the testing Adding tests or test infrastructure. label Dec 11, 2019
@regisd
Copy link
Member

regisd commented Dec 11, 2019

Thanks for the detailed description.

Yes, I want to get rid of sysout comparison We should test behaviour, not check against modifications.

You have a good point, that it would be useful to verify against the minimized DFA.

I agree that checking that some warnings are produced would be useful. I think we need a refactoring of the logging framework for that, in order to check the warning content rather than it's outputted form

@lsf37
Copy link
Member Author

lsf37 commented Dec 11, 2019

Cool, I think we agree on where we want to go with the tests. A logging framework that we could query for the set of warnings that were delivered sounds like a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Adding tests or test infrastructure.
Projects
None yet
Development

No branches or pull requests

2 participants