-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[matching.ml cleanup] overview PR #9321
Conversation
There were some patches on the file since our last series of patch that didn't respect the formatting. So we're calling ocamlformat again (the same version as before, something < 0.10) so everything rebases cleanly.
…mple patterns Also: these types are now instances of Typedtree.pattern_data.
review of can_group factorization by @gasche: > I reviewed the change to `can_group` and believe it is correct. > (I knew it would be nicer as a binary operation!) > > The different treatments of Lazy and Tuple/Record does look a bit odd, > but I believe that it is actually the right thing to write. > > In the Tuple or Record case, the idea is that we can group Any heads > with the Tuple/Record case and just explode all of them (including the > Any cases) according to the tuple/record arity. > > In the Lazy case, the corresponding choice would be to add Any values > to the Lazy group, and force all the elements of the group. This is > not so nice, because intuitively we shouldn't force Lazy values unless > we have to. > > One may expect that in general the argument of the pattern will be > forced anyway, as long as there is at least one Lazy pattern above in > the matrix, so it doesn't really matter whether we include the Any > patterns in the forced group or not. I would argue that (1) even if > that was true, it would be semantically dubious to handle Any that way > (see a more precise criterion below), (2) I am not actually sure that > this is true, what if the first group gets found unreachable by > splits in following columns? > > # type void = | ;; > # match (lazy (print_endline "foo"), (None : void option)) with > | lazy (), Some _ -> . > | _, None -> false;; > - : bool = false > > This gives the following decision tree for whether Any is included in > the group: > > - Can the absence of Any be used to generate nice/efficient tests for > the heads of the group? In that case, don't include Any. > (Not included: all the Constant cases, Array (discriminate on size), > String, etc.) > > - Is Any always exactly equivalent to a more-defined head for values > of this type? In that case, do include Any, otherwise do not. > (Included: Variant, Record) > (Not included: Lazy)
The bug was inserted in 842eb76 in 2002. It breaks specialization of default matrices, so in theory it should not lead to wrong-code production, but because it can create default matrices of distinct arity in the same environment it should be able to observe assertion failures in code hitting the bug. I found the bug while upgrading the code to use better types. The input of specialize_matrix has non_empty rows, but the output may be empty -- and this code becomes ill-typed when input and output have distinct types. Yay typing.
…at_or on the claler side
…ingle matcher_head The `matcher_of_pattern` function is a temporary measure to reduce the invasiveness of the patch, and make it easier to review. (Note for reviewers: in the previous version the Record case had a funny handling of Any, but it is in fact equivalent to just adding omegas as we now do in all cases.) There are two obvious directions for improvement: - Get rid of matcher_of_pattern and pass a head directly to the various make_matching_* functions. - Try to factorize this code with ctx_matcher which, it is now obvious, does essentially the same thing. Another, less immediate area of attack would be to consider a presentation of Pattern_head.t where the Any case can be statically ruled out -- maybe the description could have two levels, one isomorphic to option (Any or not?) and one for non-any heads. matching: [minor] Default_environment.specialize takes a head directly
Note: this commit makes a minor improvement to the location used to force (lazy ..) arguments.
The aim is to also move the Simple/Half_simple/General stuff from matching, but we need to split in those modules the part that are purely structural (they go in Patterns) and the parts that are actually compilation logic (Half_simple.of_clause), those stay in Matching.
"rows" are common abstraction of both pattern analysis and compilation, but clauses carrying a Lambda.t term are compilation-specific. Separating rows will thus enable moving more logic to typing/patterns for use in both settings.
Note: this is due to mk_alpha_env raising Cannot_flatten during splitting/precompilation.
Unfortunately since the function is exposed and used in translcore we need to keep the generic one, and introduce a flatten_simple_pattern.
This makes it clearer where the exception comes from.
One thing I should mention: almost all the code here was written last summer (modulo some fixes and cleanup that happened earlier this week when rebasing and reading back over the code), and has been used to compile janestreet codebase¹. ¹: This doesn't mean to imply that the code is correct, and that careful review isn't required. Careful review is required. I just built some code, but didn't run the resulting programs (so there could be some miscompilations in there). |
General nature of the changesThe general approach of the work is to turn as much knowledge/invariants into types as reasonably possible without loss of usability. This has helped us make the code quite a bit simpler in some parts, it removes a lot of The vast majority of the changes in this mothership-PR come from two simple ideas we had during our previous code-comprehension work, applied to as much of typing/parmatch.ml (pattern-matching static analysis) and lambda/matching.ml (pattern-matching compilation) as we could in limited time:
For (1) we use the type of "pattern heads" that we introduced in #8766. For (2), we tried several different approaches, and we realized that the best one is to use polymorphic variants; there is a representation of patterns with polymorphic variants that is only used internally in the algorithms (not in the parsetree or typedtree), using a "view" pattern to expose just the layer of pattern structure as a polymorphic variant, to reason on toplevel pattern-former. We moved the definition of pattern "heads" and polymorphic-variant-enabled patterns in a common module PR reviewing styleThe pattern-matching PRs should really be reviewed commit-by-commit. We wrote the code this way, by iterations of local changes, propagating more and more local information in each iteration. Each change is small and reasonably simple (except those marked "REVIEW REQUIRED" that gave us a bit more trouble and are going to give reviewers a bit more trouble as well). A downside is that the commits contain the whole history of our refactoring; for example, the first few PRs that we are going to send do not contain a separate patterns.ml file or polymorphic variant, because those came only later in our work. I tried to "rewrite the history" by rebuilding the PR as the introduction of patterns.ml followed by coherent blocks of changes using only the latest version of our API, but I failed to do so. The volume of changes is too large to be able to do this in a reasonable way, because it is very hard to delimit how to do these changes gradually, and one ends up with monster commits. |
Note: all the commits that were initially in this PR are now merged, thanks to the remarkable review work of @Octachron. There remain a couple TODO items for tasks that were identified during review. |
Another update: there are currently two PRs that were born out of this work (but not part of the original submission):
Both PRs were born out of a TODO item ( b01cc8b ) I placed during review of #9447, trying to simplify the implementation of |
Since these PRs are merged too, looks like it can be closed. |
There were two TODO items created during Florian's review. One ( b01cc8b ) is now fully adressed in trunk, and the other ( c73c28d) not yet: it is about trying to track statically which record patterns have been "expanded" or whether they still need expanding, and @Octachron has a PR he should sent shortly to do this. |
This is a continuation of the work @gasche and I started last summer (cf. #8766, #8768, #8850, #8851, #8861 and #8869).
We are about to send another wave of small(ish) PRs containing code that we wrote back then, and we thought it would be nice for reviewers to also have the whole diff of our current changes, which this PR contains, to have a idea of where things are going.
The main focus at this point is to introduce dedicated types to represent pattern matches (and patterns themselves) at various points of the pipeline, thus eliminating a good number of
assert false
.