Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[matching.ml cleanup] overview PR #9321

Closed
wants to merge 40 commits into from
Closed

Conversation

trefis
Copy link
Contributor

@trefis trefis commented Feb 20, 2020

This is a continuation of the work @gasche and I started last summer (cf. #8766, #8768, #8850, #8851, #8861 and #8869).

We are about to send another wave of small(ish) PRs containing code that we wrote back then, and we thought it would be nice for reviewers to also have the whole diff of our current changes, which this PR contains, to have a idea of where things are going.

The main focus at this point is to introduce dedicated types to represent pattern matches (and patterns themselves) at various points of the pipeline, thus eliminating a good number of assert false.

trefis and others added 30 commits February 20, 2020 13:11
There were some patches on the file since our last series of patch that
didn't respect the formatting.
So we're calling ocamlformat again (the same version as before,
something < 0.10) so everything rebases cleanly.
…mple patterns

Also: these types are now instances of Typedtree.pattern_data.
review of can_group factorization by @gasche:

> I reviewed the change to `can_group` and believe it is correct.
> (I knew it would be nicer as a binary operation!)
>
> The different treatments of Lazy and Tuple/Record does look a bit odd,
> but I believe that it is actually the right thing to write.
>
> In the Tuple or Record case, the idea is that we can group Any heads
> with the Tuple/Record case and just explode all of them (including the
> Any cases) according to the tuple/record arity.
>
> In the Lazy case, the corresponding choice would be to add Any values
> to the Lazy group, and force all the elements of the group. This is
> not so nice, because intuitively we shouldn't force Lazy values unless
> we have to.
>
> One may expect that in general the argument of the pattern will be
> forced anyway, as long as there is at least one Lazy pattern above in
> the matrix, so it doesn't really matter whether we include the Any
> patterns in the forced group or not. I would argue that (1) even if
> that was true, it would be semantically dubious to handle Any that way
> (see a more precise criterion below), (2) I am not actually sure that
> this is true, what if the first group gets found unreachable by
> splits in following columns?
>
>     # type void = | ;;
>     # match (lazy (print_endline "foo"), (None : void option)) with
>       | lazy (), Some _ -> .
>       | _, None -> false;;
>     - : bool = false
>
> This gives the following decision tree for whether Any is included in
> the group:
>
> - Can the absence of Any be used to generate nice/efficient tests for
>   the heads of the group? In that case, don't include Any.
>   (Not included: all the Constant cases, Array (discriminate on size),
>    String, etc.)
>
> - Is Any always exactly equivalent to a more-defined head for values
>   of this type? In that case, do include Any, otherwise do not.
>   (Included: Variant, Record)
>   (Not included: Lazy)
The bug was inserted in 842eb76 in
2002. It breaks specialization of default matrices, so in theory it
should not lead to wrong-code production, but because it can create
default matrices of distinct arity in the same environment it should
be able to observe assertion failures in code hitting the bug.

I found the bug while upgrading the code to use better types. The
input of specialize_matrix has non_empty rows, but the output may be
empty -- and this code becomes ill-typed when input and output have
distinct types. Yay typing.
…ingle matcher_head

The `matcher_of_pattern` function is a temporary measure to reduce the
invasiveness of the patch, and make it easier to review.

(Note for reviewers: in the previous version the Record case had
a funny handling of Any, but it is in fact equivalent to just adding
omegas as we now do in all cases.)

There are two obvious directions for improvement:

- Get rid of matcher_of_pattern and pass a head directly to the
  various make_matching_* functions.

- Try to factorize this code with ctx_matcher which, it is now
  obvious, does essentially the same thing.

Another, less immediate area of attack would be to consider
a presentation of Pattern_head.t where the Any case can be statically
ruled out -- maybe the description could have two levels, one
isomorphic to option (Any or not?) and one for non-any heads.

matching: [minor] Default_environment.specialize takes a head directly
Note: this commit makes a minor improvement to the location used to
force (lazy ..) arguments.
The aim is to also move the Simple/Half_simple/General stuff from
matching, but we need to split in those modules the part that are
purely structural (they go in Patterns) and the parts that are
actually compilation logic (Half_simple.of_clause), those stay in
Matching.
"rows" are common abstraction of both pattern analysis and
compilation, but clauses carrying a Lambda.t term are
compilation-specific. Separating rows will thus enable moving more
logic to typing/patterns for use in both settings.
gasche and others added 7 commits February 20, 2020 13:11
Note: this is due to mk_alpha_env raising Cannot_flatten during
splitting/precompilation.
Unfortunately since the function is exposed and used in translcore we
need to keep the generic one, and introduce a flatten_simple_pattern.
This makes it clearer where the exception comes from.
@trefis
Copy link
Contributor Author

trefis commented Feb 20, 2020

One thing I should mention: almost all the code here was written last summer (modulo some fixes and cleanup that happened earlier this week when rebasing and reading back over the code), and has been used to compile janestreet codebase¹.
I didn't notice any compilation time regression back then.


¹: This doesn't mean to imply that the code is correct, and that careful review isn't required. Careful review is required. I just built some code, but didn't run the resulting programs (so there could be some miscompilations in there).

@gasche
Copy link
Member

gasche commented Feb 20, 2020

General nature of the changes

The general approach of the work is to turn as much knowledge/invariants into types as reasonably possible without loss of usability. This has helped us make the code quite a bit simpler in some parts, it removes a lot of assert false, and it is essential in enabling people other than the other in understanding how all those invariants worked together.

The vast majority of the changes in this mothership-PR come from two simple ideas we had during our previous code-comprehension work, applied to as much of typing/parmatch.ml (pattern-matching static analysis) and lambda/matching.ml (pattern-matching compilation) as we could in limited time:

  1. distinguish code logic that depends only on the "head" of patterns (is it a tuple, a record, a variant, etc.)
  2. distinguish code logic that depends on patterns where certain pattern-formers are forbidden, typically "no or-patterns or alias-patterns" (patterns with this property are called "simple patterns").

For (1) we use the type of "pattern heads" that we introduced in #8766. For (2), we tried several different approaches, and we realized that the best one is to use polymorphic variants; there is a representation of patterns with polymorphic variants that is only used internally in the algorithms (not in the parsetree or typedtree), using a "view" pattern to expose just the layer of pattern structure as a polymorphic variant, to reason on toplevel pattern-former.

We moved the definition of pattern "heads" and polymorphic-variant-enabled patterns in a common module typing/patterns.ml which is now used in parmatch.ml and matching.ml (and nowhere else). There is a small interaction between the two notions: only "simple" patterns have a well-defined head, so the API for pattern heads now requires simple patterns. (The previous version of #8766 would simply invalid_argument on or-patterns.)

PR reviewing style

The pattern-matching PRs should really be reviewed commit-by-commit. We wrote the code this way, by iterations of local changes, propagating more and more local information in each iteration. Each change is small and reasonably simple (except those marked "REVIEW REQUIRED" that gave us a bit more trouble and are going to give reviewers a bit more trouble as well).

A downside is that the commits contain the whole history of our refactoring; for example, the first few PRs that we are going to send do not contain a separate patterns.ml file or polymorphic variant, because those came only later in our work.

I tried to "rewrite the history" by rebuilding the PR as the introduction of patterns.ml followed by coherent blocks of changes using only the latest version of our API, but I failed to do so. The volume of changes is too large to be able to do this in a reasonable way, because it is very hard to delimit how to do these changes gradually, and one ends up with monster commits.

@gasche
Copy link
Member

gasche commented Jun 7, 2020

Note: all the commits that were initially in this PR are now merged, thanks to the remarkable review work of @Octachron. There remain a couple TODO items for tasks that were identified during review.

@gasche
Copy link
Member

gasche commented Oct 21, 2020

Another update: there are currently two PRs that were born out of this work (but not part of the original submission):

Both PRs were born out of a TODO item ( b01cc8b ) I placed during review of #9447, trying to simplify the implementation of do_for_multiple_match.

@XVilka
Copy link
Contributor

XVilka commented Dec 3, 2020

Since these PRs are merged too, looks like it can be closed.

@gasche
Copy link
Member

gasche commented Dec 9, 2020

There were two TODO items created during Florian's review. One ( b01cc8b ) is now fully adressed in trunk, and the other ( c73c28d) not yet: it is about trying to track statically which record patterns have been "expanded" or whether they still need expanding, and @Octachron has a PR he should sent shortly to do this.

@Octachron
Copy link
Member

The PR mentioned by @gasche is #10089 where we are still not sure if the type complexity increase is worth it. However, it seems reasonable to close this overview PR now.

@Octachron Octachron closed this Oct 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants