[matching.ml cleanup] overview PR #9321

trefis · 2020-02-20T16:37:48Z

This is a continuation of the work @gasche and I started last summer (cf. #8766, #8768, #8850, #8851, #8861 and #8869).

We are about to send another wave of small(ish) PRs containing code that we wrote back then, and we thought it would be nice for reviewers to also have the whole diff of our current changes, which this PR contains, to have a idea of where things are going.

The main focus at this point is to introduce dedicated types to represent pattern matches (and patterns themselves) at various points of the pipeline, thus eliminating a good number of assert false.

There were some patches on the file since our last series of patch that didn't respect the formatting. So we're calling ocamlformat again (the same version as before, something < 0.10) so everything rebases cleanly.

…te of pms

…mple patterns Also: these types are now instances of Typedtree.pattern_data.

@gasche

review of can_group factorization by @gasche: > I reviewed the change to `can_group` and believe it is correct. > (I knew it would be nicer as a binary operation!) > > The different treatments of Lazy and Tuple/Record does look a bit odd, > but I believe that it is actually the right thing to write. > > In the Tuple or Record case, the idea is that we can group Any heads > with the Tuple/Record case and just explode all of them (including the > Any cases) according to the tuple/record arity. > > In the Lazy case, the corresponding choice would be to add Any values > to the Lazy group, and force all the elements of the group. This is > not so nice, because intuitively we shouldn't force Lazy values unless > we have to. > > One may expect that in general the argument of the pattern will be > forced anyway, as long as there is at least one Lazy pattern above in > the matrix, so it doesn't really matter whether we include the Any > patterns in the forced group or not. I would argue that (1) even if > that was true, it would be semantically dubious to handle Any that way > (see a more precise criterion below), (2) I am not actually sure that > this is true, what if the first group gets found unreachable by > splits in following columns? > > # type void = | ;; > # match (lazy (print_endline "foo"), (None : void option)) with > | lazy (), Some _ -> . > | _, None -> false;; > - : bool = false > > This gives the following decision tree for whether Any is included in > the group: > > - Can the absence of Any be used to generate nice/efficient tests for > the heads of the group? In that case, don't include Any. > (Not included: all the Constant cases, Array (discriminate on size), > String, etc.) > > - Is Any always exactly equivalent to a more-defined head for values > of this type? In that case, do include Any, otherwise do not. > (Included: Variant, Record) > (Not included: Lazy)

The bug was inserted in 842eb76 in 2002. It breaks specialization of default matrices, so in theory it should not lead to wrong-code production, but because it can create default matrices of distinct arity in the same environment it should be able to observe assertion failures in code hitting the bug. I found the bug while upgrading the code to use better types. The input of specialize_matrix has non_empty rows, but the output may be empty -- and this code becomes ill-typed when input and output have distinct types. Yay typing.

…at_or on the claler side

…ingle matcher_head The `matcher_of_pattern` function is a temporary measure to reduce the invasiveness of the patch, and make it easier to review. (Note for reviewers: in the previous version the Record case had a funny handling of Any, but it is in fact equivalent to just adding omegas as we now do in all cases.) There are two obvious directions for improvement: - Get rid of matcher_of_pattern and pass a head directly to the various make_matching_* functions. - Try to factorize this code with ctx_matcher which, it is now obvious, does essentially the same thing. Another, less immediate area of attack would be to consider a presentation of Pattern_head.t where the Any case can be statically ruled out -- maybe the description could have two levels, one isomorphic to option (Any or not?) and one for non-any heads. matching: [minor] Default_environment.specialize takes a head directly

Note: this commit makes a minor improvement to the location used to force (lazy ..) arguments.

The aim is to also move the Simple/Half_simple/General stuff from matching, but we need to split in those modules the part that are purely structural (they go in Patterns) and the parts that are actually compilation logic (Half_simple.of_clause), those stay in Matching.

"rows" are common abstraction of both pattern analysis and compilation, but clauses carrying a Lambda.t term are compilation-specific. Separating rows will thus enable moving more logic to typing/patterns for use in both settings.

Note: this is due to mk_alpha_env raising Cannot_flatten during splitting/precompilation.

Unfortunately since the function is exposed and used in translcore we need to keep the generic one, and introduce a flatten_simple_pattern.

This makes it clearer where the exception comes from.

trefis · 2020-02-20T16:46:48Z

One thing I should mention: almost all the code here was written last summer (modulo some fixes and cleanup that happened earlier this week when rebasing and reading back over the code), and has been used to compile janestreet codebase¹.
I didn't notice any compilation time regression back then.

¹: This doesn't mean to imply that the code is correct, and that careful review isn't required. Careful review is required. I just built some code, but didn't run the resulting programs (so there could be some miscompilations in there).

gasche · 2020-02-20T16:57:54Z

General nature of the changes

The general approach of the work is to turn as much knowledge/invariants into types as reasonably possible without loss of usability. This has helped us make the code quite a bit simpler in some parts, it removes a lot of assert false, and it is essential in enabling people other than the other in understanding how all those invariants worked together.

The vast majority of the changes in this mothership-PR come from two simple ideas we had during our previous code-comprehension work, applied to as much of typing/parmatch.ml (pattern-matching static analysis) and lambda/matching.ml (pattern-matching compilation) as we could in limited time:

distinguish code logic that depends only on the "head" of patterns (is it a tuple, a record, a variant, etc.)
distinguish code logic that depends on patterns where certain pattern-formers are forbidden, typically "no or-patterns or alias-patterns" (patterns with this property are called "simple patterns").

For (1) we use the type of "pattern heads" that we introduced in #8766. For (2), we tried several different approaches, and we realized that the best one is to use polymorphic variants; there is a representation of patterns with polymorphic variants that is only used internally in the algorithms (not in the parsetree or typedtree), using a "view" pattern to expose just the layer of pattern structure as a polymorphic variant, to reason on toplevel pattern-former.

We moved the definition of pattern "heads" and polymorphic-variant-enabled patterns in a common module typing/patterns.ml which is now used in parmatch.ml and matching.ml (and nowhere else). There is a small interaction between the two notions: only "simple" patterns have a well-defined head, so the API for pattern heads now requires simple patterns. (The previous version of #8766 would simply invalid_argument on or-patterns.)

PR reviewing style

The pattern-matching PRs should really be reviewed commit-by-commit. We wrote the code this way, by iterations of local changes, propagating more and more local information in each iteration. Each change is small and reasonably simple (except those marked "REVIEW REQUIRED" that gave us a bit more trouble and are going to give reviewers a bit more trouble as well).

A downside is that the commits contain the whole history of our refactoring; for example, the first few PRs that we are going to send do not contain a separate patterns.ml file or polymorphic variant, because those came only later in our work.

I tried to "rewrite the history" by rebuilding the PR as the introduction of patterns.ml followed by coherent blocks of changes using only the latest version of our API, but I failed to do so. The volume of changes is too large to be able to do this in a reasonable way, because it is very hard to delimit how to do these changes gradually, and one ends up with monster commits.

gasche · 2020-06-07T06:05:30Z

Note: all the commits that were initially in this PR are now merged, thanks to the remarkable review work of @Octachron. There remain a couple TODO items for tasks that were identified during review.

gasche · 2020-10-21T13:50:29Z

Another update: there are currently two PRs that were born out of this work (but not part of the original submission):

[WIP] pattern-matching: Cannot_flatten, maybe we can? #9650, about to be merged, which is a followup on pattern-matching refactoring: clarify usage of the Cannot_flatten exception #9608 (born out of understanding of Cannot_flatten)
pattern-matching compiler: refactor the toplevel handling of partiality #9651, not reviewed yet, which is a followup of pattern-matching refactoring: first-order representation for failure handlers #9647 (failure handlers refactoring)

Both PRs were born out of a TODO item ( b01cc8b ) I placed during review of #9447, trying to simplify the implementation of do_for_multiple_match.

XVilka · 2020-12-03T04:52:07Z

Since these PRs are merged too, looks like it can be closed.

gasche · 2020-12-09T09:49:52Z

There were two TODO items created during Florian's review. One ( b01cc8b ) is now fully adressed in trunk, and the other ( c73c28d) not yet: it is about trying to track statically which record patterns have been "expanded" or whether they still need expanding, and @Octachron has a PR he should sent shortly to do this.

Octachron · 2022-10-18T14:18:54Z

The PR mentioned by @gasche is #10089 where we are still not sure if the type complexity increase is worth it. However, it seems reasonable to close this overview PR now.

trefis and others added 30 commits February 20, 2020 13:11

matching: formatting noise

488a588

There were some patches on the file since our last series of patch that didn't respect the formatting. So we're calling ocamlformat again (the same version as before, something < 0.10) so everything rebases cleanly.

matching: half_compiled_row: (head, patl, act) => - 12 assert false

7d40232

matching: move things around

d074c52

matching: Add intermediary types to keep track of the compilation sta…

286753e

…te of pms

parmatch: export Pattern_head

9a219d7

matching: start using Pattern_head

d8fdcd4

[REVIEW REQUIRED] matching: ctx_matcher with heads

757bb42

matching: use polymorphic variants to classify general/half_simple/si…

b96d57d

…mple patterns Also: these types are now instances of Typedtree.pattern_data.

[minor] matching: dedup debug code

36961ba

matching: what_is_cases returns a head

9828712

matching: inline split_and_precompile* to clarify compile_*

e4a0755

matching: factorize compile_match and compile_match_nonempty

89f0b1c

matching: minor code factorization of compile_* functions

e42b63b

parmatch: Pattern_head.arity

bdc964e

matching: pass explicit arity to Default_environment.specialize

8f38fa1

[REVIEW REQUIRED] matching: remove the OrPat exception by handling Tp…

8d2e4d4

…at_or on the claler side

matching: matcher_* take Simple.pattern arguments

672fd76

matching: specialize_matrix uses non-empty rows

5e56ada

matching: refine the types in Context.ctx_matcher

3ad05f0

matching: finally, merge matcher and Context.ctx_matcher

ef77908

matching: use heads in the make_*_matching specialization calls

08ec9e4

matching: factorize the make_*_matching functions

c8339bd

Note: this commit makes a minor improvement to the location used to force (lazy ..) arguments.

make depend

c746d23

[minor] matching: rename Non_empty_clause.{map_head => map_first}

090746b

matching: separate types for rows and clauses

90ea84f

"rows" are common abstraction of both pattern analysis and compilation, but clauses carrying a Lambda.t term are compilation-specific. Separating rows will thus enable moving more logic to typing/patterns for use in both settings.

patterns: reuse ('a Typedtree.pattern_) to define Patterns.Head.t

961ab88

gasche and others added 7 commits February 20, 2020 13:11

matching: move {general,simple,half_simple}_view to Patterns

943f799

matching: use pattern views in Parmatch as well

5ca29c5

matching: argo => arg_id

73fe304

do_for_multiple_match: add a test for flattening failure

e558e8d

Note: this is due to mk_alpha_env raising Cannot_flatten during splitting/precompilation.

matching: push simple types to flatten_pattern

4e5341a

Unfortunately since the function is exposed and used in translcore we need to keep the generic one, and introduce a flatten_simple_pattern.

[REVIEW REQUIRED] matching: flatten_simple_pattern does not raise

93464af

matching: try => match with exception

5af8a2e

This makes it clearer where the exception comes from.

gasche mentioned this pull request Feb 20, 2020

[matching.ml cleanup] structured rows #9322

Merged

This was referenced Mar 10, 2020

matching: ctx_matcher with heads #9359

Merged

matching: use polymorphic variants to classify general/half_simple/simple patterns #9361

Merged

gasche mentioned this pull request Apr 4, 2020

matching: simplify can_group #9417

Merged

This was referenced Apr 14, 2020

pattern-matching refactoring: keep going #9447

Merged

pattern-matching refactoring: simplify Default_env.specialize_matrix by avoiding the OrPat exception #9464

Merged

gasche mentioned this pull request Apr 23, 2020

pattern-matching refactoring: merge the two matchers #9493

Merged

gasche added 2 commits May 2, 2020 09:04

TODO item from ocaml#9447

b01cc8b

TODO item from ocaml#9493

c73c28d

gasche mentioned this pull request May 2, 2020

pattern-matching refactoring: refactor the make_<foo>_matching functions #9520

Merged

trefis mentioned this pull request May 6, 2020

Support for integer ranges #8504

Open

gasche mentioned this pull request May 14, 2020

pattern-matching refactoring: move pattern types to a new module typing/Patterns #9563

Merged

This was referenced May 25, 2020

pattern-matching refactoring: refine the type of complete_constrs #9599

Merged

pattern-matching refactoring: clarify usage of the Cannot_flatten exception #9608

Merged

gasche mentioned this pull request Jun 6, 2020

pattern-matching refactoring: first-order representation for failure handlers #9647

Merged

Octachron closed this Oct 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[matching.ml cleanup] overview PR #9321

[matching.ml cleanup] overview PR #9321

trefis commented Feb 20, 2020

trefis commented Feb 20, 2020

gasche commented Feb 20, 2020

gasche commented Jun 7, 2020

gasche commented Oct 21, 2020

XVilka commented Dec 3, 2020

gasche commented Dec 9, 2020

Octachron commented Oct 18, 2022

[matching.ml cleanup] overview PR #9321

[matching.ml cleanup] overview PR #9321

Conversation

trefis commented Feb 20, 2020

trefis commented Feb 20, 2020

gasche commented Feb 20, 2020

General nature of the changes

PR reviewing style

gasche commented Jun 7, 2020

gasche commented Oct 21, 2020

XVilka commented Dec 3, 2020

gasche commented Dec 9, 2020

Octachron commented Oct 18, 2022