Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to name existentials in pattern-matching #9584

Merged
merged 24 commits into from Feb 4, 2021

Conversation

garrigue
Copy link
Contributor

This attempts to implement a solution suggested by @yallop in #7074 to the problem of naming existentials in pattern-matching.
For the sake of simplicity(?) the syntax is:

type _ ty = Int : int ty
type dyn = Dyn : 'a ty * 'a -> dyn
let f = function Dyn (type a) (w, x : a ty * a) -> ignore (x : a)

I.e. the type is given as a tuple rather than on individual arguments.
Ellipses are allowed, but all existentials must be matched.

@gasche
Copy link
Member

gasche commented Sep 1, 2020

@trefis pinged me about this PR just now. I like the proposed syntax, much better than the one in #9579. There is broad consensus in #7074 that this syntax is acceptable (@yallop, @lpw25 and myself supported this syntax, @craigfe had solid arguments against the other proposal, etc.).

I remain of the opinion that it should ideally be possible to have something "more local" in the annotating-the-argument parts, for example C (type a) ((x : a), f) should be accepted just as well as C (type a b) (x, f : a * (a -> b)), but this could come later as a relaxation of the current feature. Even in that relaxed world, the syntax you just proposed would be accepted and make sense.

(I'm only semi-happy with the * which entertains the confusion between tuple types and multi-argument constructors, but it is consistent with the GADT declaration syntax so probably fine.)

@gasche
Copy link
Member

gasche commented Sep 1, 2020

I think it would be nice to move forward: we have a syntax we like, now we need to check that the implementation is reasonably consensual (maybe @trefis or @lpw25 could chime in with their perspective here), and seriously consider this for inclusion.

Copy link
Contributor

@lpw25 lpw25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few small things to improve, but the code looks correct to me. Well done everyone involved for finding something we can all agree to.

parsing/ast_iterator.ml Outdated Show resolved Hide resolved
parsing/ast_mapper.ml Outdated Show resolved Hide resolved
parsing/printast.ml Outdated Show resolved Hide resolved
typing/ctype.ml Outdated Show resolved Hide resolved
typing/typecore.ml Show resolved Hide resolved
typing/untypeast.ml Outdated Show resolved Hide resolved
parsing/parser.mly Outdated Show resolved Hide resolved
parsing/parsetree.mli Outdated Show resolved Hide resolved
@garrigue
Copy link
Contributor Author

After many weeks and two rebases, I think this PR is now ready for inclusion.
I also added an entry for the manual.
@lpw25 already gave his approval, but since there are a number of changes, it would be good if at least one of you could give a fresh look.

@@ -89,7 +89,7 @@ and 'k pattern_desc =
*)
| Tpat_construct :
Longident.t loc * Types.constructor_description *
value general_pattern list ->
value general_pattern list * (Ident.t loc list * core_type) option ->
value pattern_desc
(** C []
C P [P]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here should be updated, just like in parsetree.mli (thanks for that change, by the way).

Copy link
Contributor

@lpw25 lpw25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted a couple of small things to change and maybe a small bug. Good to go once they are addressed.

typing/tast_iterator.ml Outdated Show resolved Hide resolved
typing/typecore.ml Outdated Show resolved Hide resolved
typing/printpat.ml Show resolved Hide resolved
typing/printpat.ml Outdated Show resolved Hide resolved
typing/printpat.ml Outdated Show resolved Hide resolved
typing/printpat.ml Outdated Show resolved Hide resolved
typing/printpat.ml Outdated Show resolved Hide resolved
@gasche
Copy link
Member

gasche commented Jan 26, 2021

Note: I think that the feature is very useful/important and that the design we ended up with is good. Thanks a lot @garrigue for this work.

(* C None
C P Some (P, None)
C (P1, ..., Pn) Some (Ppat_tuple [P1; ...; Pn], None)
C (type a b) (P : T) Some (P, Some ([a; b], T))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm late in the discussion, but shouldn't we encode a regular Ppat_constraint to keep the core_type, i.e. just use:

  Ppat_construct of loc * (pattern * string loc list) option

(with an empty list when there is no (type ...)) Requiring to have the inner pattern be a Ppat_constraint when the list is not empty is ok, imo.

I'm concerned that the current representation changes the way an existing constraint would be represented when one adds a type binding (C (P : T) --> C (type a) (P : T')).

Can the type-checked behave differently on C (type a) (P : T) and C (type a) ((P : T) : T)?

Btw, is it an error to bind type names which are not used in the pattern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about suggesting such nesting, but:

  • Currently the annotation is interpreted in the context of the type variables in a very specific way, so it makes sense to keep the two pieces together at the constructor site.
  • Even with the representation you propose, my understanding is that the type-checker would have to lookup the Tpat_constraint node and use its payload specifically (not go through the usual Tpat_constraint path in this case), so you would not get the same-behavior guarantees that you are asking about.

The handling of existential variables is very specific because, morally, their actual binding is computed by unifying the annotation with the constructor return type, and then making them rigid for the rest of the type-checking. This is not the behavior you would get if you checked the whole subpattern before rigidifying (I think that behavior would make sense and we discussed it earlier, but @garrigue says it is harder to implement), nor of course the behavior you would get by rigidifying before checking the annotation or using flexible variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the previous discussion I was referring to is starting (approximatively) at #7074 (comment) .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gasche for the pointer.

  • If the type-checking is not an extension of what happens for C (P : T), reusing the same concrete syntax can really be misleading, no?

  • Independently of what the type checker does, having different representation for the T in C (type a) (P : T) and in C (P : T) is confusing for people writing syntactic tools. I'm not shocked that, short of having a cleaner solution, the type-checker does something special with Ppat_constraint on top of the pattern when existential type names are bound. We might be able to change the behavior of the type-checker later, while keeping the same syntax and Parsetree representation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type-checking is not an extension of what happens for C (P : T), reusing the same concrete syntax can really be misleading, no?

I think that the two readings of C (P : T) (as a constraint, and as a return-type-annotated-constructor pattern with no existential variables bound) do have the same meaning. What I meant is that if we think of the type-checking of a pattern P as generating an inference constraint (or some other "result", for example in an non-inferred world taking an scrutinee type and returning types for the bound variables), then the result of type-checking of C (type a...) (P : T) cannot be computed in a modular lay, as an action on the result of the type-checking of (P : T).

Independently of what the type checker does, having different representation for the T in C (type a) (P : T) and in C (P : T) is confusing for people writing syntactic tools.

This is a good point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(When I say "merge this one", I'm not actually sure that all pending questions have been resolved, this is for @lpw25 to confirm. So it may not be the unique blocker yet.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with either the current version or the version where the check is completely moved to the type-checker and the parser is happy to parse patterns with existential binders and no type constraints. I dislike any version where what the type can represent and what the parser can produce are not the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realize that it is wrong to say that

C (type a) (pats : ty)

is equivalent to

C (pats : ty)

when a occurs nowhere: if C has arity 2 or more, pats must syntactically be a list of patterns, and the latter would not be typable. So, sorry, this is really a new syntax.
@gasche indeed suggested that it would be nicer to allow type constraints on each argument, but I answered that it would be difficult to implement it cleanly.
I will still give it a try, just to see how it would work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, IIUC, for a pattern with 2 arguments:

  • C _ is allowed but C (type a) _ is rejected(?).
  • C (type a) ((x, y) : t) can be allowed (for a correct type) (?) but C ((x, y) : t) is always rejected.

Is that right? Are there other differences?

The situation with n-ary arguments was already quite confusing, I fear this is getting even worse...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite correct.

  • C _ is allowed, but C (type a) _ is rejected by the type checker (not the parser) independently of the number of arguments. The (type a) notation inside patterns is restricted to existential variables, and requires to bind all of them, which you cannot do without a type annotation, hence the rejection.
  • Annotations are always allowed for multiary types (the internal parenthesis are optional).
type ('a, 'b) pair = Pair of 'a * 'b;;
type t = int * bool;;
let f = function Pair ((x,y) : t) -> x;;
val f : (int, bool) pair -> int = <fun>
type closure = Clos : ('a -> int) * 'a -> closure;;
type 'a app = ('a -> int) * 'a;;
let f = function Clos (type a) ((x,y) : a app) -> x y;;
val f : closure -> int = <fun>

The result looks more regular to me.

@lpw25
Copy link
Contributor

lpw25 commented Jan 26, 2021

You replied "done" to a few remarks, but I don't see the changes. A forgotten push maybe?

@garrigue
Copy link
Contributor Author

You replied "done" to a few remarks, but I don't see the changes. A forgotten push maybe?

No, this should be there. They are marked as outdated, so the code should have changed.
If there is still a problem, please add a new comment.

@garrigue
Copy link
Contributor Author

I have create another PR implementing(?) @alainfrisch 's idea (#10180).
Tell me which you prefer.
I don't like the syntax for type annotations, but the problem is indeed with ocaml syntax.

@garrigue
Copy link
Contributor Author

I have merged most of the suggestions here and in #10180 and rebased.

  • the parsetree represention uses a Ppat_constraint, which is checked in Typecore
  • type annotations are only allowed on the argument seen as a tuple, but this is now also allowed for normal constructors too

@garrigue
Copy link
Contributor Author

I'm getting a compilation error in the flambda backend, but I don't see how this can be related to this PR.
Will probably have to wait until this is fixed and rebase.

@garrigue
Copy link
Contributor Author

Note also that neither of these approaches work for GADT constructors with an inline record.
The problem is not just that we are not allowed to name the record, but that there is no way to determine the order of the existentials it introduces.
A solution would be to allow typing a part of the arguments, delaying the GADT constructors they may contain, then check that the existentials are bound, and only then restart typing those eventual GADT constructors.
But this can probably wait for a future PR...

@garrigue
Copy link
Contributor Author

I'm really confused about what is happening. Apparently the bug comes from this PR, but the error message is:

./boot/ocamlrun ./boot/ocamlc -g -nostdlib -I boot -use-prims runtime/primitives -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48-66 -warn-error A -bin-annot -safe-string -strict-formats -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I asmcomp -I asmcomp/debug -I driver -I toplevel -c middle_end/flambda/simple_value_approx.ml -I middle_end/flambda
Fatal error: exception Invalid_argument("output_value: functional value")
make[2]: *** [middle_end/flambda/simple_value_approx.cmo] Error 2

Very confusing.
I hope it would disappear with a bootstrap, but actually this is even worse: before it occurred only when compiling with ocamlopt, but after bootstrap with ocamlc too.

@garrigue
Copy link
Contributor Author

I could eventually fix the problem by being more conservative, but there is a mysterious interaction at work here.

@garrigue
Copy link
Contributor Author

OK, I found the reason: I had forgotten to reinstate processing of the annotation core_type in Tast_mapper and Tast_iterator. It seems that this is used somewhere, and resulted probably in a malformed typedtree. I still do not understand how this can lead to outputting a functional value, but at least the original cause is now clear.

I'm still keeping the conservative approach for already valid annotations, as removing a Ppat_constraint node could change the behavior of build_as_type for instance.

@lpw25
Copy link
Contributor

lpw25 commented Feb 1, 2021

I still do not understand how this can lead to outputting a functional value, but at least the original cause is now clear.

The typed AST mapper is used for iterating over all the environments in the AST and calling [Env.keep_only_summary] on them before outputing the cmt file.

@garrigue
Copy link
Contributor Author

garrigue commented Feb 1, 2021

Thanks for the explanation, this now makes sense.

Is everybody happy with this last iteration of the PR?
If nobody protests, I'm going to merge it tomorrow, as syntax PRs are a pain to rebase...

@Octachron
Copy link
Member

Octachron commented Feb 2, 2021

This is minor, but am I right to think that there is no support for naming existentially quantified row variable:

type t = X: [> `A | `B ] -> t
let f (X (type x) (v:??)) = ()

?

@garrigue garrigue merged commit 89aae98 into ocaml:trunk Feb 4, 2021
@garrigue
Copy link
Contributor Author

garrigue commented Feb 4, 2021

Thanks to everybody in the discussion, and good annotating!

@johnwhitington
Copy link
Contributor

The directory manual/manual was moved to manual/src by 2aeb55a a few days before your merge, but your merge has recreated manual/refman/exten_camltex.tex in manual/manual, orphaning it. The full manual build now fails on trunk (although make html seems to work, luckily).

cc @Octachron

@Octachron
Copy link
Member

Thanks for the notice! This should be fixed in trunk.

@garrigue , to be safe, on the manual side, was there only a new paragraph in the GADT section?

@garrigue
Copy link
Contributor Author

garrigue commented Feb 9, 2021

Yes, that’s all I added.
I thought I understood the change of the manual, but this PR went through too many rebases.

@Octachron
Copy link
Member

Thanks for the confirmation. Indeed rebase across repertory moves tend to be painful, sorry for the noise!

garrigue added a commit to garrigue/ocaml that referenced this pull request Mar 3, 2021
smuenzel pushed a commit to smuenzel/ocaml that referenced this pull request Mar 30, 2021
EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request May 17, 2021
EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request May 17, 2021
…t-principality-and-gadts

Update a testcase in principality-and-gadts.ml to reflect changes in ocaml#9584
voodoos added a commit to voodoos/ocaml that referenced this pull request Aug 31, 2021
Patterns without named existentials are not correctly constructed. Issue introduced in ocaml#9584
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022
…mar13/fix-typing-gadts-test-principality-and-gadts

Update a testcase in principality-and-gadts.ml to reflect changes in ocaml#9584
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants