Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name existentials : new approach #10180

Closed
wants to merge 17 commits into from
Closed

Conversation

garrigue
Copy link
Contributor

@garrigue garrigue commented Jan 29, 2021

#9584 proposed a new syntax to name existentials in pattern-matching:

C (type a1 ... am) (pat1, ..., patn : typ1 * ... * typn)

This is a variation which follows the original syntax for type annotations, as suggested by @alainfrisch :

C (type a1 ... am) ((pat1 : typ1), ..., (patn : typn))

Note that one does not have to annotate all arguments, just enough of them to bind all the ai's.
For instance.

C (type a) (x, (y : a option), z) -> ...

See name_existentials.ml for concrete examples.

A drawback of the current code is that it translates type annotations twice: once to bind the existentials, and once more to constrain the patterns (keeping the code simple).

@garrigue
Copy link
Contributor Author

Another drawback is that where one can write the annotations may be confusing: the type checker will only look on the outside of each argument.

@alainfrisch
Copy link
Contributor

This is a variation which follows the original syntax for type annotations, as suggested by @alainfrisch :

Note that I did not suggest to change the concrete syntax, only the representation in the Parsetree.

I think it would be confusing to only accept C (type a) ((p1 : t1), (p2: t2)) if the constructor is defined with 2 arguments, and to only accept C (type a) ((p1, p2) : t1 * t2) if the constructor is defined with a single tuple argument.

C (P1, ..., Pn) Some (Ppat_tuple [P1; ...; Pn])
| Ppat_construct of
Longident.t loc * (pattern * string loc list) option
(* C None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick : I would find more natural to follow the same ordering as the concrete syntax, i.e. string loc list * pattern.

@gasche
Copy link
Member

gasche commented Jan 29, 2021

Both syntaxes have advantages and inconvenients:

  1. The global syntax (P1, P2 : T1 * T2) corresponds to the GADT declaration syntax, and it highlights the toplevel-only limitation of the (current) implementation of the feature. On the negative side, it augments the confusion between one-tuple-parameter and multi-parameter constructors (just as the GADT declaration syntax).
  2. The per-argument syntax ((P1 : T1), (P2 : T2)) makes the toplevel-only nature less clear, but it is also more flexible (easier to annotate only some arguments without using _) and is much closer to the non-GADT syntax for annotations in patterns.

I have a preference for (2). (I think that the change of representation suggested by @alainfrisch is also probably a good idea.)

Regarding the toplevel-only nature: what happens if we try to use the feature and we forget the toplevel-only nature, we try to annotate the existential variable in depth in the pattern? I suspect that the restrictions that @garrigue put (on the usage of the bound variables at the toplevel) will result in an error in this case (instead of an accepted program with a confusing behavior for the user which is not aware of the topleve restriction), which is good news. I suppose this is clear in some of the examples?

@garrigue
Copy link
Contributor Author

I think it would be confusing to only accept C (type a) ((p1 : t1), (p2: t2)) if the constructor is defined with 2 arguments, and to only accept C (type a) ((p1, p2) : t1 * t2) if the constructor is defined with a single tuple argument.

For the second, this is already the case without (type a). I thought you were concerned by changing the behavior.

For the first, it is accepted in both cases without (type a), but accepting it while binding existentials and refining GADTs seem really tricky.

Another solution is to go back to my first solution, but allow the second kind of annotation (yet without the inner parenthesis) even when there is no (type a).
This would be a conservative extension. This said, I didn't consider the revised syntax...

@garrigue
Copy link
Contributor Author

Regarding the toplevel-only nature: what happens if we try to use the feature and we forget the toplevel-only nature, we try to annotate the existential variable in depth in the pattern? I suspect that the restrictions that @garrigue put (on the usage of the bound variables at the toplevel) will result in an error in this case (instead of an accepted program with a confusing behavior for the user which is not aware of the topleve restriction), which is good news. I suppose this is clear in some of the examples?

The only question is whether all the variables are bound or not. This is decided at the toplevel, and if some are unbound, or wrongly instantiated, one gets an error. Once they are bound, you can use them as much as you want in inner annotations.

@gasche
Copy link
Member

gasche commented Jan 29, 2021

From the testsuite:

type u = C : 'a * ('a -> 'b list) -> u

let f = function C (type a) ((x : a), (f : a -> a list)) -> ignore (x : a)
[%%expect{|
Line 1, characters 17-56:
1 | let f = function C (type a) ((x : a), (f : a -> a list)) -> ignore (x : a)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This type does not bind all existentials in the constructor:
         type a. a * (a -> a list)
|}]

I find this error message rather confusing. What is it trying to tell me, and how should I fix my code?

Naively I would expect that:

  1. It is an error if a type constructor is bound in a pattern, but not used in an annotation (at the toplevel)
    (In particular, Dyn (type a) (w, x) as in an earlier example fails.)
  2. On the other hand, with the annotation-decided semantics I don't really see the problem with not binding some of the existential types.

I think that lifting (2) would be even nicer with the new per-argument syntax, where it is very tempting to annotate only the arguments that contain the one existential type variable we care about.

@alainfrisch
Copy link
Contributor

alainfrisch commented Jan 29, 2021

For the second, this is already the case without (type a). I thought you were concerned by changing the behavior.

My point is that currently, users implicitly learn that C ((p1 : t1), (p2 : t2)) is always "better" than C ((p1,p2) : t1 * t2) because it does not depend on how the constructor is declared (one tuple argument, or several arguments); and this does not extend to the new binding case, where there is no such "best" choice.

It looks straightforward to allow writing C (type a) ((p1, p2) : t1 * t2) as a simple extension to this PR (when splitting the pattern argument of an N-ary constructor, support the case where the pattern has the form ((p1, ..., pN) : t1 * ... * tN; a slightly less syntactic change would allow such constraint, but with a type which expands to such a tuple). This would work also without a type binder, of course, restoring the fact there is a "best" choice (which would now be C (type a) ((p1, p2) : t1 * t2)).

Alternatively, or in addition to that, one could also decide to allow C (type a) ((p1 : t1), (p2 : t2)) also for a unary constructor (treating it exactly as C (type a) ((p1, p2) : t1 * t2)). And even C (type a) ((p1 : t1), p2) (--> C (type a) ((p1, p2) : t1 * _)

All that might seem ad hoc, but it's coherent with the current overloaded syntax for N-ary constructors, and it's just light syntactic processing that reduces the confusion. (But you know, YMMW, I'm also the one who advocated allowing to write C x for an n-ary constructor. I.e. my view is that there is no such thing as n-ary constructors, only constructors with a annotation on the declaration to unbox tuple at the root, seen just as a data-layout detail.)

@lpw25
Copy link
Contributor

lpw25 commented Jan 29, 2021

Personally, my preference would be for going with the design from the other PR, but also allowing users to write:

| C (x, y : int * string) -> ...

for n-ary constructors when there is no list of existential constructors.

@garrigue
Copy link
Contributor Author

I also prefer the original approach. This indeed means a change in the way to annotate constructors, but this seems more coherent with the fact annotations require parentheses. As for the way users have "learnt" that it is always better to write annotations inside, I would assume they would be happy to "unlearn" that, in particular the change is introduced with a new construct.

@garrigue
Copy link
Contributor Author

The suggested changes are now implemented in the original PR #9584 .

@garrigue
Copy link
Contributor Author

Close this one as #9584 was merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants