Name existentials : new approach #10180

garrigue · 2021-01-29T12:50:07Z

#9584 proposed a new syntax to name existentials in pattern-matching:

C (type a1 ... am) (pat1, ..., patn : typ1 * ... * typn)

This is a variation which follows the original syntax for type annotations, as suggested by @alainfrisch :

C (type a1 ... am) ((pat1 : typ1), ..., (patn : typn))

Note that one does not have to annotate all arguments, just enough of them to bind all the ai's. For instance:

C (type a) (x, (y : a option), z) -> ...

See name_existentials.ml for concrete examples.

A drawback of the current code is that it translates type annotations twice: once to bind the existentials, and once more to constrain the patterns (keeping the code simple).
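For comparison, here is a minimal sketch of the same idea written in the #9584 form, where a single annotation sits outside the whole argument tuple. It assumes OCaml 4.13 or later; the type `t`, the function `describe`, and the variable names are made up for illustration and do not come from name_existentials.ml.

```ocaml
(* Hypothetical constructor with one existential, 'a. *)
type t = C : int * 'a option * ('a -> string) -> t

let describe = function
  (* [a] names the existential; wildcards cover the parts that need no name. *)
  | C (type a) (n, y, show : _ * a option * _) ->
    let shown =
      match (y : a option) with
      | Some (v : a) -> show v
      | None -> "none"
    in
    Printf.sprintf "%d:%s" n shown

let () = print_endline (describe (C (1, Some 2, string_of_int)))
```

Once `a` is bound by the outer annotation, it can be reused in inner annotations such as `(v : a)`.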


garrigue · 2021-01-29T12:53:24Z

Another drawback is that it may be confusing where annotations can be written: the type checker only looks at the outside of each argument.

alainfrisch · 2021-01-29T13:25:44Z

> This is a variation which follows the original syntax for type annotations, as suggested by @alainfrisch :

Note that I did not suggest changing the concrete syntax, only the representation in the Parsetree.

I think it would be confusing to only accept C (type a) ((p1 : t1), (p2: t2)) if the constructor is defined with 2 arguments, and to only accept C (type a) ((p1, p2) : t1 * t2) if the constructor is defined with a single tuple argument.

alainfrisch · 2021-01-29T13:27:21Z

parsing/parsetree.mli

-           C (P1, ..., Pn)  Some (Ppat_tuple [P1; ...; Pn])
+  | Ppat_construct of
+      Longident.t loc * (pattern * string loc list) option
+        (* C                    None


Nitpick: I would find it more natural to follow the same ordering as the concrete syntax, i.e. string loc list * pattern.
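To make the suggested ordering concrete, here is a toy model of the constructor-pattern node with the binders listed before the pattern. It is only a sketch: `Var`, `Tuple`, `Constraint`, `Construct`, and the string-encoded type are simplified stand-ins invented for this example, not the real Parsetree definitions.

```ocaml
(* Toy stand-ins for Parsetree types; all names here are illustrative. *)
type 'a loc = { txt : 'a }

type pattern =
  | Var of string
  | Tuple of pattern list
  | Constraint of pattern * string (* the type is kept as a plain string *)
  | Construct of string loc * (string loc list * pattern) option

(* Models C (type a) (x, y : a * a option): binders first, then the pattern,
   in the same order as the concrete syntax. *)
let example =
  Construct
    ( { txt = "C" },
      Some
        ( [ { txt = "a" } ],
          Constraint (Tuple [ Var "x"; Var "y" ], "a * a option") ) )

(* Names of the existentials bound by a constructor pattern, if any. *)
let binders = function
  | Construct (_, Some (vars, _)) -> List.map (fun v -> v.txt) vars
  | _ -> []
```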

gasche · 2021-01-29T13:30:54Z

Both syntaxes have advantages and drawbacks:

1. The global syntax (P1, P2 : T1 * T2) corresponds to the GADT declaration syntax, and it highlights the toplevel-only limitation of the (current) implementation of the feature. On the negative side, it adds to the confusion between one-tuple-parameter and multi-parameter constructors (just as the GADT declaration syntax does).
2. The per-argument syntax ((P1 : T1), (P2 : T2)) makes the toplevel-only nature less clear, but it is also more flexible (easier to annotate only some arguments without using _) and is much closer to the non-GADT syntax for annotations in patterns.

I have a preference for (2). (I think that the change of representation suggested by @alainfrisch is also probably a good idea.)

Regarding the toplevel-only nature: what happens if we use the feature, forget about the toplevel-only restriction, and try to annotate the existential variable deep inside the pattern? I suspect that the restrictions that @garrigue put in place (on the usage of the bound variables at the toplevel) will result in an error in this case (instead of an accepted program whose behavior confuses a user who is not aware of the toplevel restriction), which is good news. I suppose this is clear in some of the examples?

garrigue · 2021-01-29T14:08:39Z

> I think it would be confusing to only accept C (type a) ((p1 : t1), (p2: t2)) if the constructor is defined with 2 arguments, and to only accept C (type a) ((p1, p2) : t1 * t2) if the constructor is defined with a single tuple argument.

For the second, this is already the case without (type a). I thought you were concerned by changing the behavior.

For the first, it is accepted in both cases without (type a), but accepting it while binding existentials and refining GADTs seems really tricky.

Another solution is to go back to my first solution, but allow the second kind of annotation (yet without the inner parentheses) even when there is no (type a).
This would be a conservative extension. That said, I didn't consider the revised syntax...

garrigue · 2021-01-29T14:12:55Z

> Regarding the toplevel-only nature: what happens if we use the feature, forget about the toplevel-only restriction, and try to annotate the existential variable deep inside the pattern? I suspect that the restrictions that @garrigue put in place (on the usage of the bound variables at the toplevel) will result in an error in this case (instead of an accepted program whose behavior confuses a user who is not aware of the toplevel restriction), which is good news. I suppose this is clear in some of the examples?

The only question is whether all the variables are bound or not. This is decided at the toplevel, and if some are unbound, or wrongly instantiated, one gets an error. Once they are bound, you can use them as much as you want in inner annotations.

gasche · 2021-01-29T14:45:08Z

From the testsuite:

type u = C : 'a * ('a -> 'b list) -> u

let f = function C (type a) ((x : a), (f : a -> a list)) -> ignore (x : a)
[%%expect{|
Line 1, characters 17-56:
1 | let f = function C (type a) ((x : a), (f : a -> a list)) -> ignore (x : a)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This type does not bind all existentials in the constructor:
         type a. a * (a -> a list)
|}]

I find this error message rather confusing. What is it trying to tell me, and how should I fix my code?

Naively I would expect that:

1. It is an error if a type constructor is bound in a pattern, but not used in an annotation (at the toplevel). (In particular, Dyn (type a) (w, x) as in an earlier example fails.)
2. On the other hand, with the annotation-decided semantics I don't really see the problem with not binding some of the existential types.

I think that lifting (2) would be even nicer with the new per-argument syntax, where it is very tempting to annotate only the arguments that contain the one existential type variable we care about.
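For reference, the constructor in the test above has two existentials, 'a and 'b, while the failing pattern names only a. A hedged sketch of a version that type-checks, assuming OCaml 4.13 or later and the outer-annotation form from #9584 (the function name `count` is hypothetical), names both and binds them in one annotation:

```ocaml
type u = C : 'a * ('a -> 'b list) -> u

(* Both existentials are introduced by (type a b) and bound by the
   annotation on the outside of the argument tuple. *)
let count = function
  | C (type a b) (x, f : a * (a -> b list)) -> List.length (f (x : a))

let () =
  Printf.printf "%d\n" (count (C (3, fun n -> List.init n string_of_int)))
```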

alainfrisch · 2021-01-29T15:46:57Z

> For the second, this is already the case without (type a). I thought you were concerned by changing the behavior.

My point is that currently, users implicitly learn that C ((p1 : t1), (p2 : t2)) is always "better" than C ((p1,p2) : t1 * t2) because it does not depend on how the constructor is declared (one tuple argument, or several arguments); and this does not extend to the new binding case, where there is no such "best" choice.

It looks straightforward to allow writing C (type a) ((p1, p2) : t1 * t2) as a simple extension to this PR (when splitting the pattern argument of an N-ary constructor, support the case where the pattern has the form ((p1, ..., pN) : t1 * ... * tN); a slightly less syntactic change would allow such a constraint, but with a type which expands to such a tuple). This would also work without a type binder, of course, restoring the fact that there is a "best" choice (which would now be C (type a) ((p1, p2) : t1 * t2)).

Alternatively, or in addition to that, one could also decide to allow C (type a) ((p1 : t1), (p2 : t2)) for a unary constructor as well (treating it exactly as C (type a) ((p1, p2) : t1 * t2)). And even C (type a) ((p1 : t1), p2) (--> C (type a) ((p1, p2) : t1 * _)).

All that might seem ad hoc, but it's coherent with the current overloaded syntax for N-ary constructors, and it's just light syntactic processing that reduces the confusion. (But you know, YMMV, I'm also the one who advocated allowing C x to be written for an n-ary constructor. I.e. my view is that there is no such thing as n-ary constructors, only constructors with an annotation on the declaration to unbox the tuple at the root, seen just as a data-layout detail.)
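The "best choice" point for plain (non-binding) patterns can be sketched as follows. The types `nary` and `unary` and the function names are invented for this illustration; the per-argument form is accepted for both declarations, whereas the whole-tuple form is shown only for the single-tuple-argument constructor.

```ocaml
type nary = A of int * string      (* two arguments *)
type unary = B of (int * string)   (* one tuple argument *)

(* Per-argument annotations: same pattern shape for both declarations. *)
let show_a (A ((n : int), (s : string))) = Printf.sprintf "%d:%s" n s
let show_b (B ((n : int), (s : string))) = Printf.sprintf "%d:%s" n s

(* Whole-tuple annotation, written here for the unary constructor only. *)
let show_b' (B ((n, s) : int * string)) = Printf.sprintf "%d:%s" n s

let () = print_endline (show_a (A (1, "x")))
```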

lpw25 · 2021-01-29T17:31:42Z

Personally, my preference would be for going with the design from the other PR, but also allowing users to write:

| C (x, y : int * string) -> ...

for n-ary constructors when there is no list of existential constructors.

garrigue · 2021-01-31T02:16:00Z

I also prefer the original approach. This does mean a change in the way constructors are annotated, but it seems more coherent with the fact that annotations require parentheses. As for the way users have "learnt" that it is always better to write annotations inside, I would assume they would be happy to "unlearn" that, in particular since the change is introduced with a new construct.

garrigue · 2021-01-31T06:34:57Z

The suggested changes are now implemented in the original PR #9584.

garrigue · 2021-02-19T01:35:23Z

Closing this one, as #9584 was merged.

garrigue added 14 commits January 25, 2021 13:25

allow to name existentials in pattern-matching

7287cbe

add examples

f849834

instance_constructor returns existentials

197f789

unify return type first

fcc6319

add Changes log

9e5371d

fix conflicts after rebase

b38da70

change parsetree as suggested by @gscherer and add information to typedtree as suggested by @lpw25

5f771df

use Env.enter_type rather than Env.add_local_type for new types

19cdb45

update change log

7fe5d64

add entry in manual, just after existential type names in error messages

9f6cf10

update test output for parsing/extensions.ml

8ff4860

fix last comments, and add ~manifest_and_scope argument to new_local_type

323332e

forgotten None

c5ff7b1

check for annotation on constant constructor

5e6300e

garrigue mentioned this pull request Jan 29, 2021

Allow to name existentials in pattern-matching #9584

Merged

change syntax

8990183

alainfrisch reviewed Jan 29, 2021

test

e05ec21

put variables befor pattern in parsetree

563146b

garrigue closed this Feb 19, 2021

Octachron added a commit to Octachron/ocaml that referenced this pull request Nov 30, 2021

ocaml#10180: missing cmi can hide type declaration

0e5d504

Octachron added a commit to Octachron/ocaml that referenced this pull request Nov 30, 2021

ocaml#10180: missing cmi can hide type declaration

e4e99b9

Octachron mentioned this pull request Nov 30, 2021

#10780: a missing cmi can hide a concrete type declaration #10799

Merged

gasche mentioned this pull request Nov 19, 2022

Allow existential types introduced in a constructor pattern to be bound without tuple type constraints patterns #11491

Closed
