Make Tsubst directly retain two arguments, instead of nesting a Ttuple #10174

t6s · 2021-01-27T06:55:38Z

The constructor Tsubst of type_desc has only one type_expr as its argument while it needs two during a copy.
The two values are currently packed as a Ttuple and wrapped in an extra type_expr before they are fed to Tsubst.
This style looks quite ad-hoc, and moreover it turned out to be an obstacle in our (@garrigue and me) attempt to make
type_expr an abstract type.

This PR adds to Tsubst another argument of type type_expr option to eliminate that hackish use of Ttuple.

gasche

This is a nice improvement, and I approve in principle. I think that we could discuss the code a bit more before merging:

The role of the type_expr option is not documented at the point where Tsubst of ... is declared, maybe you could comment there that the option part is only used None for row types, during copying operations?
Many places in the type-checker now ignore the second component of Tsubst, when before they would iterate on the whole type argument (so both sides of the Tuple if it was a tuple). The assumption is that we never hit the Some case in these functions (for some of them it is clear that they never run on row types). I wondered if we wanted to explicitly assert that the second component is None, instead of ignoring it. (Or would something break if we did? That would be useful information.)
I think that it may be nice to have a separate constructor for this particular mode of use of Tsubst, Trowsubst or Trowcopy for example, so that it would be very clear that it does something different from the normal Tsubst node, and only occur on row variables. This may make the patch more invasive (some visitors of typedtree would need to be edited), however.

typing/types.mli

t6s · 2021-01-27T10:58:24Z

The role of the type_expr option is not documented at the point where Tsubst of ... is declared, maybe you could comment there that the option part is only used None for row types, during copying operations?

Oh, that documentation is indeed necessary. I added explanations in types.mli.

Many places in the type-checker now ignore the second component of Tsubst, when before they would iterate on the whole type argument (so both sides of the Tuple if it was a tuple). The assumption is that we never hit the Some case in these functions (for some of them it is clear that they never run on row types). I wondered if we wanted to explicitly assert that the second component is None, instead of ignoring it. (Or would something break if we did? That would be useful information.)

I could not clearly understand your Tuple example. Could you a bit more elaborate on it?

My assumption is that the use of the second component is concealed in copying functions and does not escape out of them.
Maybe we should assert false at irrelevant places?

I think that it may be nice to have a separate constructor for this particular mode of use of Tsubst, Trowsubst or Trowcopy for example, so that it would be very clear that it does something different from the normal Tsubst node, and only occur on row variables. This may make the patch more invasive (some visitors of typedtree would need to be edited), however.

Yes, that division of Tsubst seems too much.

gasche · 2021-01-27T11:59:26Z

I could not clearly understand your Tuple example. Could you a bit more elaborate on it?

In the code to iterate on type sub-expressions (for example Btype.fold_type_expr), before the change, iterating on Tsubst (Ttuple [a; b]) would iterate on both a and b. With the change to Tsubst(a, Some b), you are now only iterating on a. This is a change of behavior, if those functions can in fact be called in a situation where Tsubst would use a pair here.

I am not completely sure whether this is the case (whether a Some _ can be reached in the second component in those functinos). Replacing all the cases where the second member is ignored by an assert (b = None) and running the testsuite would give us some information on whether this is the case today -- but that does not necessarily mean that it is the right change long-term.

Thinking about it more, I would rather guess that we can hit cases with Some _ in these functions, for example the copy function which introduces those cases calls free_univars ty which may very well in turn call those generic traversal functions.

t6s · 2021-01-28T05:42:46Z

In the code to iterate on type sub-expressions (for example Btype.fold_type_expr), before the change, iterating on Tsubst (Ttuple [a; b]) would iterate on both a and b. With the change to Tsubst(a, Some b), you are now only iterating on a. This is a change of behavior, if those functions can in fact be called in a situation where Tsubst would use a pair here.

I am not completely sure whether this is the case (whether a Some _ can be reached in the second component in those functinos). Replacing all the cases where the second member is ignored by an assert (b = None) and running the testsuite would give us some information on whether this is the case today -- but that does not necessarily mean that it is the right change long-term.

Thinking about it more, I would rather guess that we can hit cases with Some _ in these functions, for example the copy function which introduces those cases calls free_univars ty which may very well in turn call those generic traversal functions.

Thank you, now I understand what you are concerned about. The last commit makes fold_type_expr look into the second component.

t6s · 2021-01-28T07:50:24Z

Thinking about it more, I would rather guess that we can hit cases with Some _ in these functions, for example the copy function which introduces those cases calls free_univars ty which may very well in turn call those generic traversal functions.

After my previous commit, Jacques examined this possibility and explained me that this will not happen.
Further, Jacques' commit changes fold_type_expr so that it asserts false at any occurrence of Tsubst, fixing a bug revealed by this change.

gasche

Thanks @t6s and @garrigue, this is interesting, but then what about the other cases of traversals where the second component of Tsubst is ignored? Should they all assert = None as well? Or not, for some good reason? I did another review pass marking all places that deserve a thought.

typing/ctype.ml

typing/printtyp.ml

gasche · 2021-01-28T08:17:25Z

typing/printtyp.ml

        else ty :: tyl)
+      (* Two parameters might be identical due to a constraint but we need to
+         print them differently in order to make the output syntactically valid.
+         We use Tsubst because it does not appear in a valid type. *)


I am confused by this comment that says that Tsubst is used, but (if I read the diff correctly) in fact changes from Tsubst to Ttuple.

gasche · 2021-01-28T08:17:44Z

typing/subst.ml

        (* TODO: move this line to btype.ml
           there is a similar problem also in ctype.ml *)
        ty'
      else ty
-  | Tsubst ty ->
+  | Tsubst (ty, _) ->


Can we assert = None here?

This case actually failed at one test (poly.ml). I am now looking into the bug..

This is needed when the row variable of a polymorphic variant is Tunivar.

typing/typedecl_variance.ml

ocamldoc/odoc_misc.ml

t6s · 2021-01-28T10:55:09Z

can you explain why it is the right choice to do nothing in this case?

In printtyp, Tsubst was used in filter_param to temporarily keep variable names as its first component
and that should have been the only possible occurrence of Tsubst when printing a type expression in normal situations
(ie not while debugging).
This is the reason we were ignoring the second component.

Now that the use of Tsubst is replaced by Ttuple, the Tsubst case is just discarded in the last commit.

Octachron · 2021-01-28T12:42:06Z

I think that this is my turn to ask a naive question: rather than trying to pretend that Tsubst nodes can appear in all contexts when they are only introduced during copying type graphs, would it not be simpler to carry a map of copied nodes in the copy functions?

This PR seems to go in the opposite direction by adding even more data that only exist in a very specific context to the typexpr graph.

gasche · 2021-01-28T12:48:35Z

That's an interesting suggestion but the PR as-is is already an improvement, and the change you suggest is much more invasive. I would be interested in converging on the current proposal not-too-slowly, merging it, and then letting @t6s and @garrigue decide where they want to invest their efforts next (possibly in hiding Tsubst nodes, but possibly not). It would also be more rewarding to @t6s.

Octachron · 2021-01-28T17:11:30Z

typing/ctype.ml

@@ -1235,7 +1235,7 @@ let rec copy ?partial ?keep_names scope ty =
              in
              (* Register new type first for recursion *)
              Private_type_expr.set_desc
-                more (Tsubst(newgenty(Ttuple[more';t])));
+                more (Tsubst (more', Some t));


All code paths aware of the this tuple encoding (aka the pattern matching above and one in Subst) never use more'. Since all other users would erroneously interpret this encoding as a real tuple, it seems to me that rather than the product type_expr * type_expr option, the non-encoded argument of Tsubst is a sum.

If we were to just encode the current behavior of the algorithm in this type, you are correct.
However, it seems cleaner that the meaning of Tsubst be regular, i.e. its first argument is always a copy of the original node. Then the second argument is some special information used by the algorithm, here a copy of the enclosing Tvariant node.
We can ensure that we do not depart from the current behavior of the algorithm by checking that this second argument is always None when we use the first one. We will add such assertions.

toplevel/genprintval.ml

typing/printtyp.ml

Octachron · 2021-01-28T21:09:00Z

typing/printtyp.ml

-        tree_of_typexp sch ty
+    | Tsubst _ ->
+        Otyp_stuff "<Tsubst>"
+          (* This case should not happen *)


I would prefer an assert false: the outcome tree is user facing, no users should need to known what a Tsubst is.

Sometimes we use Printtyp functions in the middle of the type-checker to debug things, but I guess we use raw_type rather than the higher-level functions. At least for raw_type I think it is important to handle intermediate states correctly.

They need not see it: it is not going to happen, as many functions will now reject it anyway.
However, as @gasche pointed, one might want to use Printtyp.type_expr in the debugger for readability, and it is better not to have the debugger crashing because of an assertion.

This seems sensible. However, the comment could be updated to: This case should not happen except when debugging the compiler.
Also, would it be too much to be extra-explicit with <Error: unexpected Tsubst>?

I changed the comment but not the output. When this happens, this should actually not be unexpected.

garrigue · 2021-01-29T02:11:04Z

I think that this is my turn to ask a naive question: rather than trying to pretend that Tsubst nodes can appear in all contexts when they are only introduced during copying type graphs, would it not be simpler to carry a map of copied nodes in the copy functions?

This code is inherited from Jérôme, and originally it only used Tlink, which was very confusing.
The main advantage of this approach is performance: no need to look up an auxiliary data structure, and we are sure we will never loop forever. (Of course this can also be done with an external data structure, just less efficiently)

This PR seems to go in the opposite direction by adding even more data that only exist in a very specific context to the typexpr graph.

We are not adding more data, since it was already there. We are just streamlining it, to make its handling easier.
Also, if we assert that Tsubst cannot occur in most places (including in fold_type_expr), this is pretty robust, and we can just ignore it outside of the copying code.

garrigue · 2021-01-29T06:50:26Z

It appears that adding

| Tsubst (ty, o) -> assert (o = None); ty

in Subst.typexp breaks typing-poly/poly.ml.
After analysis of the code, when copying the type of a polymorphic method

 method virtual m : 'a. ([< `A|`B|`C] as 'a) -> int

where 'a denotes the row variable, one may first copy the body (including the row variable), and then the row variable alone.
In that case, there is a second Some ... argument, but the first argument to Tsubst is the correct one.
So it seems that the new code (without assert (o = None)) is correct.

The reason the original code worked is due to the use of norm_univar inside copy_type_desc, which has a special case

  | Ttuple (ty :: _)   -> norm_univar ty

which extracted the Tunivar from the Ttuple. With the new approach, this should not be necessary to call norm_univar in this case.

garrigue · 2021-01-29T10:33:19Z

Added assertion to check that Tsubst does not occur outside of copying functions, and removed Btype.norm_univar, making clear why we need both copies.

gasche · 2021-01-29T10:41:33Z

The removal of norm_univar is a nice simplification, thanks.

ocamldoc/odoc_str.ml

ocamldoc/odoc_value.ml

Octachron · 2021-01-29T11:02:22Z

I concur that the removal of norm_univar is nice, and indeed makes it clear that the proposed representation is the right one.
(I tried for fun to remove Tsubst by adding an extraneous layer in the copy_scope, and performance seems terrible)

gasche

My impression is that the PR is essentially done, modulo two comments from Florian in the printer code (a question in the case Tsubst _ -> (), and a small suggestion to rephrase printing.)

gasche · 2021-01-29T14:55:39Z

typing/ctype.ml

    let vars, ty_arg =
      match repr lbl.lbl_arg with
        {desc = Tpoly (ty, tl)} ->
          instance_poly' scope ~keep_names:false fixed tl ty
      | _ ->
          [], copy scope lbl.lbl_arg
    in
+    (* call [copy] after [instance_poly] to avoid introducing [Tsubst] *)
+    let ty_res = copy scope lbl.lbl_res in


For the record, I don't really understand what is going on here. I suppose there is a problem when lbl_res and lbl_args share some parts of the type, but it's not clear who might introduce Tsubst. (I suppose instance_poly can introduce Tsubst nodes, and copy normalizes them away?)

This is the other way round: copy_sep (first phase of instance_poly) does not introduce (and should not see) Tsubst's, while copy does. Could add an assertion in copy_sep.

We could add many assertions in copy_sep, I don't think this PR is the one where we want to start doing that.

We are already adding assertions about Tsubst everywhere, so why not in copy_sep.
Now all functions that call iter_type_expr are checking this, but copy_sep is using copy_type_desc instead, which explicitly allows Tsubst, hence the need for a specific assertion.
I do not pretend that this alone will make copy_sep self-explanatory :-)

…et Tsubst retain two arguments.

…subst to print constrained type parameters; add a test case to check its output

garrigue · 2021-02-01T05:10:07Z

I checked the errors in the CI, and they come from other PRs or internal CI problems (this PR has nothing to do with the standard library documentation).
So I merge before it gets more complicated.

ocaml#10174)

gasche approved these changes Jan 27, 2021

View reviewed changes

gasche reviewed Jan 27, 2021

View reviewed changes

typing/types.mli Show resolved Hide resolved

gasche reviewed Jan 28, 2021

View reviewed changes

Octachron reviewed Jan 28, 2021

View reviewed changes

toplevel/genprintval.ml Outdated Show resolved Hide resolved

Octachron reviewed Jan 28, 2021

View reviewed changes

typing/printtyp.ml Show resolved Hide resolved

Octachron reviewed Jan 28, 2021

View reviewed changes

Octachron reviewed Jan 29, 2021

View reviewed changes

ocamldoc/odoc_str.ml Outdated Show resolved Hide resolved

Octachron reviewed Jan 29, 2021

View reviewed changes

ocamldoc/odoc_value.ml Outdated Show resolved Hide resolved

gasche reviewed Jan 29, 2021

View reviewed changes

t6s and others added 8 commits February 1, 2021 12:18

Add an optional argument to Tsubst; avoid abusive uses of Ttuple to l…

458882b

…et Tsubst retain two arguments.

comment on the arguments of Tsubst; adapt printtyp to the new Tsubst

d5c2c03

fix fold_type_expr to handle the second component of Tsubst

2f8c0a8

properly ignore Tsubst cases in mark_loops and tree_of_typexp

02bb2da

partially revert the previous commit because filter_params is using T…

2e1e69d

…subst to print constrained type parameters; add a test case to check its output

assert false when calling fold_type_expr on Tsubst

fa18e1a

deref Tlink in fold_type_expr

4882078

fix the handling of Tsubst and comments

1b86a27

garrigue added 9 commits February 1, 2021 12:18

remove Btype.norm_univar

0662c26

assert false for Tsubst outside of copying functions

b8553b7

one missing assertion

fa7f26d

rather group

1c0e65e

Tsubst in ocamldoc

856f9cc

add change log and assertion that Tsubst doesn't occur in copy_sep

9b9d88e

update comment

7bbfd63

depend

6a10cc3

check-typo

c7d2ffa

garrigue force-pushed the tidy_Tsubst branch from e85094f to c7d2ffa Compare February 1, 2021 03:20

garrigue merged commit 13fb957 into ocaml:trunk Feb 1, 2021

smuenzel pushed a commit to smuenzel/ocaml that referenced this pull request Mar 30, 2021

Make Tsubst directly retain two arguments, instead of nesting a Ttuple (

923ed6a

ocaml#10174)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Tsubst directly retain two arguments, instead of nesting a Ttuple #10174

Make Tsubst directly retain two arguments, instead of nesting a Ttuple #10174

t6s commented Jan 27, 2021 •

edited

gasche left a comment

t6s commented Jan 27, 2021

gasche commented Jan 27, 2021

t6s commented Jan 28, 2021

t6s commented Jan 28, 2021

gasche left a comment

gasche Jan 28, 2021

garrigue Jan 29, 2021

gasche Jan 28, 2021

t6s Jan 29, 2021

garrigue Jan 31, 2021

t6s commented Jan 28, 2021

Octachron commented Jan 28, 2021

gasche commented Jan 28, 2021

Octachron Jan 28, 2021

garrigue Jan 29, 2021

Octachron Jan 28, 2021

gasche Jan 29, 2021

garrigue Jan 29, 2021

Octachron Jan 29, 2021

garrigue Jan 31, 2021

garrigue commented Jan 29, 2021

garrigue commented Jan 29, 2021

garrigue commented Jan 29, 2021

gasche commented Jan 29, 2021

Octachron commented Jan 29, 2021

gasche left a comment

gasche Jan 29, 2021

garrigue Jan 29, 2021

trefis Jan 29, 2021

garrigue Jan 31, 2021 •

edited

garrigue commented Feb 1, 2021

Make Tsubst directly retain two arguments, instead of nesting a Ttuple #10174

Make Tsubst directly retain two arguments, instead of nesting a Ttuple #10174

Conversation

t6s commented Jan 27, 2021 • edited

gasche left a comment

Choose a reason for hiding this comment

t6s commented Jan 27, 2021

gasche commented Jan 27, 2021

t6s commented Jan 28, 2021

t6s commented Jan 28, 2021

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

t6s commented Jan 28, 2021

Octachron commented Jan 28, 2021

gasche commented Jan 28, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garrigue commented Jan 29, 2021

garrigue commented Jan 29, 2021

garrigue commented Jan 29, 2021

gasche commented Jan 29, 2021

Octachron commented Jan 29, 2021

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garrigue Jan 31, 2021 • edited

Choose a reason for hiding this comment

garrigue commented Feb 1, 2021

t6s commented Jan 27, 2021 •

edited

garrigue Jan 31, 2021 •

edited