Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Tsubst directly retain two arguments, instead of nesting a Ttuple #10174

Merged
merged 17 commits into from Feb 1, 2021

Conversation

t6s
Copy link
Contributor

@t6s t6s commented Jan 27, 2021

The constructor Tsubst of type_desc has only one type_expr as its argument while it needs two during a copy.
The two values are currently packed as a Ttuple and wrapped in an extra type_expr before they are fed to Tsubst.
This style looks quite ad-hoc, and moreover it turned out to be an obstacle in our (@garrigue and me) attempt to make
type_expr an abstract type.

This PR adds to Tsubst another argument of type type_expr option to eliminate that hackish use of Ttuple.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement, and I approve in principle. I think that we could discuss the code a bit more before merging:

  • The role of the type_expr option is not documented at the point where Tsubst of ... is declared, maybe you could comment there that the option part is only used None for row types, during copying operations?
  • Many places in the type-checker now ignore the second component of Tsubst, when before they would iterate on the whole type argument (so both sides of the Tuple if it was a tuple). The assumption is that we never hit the Some case in these functions (for some of them it is clear that they never run on row types). I wondered if we wanted to explicitly assert that the second component is None, instead of ignoring it. (Or would something break if we did? That would be useful information.)
  • I think that it may be nice to have a separate constructor for this particular mode of use of Tsubst, Trowsubst or Trowcopy for example, so that it would be very clear that it does something different from the normal Tsubst node, and only occur on row variables. This may make the patch more invasive (some visitors of typedtree would need to be edited), however.

@t6s
Copy link
Contributor Author

t6s commented Jan 27, 2021

  • The role of the type_expr option is not documented at the point where Tsubst of ... is declared, maybe you could comment there that the option part is only used None for row types, during copying operations?

Oh, that documentation is indeed necessary. I added explanations in types.mli.

  • Many places in the type-checker now ignore the second component of Tsubst, when before they would iterate on the whole type argument (so both sides of the Tuple if it was a tuple). The assumption is that we never hit the Some case in these functions (for some of them it is clear that they never run on row types). I wondered if we wanted to explicitly assert that the second component is None, instead of ignoring it. (Or would something break if we did? That would be useful information.)

I could not clearly understand your Tuple example. Could you a bit more elaborate on it?

My assumption is that the use of the second component is concealed in copying functions and does not escape out of them.
Maybe we should assert false at irrelevant places?

  • I think that it may be nice to have a separate constructor for this particular mode of use of Tsubst, Trowsubst or Trowcopy for example, so that it would be very clear that it does something different from the normal Tsubst node, and only occur on row variables. This may make the patch more invasive (some visitors of typedtree would need to be edited), however.

Yes, that division of Tsubst seems too much.

@gasche
Copy link
Member

gasche commented Jan 27, 2021

I could not clearly understand your Tuple example. Could you a bit more elaborate on it?

In the code to iterate on type sub-expressions (for example Btype.fold_type_expr), before the change, iterating on Tsubst (Ttuple [a; b]) would iterate on both a and b. With the change to Tsubst(a, Some b), you are now only iterating on a. This is a change of behavior, if those functions can in fact be called in a situation where Tsubst would use a pair here.

I am not completely sure whether this is the case (whether a Some _ can be reached in the second component in those functinos). Replacing all the cases where the second member is ignored by an assert (b = None) and running the testsuite would give us some information on whether this is the case today -- but that does not necessarily mean that it is the right change long-term.

Thinking about it more, I would rather guess that we can hit cases with Some _ in these functions, for example the copy function which introduces those cases calls free_univars ty which may very well in turn call those generic traversal functions.

@t6s
Copy link
Contributor Author

t6s commented Jan 28, 2021

In the code to iterate on type sub-expressions (for example Btype.fold_type_expr), before the change, iterating on Tsubst (Ttuple [a; b]) would iterate on both a and b. With the change to Tsubst(a, Some b), you are now only iterating on a. This is a change of behavior, if those functions can in fact be called in a situation where Tsubst would use a pair here.

I am not completely sure whether this is the case (whether a Some _ can be reached in the second component in those functinos). Replacing all the cases where the second member is ignored by an assert (b = None) and running the testsuite would give us some information on whether this is the case today -- but that does not necessarily mean that it is the right change long-term.

Thinking about it more, I would rather guess that we can hit cases with Some _ in these functions, for example the copy function which introduces those cases calls free_univars ty which may very well in turn call those generic traversal functions.

Thank you, now I understand what you are concerned about. The last commit makes fold_type_expr look into the second component.

@t6s
Copy link
Contributor Author

t6s commented Jan 28, 2021

Thinking about it more, I would rather guess that we can hit cases with Some _ in these functions, for example the copy function which introduces those cases calls free_univars ty which may very well in turn call those generic traversal functions.

After my previous commit, Jacques examined this possibility and explained me that this will not happen.
Further, Jacques' commit changes fold_type_expr so that it asserts false at any occurrence of Tsubst, fixing a bug revealed by this change.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @t6s and @garrigue, this is interesting, but then what about the other cases of traversals where the second component of Tsubst is ignored? Should they all assert = None as well? Or not, for some good reason? I did another review pass marking all places that deserve a thought.

typing/ctype.ml Show resolved Hide resolved
typing/printtyp.ml Outdated Show resolved Hide resolved
else ty :: tyl)
(* Two parameters might be identical due to a constraint but we need to
print them differently in order to make the output syntactically valid.
We use Tsubst because it does not appear in a valid type. *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this comment that says that Tsubst is used, but (if I read the diff correctly) in fact changes from Tsubst to Ttuple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

(* TODO: move this line to btype.ml
there is a similar problem also in ctype.ml *)
ty'
else ty
| Tsubst ty ->
| Tsubst (ty, _) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert = None here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case actually failed at one test (poly.ml). I am now looking into the bug..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed when the row variable of a polymorphic variant is Tunivar.

typing/typedecl_variance.ml Outdated Show resolved Hide resolved
ocamldoc/odoc_misc.ml Outdated Show resolved Hide resolved
@t6s
Copy link
Contributor Author

t6s commented Jan 28, 2021

can you explain why it is the right choice to do nothing in this case?

In printtyp, Tsubst was used in filter_param to temporarily keep variable names as its first component
and that should have been the only possible occurrence of Tsubst when printing a type expression in normal situations
(ie not while debugging).
This is the reason we were ignoring the second component.

Now that the use of Tsubst is replaced by Ttuple, the Tsubst case is just discarded in the last commit.

@Octachron
Copy link
Member

I think that this is my turn to ask a naive question: rather than trying to pretend that Tsubst nodes can appear in all contexts when they are only introduced during copying type graphs, would it not be simpler to carry a map of copied nodes in the copy functions?

This PR seems to go in the opposite direction by adding even more data that only exist in a very specific context to the typexpr graph.

@gasche
Copy link
Member

gasche commented Jan 28, 2021

That's an interesting suggestion but the PR as-is is already an improvement, and the change you suggest is much more invasive. I would be interested in converging on the current proposal not-too-slowly, merging it, and then letting @t6s and @garrigue decide where they want to invest their efforts next (possibly in hiding Tsubst nodes, but possibly not). It would also be more rewarding to @t6s.

@@ -1235,7 +1235,7 @@ let rec copy ?partial ?keep_names scope ty =
in
(* Register new type first for recursion *)
Private_type_expr.set_desc
more (Tsubst(newgenty(Ttuple[more';t])));
more (Tsubst (more', Some t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All code paths aware of the this tuple encoding (aka the pattern matching above and one in Subst) never use more'. Since all other users would erroneously interpret this encoding as a real tuple, it seems to me that rather than the product type_expr * type_expr option, the non-encoded argument of Tsubst is a sum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to just encode the current behavior of the algorithm in this type, you are correct.
However, it seems cleaner that the meaning of Tsubst be regular, i.e. its first argument is always a copy of the original node. Then the second argument is some special information used by the algorithm, here a copy of the enclosing Tvariant node.
We can ensure that we do not depart from the current behavior of the algorithm by checking that this second argument is always None when we use the first one. We will add such assertions.

toplevel/genprintval.ml Outdated Show resolved Hide resolved
tree_of_typexp sch ty
| Tsubst _ ->
Otyp_stuff "<Tsubst>"
(* This case should not happen *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer an assert false: the outcome tree is user facing, no users should need to known what a Tsubst is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we use Printtyp functions in the middle of the type-checker to debug things, but I guess we use raw_type rather than the higher-level functions. At least for raw_type I think it is important to handle intermediate states correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need not see it: it is not going to happen, as many functions will now reject it anyway.
However, as @gasche pointed, one might want to use Printtyp.type_expr in the debugger for readability, and it is better not to have the debugger crashing because of an assertion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sensible. However, the comment could be updated to: This case should not happen except when debugging the compiler.
Also, would it be too much to be extra-explicit with <Error: unexpected Tsubst>?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the comment but not the output. When this happens, this should actually not be unexpected.

@garrigue
Copy link
Contributor

I think that this is my turn to ask a naive question: rather than trying to pretend that Tsubst nodes can appear in all contexts when they are only introduced during copying type graphs, would it not be simpler to carry a map of copied nodes in the copy functions?

This code is inherited from Jérôme, and originally it only used Tlink, which was very confusing.
The main advantage of this approach is performance: no need to look up an auxiliary data structure, and we are sure we will never loop forever. (Of course this can also be done with an external data structure, just less efficiently)

This PR seems to go in the opposite direction by adding even more data that only exist in a very specific context to the typexpr graph.

We are not adding more data, since it was already there. We are just streamlining it, to make its handling easier.
Also, if we assert that Tsubst cannot occur in most places (including in fold_type_expr), this is pretty robust, and we can just ignore it outside of the copying code.

@garrigue
Copy link
Contributor

It appears that adding

| Tsubst (ty, o) -> assert (o = None); ty

in Subst.typexp breaks typing-poly/poly.ml.
After analysis of the code, when copying the type of a polymorphic method

 method virtual m : 'a. ([< `A|`B|`C] as 'a) -> int

where 'a denotes the row variable, one may first copy the body (including the row variable), and then the row variable alone.
In that case, there is a second Some ... argument, but the first argument to Tsubst is the correct one.
So it seems that the new code (without assert (o = None)) is correct.

The reason the original code worked is due to the use of norm_univar inside copy_type_desc, which has a special case

  | Ttuple (ty :: _)   -> norm_univar ty

which extracted the Tunivar from the Ttuple. With the new approach, this should not be necessary to call norm_univar in this case.

@garrigue
Copy link
Contributor

Added assertion to check that Tsubst does not occur outside of copying functions, and removed Btype.norm_univar, making clear why we need both copies.

@gasche
Copy link
Member

gasche commented Jan 29, 2021

The removal of norm_univar is a nice simplification, thanks.

ocamldoc/odoc_str.ml Outdated Show resolved Hide resolved
ocamldoc/odoc_value.ml Outdated Show resolved Hide resolved
@Octachron
Copy link
Member

I concur that the removal of norm_univar is nice, and indeed makes it clear that the proposed representation is the right one.
(I tried for fun to remove Tsubst by adding an extraneous layer in the copy_scope, and performance seems terrible)

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that the PR is essentially done, modulo two comments from Florian in the printer code (a question in the case Tsubst _ -> (), and a small suggestion to rephrase printing.)

let vars, ty_arg =
match repr lbl.lbl_arg with
{desc = Tpoly (ty, tl)} ->
instance_poly' scope ~keep_names:false fixed tl ty
| _ ->
[], copy scope lbl.lbl_arg
in
(* call [copy] after [instance_poly] to avoid introducing [Tsubst] *)
let ty_res = copy scope lbl.lbl_res in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I don't really understand what is going on here. I suppose there is a problem when lbl_res and lbl_args share some parts of the type, but it's not clear who might introduce Tsubst. (I suppose instance_poly can introduce Tsubst nodes, and copy normalizes them away?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the other way round: copy_sep (first phase of instance_poly) does not introduce (and should not see) Tsubst's, while copy does. Could add an assertion in copy_sep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add many assertions in copy_sep, I don't think this PR is the one where we want to start doing that.

Copy link
Contributor

@garrigue garrigue Jan 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already adding assertions about Tsubst everywhere, so why not in copy_sep.
Now all functions that call iter_type_expr are checking this, but copy_sep is using copy_type_desc instead, which explicitly allows Tsubst, hence the need for a specific assertion.
I do not pretend that this alone will make copy_sep self-explanatory :-)

@garrigue
Copy link
Contributor

garrigue commented Feb 1, 2021

I checked the errors in the CI, and they come from other PRs or internal CI problems (this PR has nothing to do with the standard library documentation).
So I merge before it gets more complicated.

@garrigue garrigue merged commit 13fb957 into ocaml:trunk Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants