Rewrite get_unboxed_type_representation using a terminating algorithm instead of fuel. #10479

Draft · wants to merge 1 commit into base: trunk
Conversation

@nchataing (Contributor)

This PR proposes to rewrite get_unboxed_type_representation in Typedecl_unboxed, based on the algorithm presented in http://gallium.inria.fr/~scherer/research/constructor-unboxing/constructor-unboxing-ml-workshop-2021.pdf.
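
For readers skimming the thread, here is a minimal self-contained sketch of the discipline under simplifying assumptions (a toy model, not the PR's actual code: paths are plain strings, and each node carries a mutable set of the heads already expanded above it):

```ocaml
module SSet = Set.Make (String)

type ty =
  | Var of int
  | Constr of string * ty list * SSet.t ref
      (* head path, arguments, heads already expanded above this node *)

(* Annotate the new nodes of an expansion; stop (by physical identity)
   at the old argument subtrees, which keep their own annotations. *)
let rec annotate ~stop ann t =
  if List.memq t stop then ()
  else
    match t with
    | Var _ -> ()
    | Constr (_, args, r) ->
        r := SSet.union ann !r;
        List.iter (annotate ~stop ann) args

(* [defs] maps an unboxable head to a function building its one-step
   expansion from the (already instantiated) arguments. *)
let rec expand_head defs t =
  match t with
  | Constr (p, args, ann) when not (SSet.mem p !ann) -> (
      match List.assoc_opt p defs with
      | None -> t
      | Some body_of ->
          let body = body_of args in
          (* only the new structure inherits the extended annotation *)
          annotate ~stop:args (SSet.add p !ann) body;
          expand_head defs body)
  | t -> t
```

With `defs = [ "id", (fun args -> List.hd args) ]`, the toy version of `int id id` expands all the way to `int`: the inner `id` node is an "old" argument node, so it keeps its own annotation and remains expandable, while a definition whose expansion recreates its own head stops as soon as that head carries its own path.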

Following #10337, we had to redefine the mem function from Btype.TypeMap and Btype.TypeHash to incorporate the repr normalisation. Is this perhaps something we should generalize in those modules?
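
For illustration, the kind of normalizing wrapper meant here might look as follows (a sketch, not necessarily the PR's exact code):

```ocaml
(* look up a type node only after normalizing away Tlink indirections *)
let mem ty map = Btype.TypeMap.mem (Btype.repr ty) map
```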

We were also unsure how to test this function properly. The function's principal use is in later computing the immediacy of a type. Maybe the immediacy testsuite suffices in this case?

With @gasche.

@gasche (Member) commented Jun 26, 2021

(cc also @stedolan, @lpw25)

@xavierleroy changed the title from "Rewrite get_unboxed_type_representation using a termination algorithm instead of fuel." to "Rewrite get_unboxed_type_representation using a terminating algorithm instead of fuel." on Jun 28, 2021
@stedolan (Contributor)

This looks promising! I had a look a while ago at tracking the set of paths already expanded, but gave up after discovering the int id id example - it's nice to see the idea taken further.

  • I found the termination argument in the paper a bit hard to follow. Here's a version of the same argument that made things click for me:

    Since there are only finitely many type definitions in scope, an infinite expansion must expand a single type definition t infinitely many times. Each such operation is triggered by a Tconstr(t, ...) node which is not annotated with t, and this node does not appear in the expansion. Since all newly created nodes will be annotated with t, the number of such nodes is decreasing after the first expansion.

    (In other words, the number of expandable instances of type t can grow only until the first expansion of t. From then on, the set shrinks, providing a decreasing measure).

  • Both this argument and the paper's rely on the assumption that a type's head is disjoint from its body ("does not appear in the expansion" above, and "each node has a finite number of children" in the paper). While I think this is true for the syntactic system in the paper, it's not true in full OCaml with equirecursive types. For instance, using ocaml -rectypes:

```ocaml
type 'a id = Id of 'a [@@unboxed];;
type t = ('a id as 'a) [@@unboxed];;
(* diverges *)
```
  • Incidentally, I think the proposed algorithm is equivalent to the one the C preprocessor uses to ensure termination of expansion, where tokens resulting from a macro expansion cannot cause a future expansion of the same macro.

  • Does this algorithm accept all the same programs that the old one does? I don't think it's vital that it does, but I think it might: as far as I can tell, this algorithm will only fail in cases where the old one would run out of fuel.

  • I am somewhat hesitant about the idea of annotating type nodes, since it risks making distinctions that the rest of the type system does not. Specifically, given type s = int;; type 'a id = 'a, the types int and s id are equal but disagree on annotations. Might one be accepted in a context where the other is rejected?

@gasche (Member) commented Jun 28, 2021

Thanks @stedolan for the excellent comments!

> I found the termination argument in the paper a bit hard to follow.

Yes, "we properly write down the termination argument" was left on our TODO list for a few weeks and still is. I plan to steal this task from my intern @nchataing busy sub-list and I will compare with your own notes. We did find it tricky.

(Note: the tree structure we mention in the termination proof is related to how we initially found this idea of call-by-name call stacks. The tree structure was more apparent when our call stack was formed of pairs of a head path and its static location in the source program, but then we simplified the definition/implementation by dropping static locations, which are not necessary at runtime. It's interesting that your equi-recursive example below is one where the notion of static location is hard to define.)

> type t = ('a id as 'a) [@@unboxed];;

This is an excellent find: our algorithm diverges here, which is unpleasant. (Note: I think there is no problem with those equi-recursive types that are accepted without -rectypes, namely object and polymorphic-variant types, as the outer computation terminates on those types instead of recursively traversing their sub-type-expressions.)

Do you have an idea for how to avoid this issue in a nice way? ("Let it diverge" and "reintroduce fuel for this corner case" are not nice.) One idea would be, when we annotate type expressions with their call stack, to detect equi-recursive cycles and annotate them with a special "poison" stack that gets handled as a cycle as soon as it reaches the head of the type. Ideally we would want to allow expanding once, but not twice; right now I don't know how to express this refinement.

> Incidentally, I think the proposed algorithm is equivalent to the one the C preprocessor uses to ensure termination of expansion, where tokens resulting from a macro expansion cannot cause a future expansion of the same macro.

Maybe? (Should we be proud or ashamed of ourselves?) Here is what StackOverflow quotes of the specification:

> If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file's preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.

Apparently there is additional discussion on this point.

I hope that we can at least end up with a clearer description of what's going on. This relation is definitely worth mentioning in our abstract anyway, thanks for the pointer!

> Does this algorithm accept all the same programs that the old one does? I don't think it's vital that it does, but I think it might: as far as I can tell, this algorithm will only fail in cases where the old one would run out of fuel.

I don't know the OCaml type system well enough to feel confident that it does. We avoided making any claims of completeness in the paper. (Of course we also accept definitions that the previous check rejects because they run out of fuel, but admittedly those are probably fairly artificial.)

The answer to this question is also not stable under extensions of the type system. For example, we could envision a form of parametrized types / ad-hoc type families that would allow defining type 'a foo as "if 'a is 'b id then 'b foo else 'a". Then int id id foo would expand to int yet be rejected by our algorithm.

> I am somewhat hesitant about the idea of annotating type nodes, since it risks making distinctions that the rest of the type system does not. Specifically, given type s = int;; type 'a id = 'a, the types int and s id are equal but disagree on annotations. Might one be accepted in a context where the other is rejected?

In your example, the algorithm running on a type of the form T[int] or T[s id] is going to behave in the same way until the node int or s id gets in head position (in particular the head callstack in the two related runs should be the same when we reach this head position). At this point the current version of the PR does a Callstack.expand_head_opt call that, if I understand correctly, is the identity on int, and will replace s id by int, re-annotating all the "new" structure in the post-expansion type with the head callstack of the pre-expanded type (this is what extend does in the definition of Ctype.expand_head_opt). In other words, as long as the two (distinct) nodes are in the expansion relation, the behavior should be identical.

Some context: annotating type nodes is how we implement annotating type subexpressions in our paper description of the algorithm. We found it tricky in some ways (several small bugs caused by traversal operations changing type identities in an unexpected way; for example, we are not completely sure yet that the use of Ctype.expand_head_opt here is correct; our head-shape version of the algorithm currently performs the expansion by hand).

The behavior is clear when all syntactic subexpressions in a type correspond to distinct type nodes (then the notions of "type node" and "subexpression" coincide). If there is more sharing in the type, my intuition is that there is no problem if the sharing is between type expressions that are non-overlapping in the syntactic presentation. If they are nested in one another, then we have a problem, as your ('a id as 'a) example demonstrates.
(I'm not sure whether we should look for a different approach. An obvious approach is to represent the syntactic tree structure as a mapping from paths within the type to annotations, but this looks like a lot more code, slower, and not clearly better. It is probably better to keep using type node identities, after a discussion with actual type-system experts on whether this is the right approach.)

@stedolan (Contributor)

> Do you have an idea for how to avoid this issue in a nice way? ("Let it diverge" and "reintroduce fuel for this corner case" are not nice.) One idea would be, when we annotate type expressions with their call stack, to detect equi-recursive cycles and annotate them with a special "poison" stack that gets handled as a cycle as soon as it reaches the head of the type. Ideally we would want to allow expanding once, but not twice; right now I don't know how to express this refinement.

I think maybe this can be solved simply: after expanding a given type_expr, update its head annotation to include the path that was expanded. That way, the set of type_exprs not annotated with a given path decreases every time that path is expanded. I think this will only make a difference in the case of equirecursive types.
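
On the toy model sketched after the PR description, this tweak would be a single extra step in the expansion branch (a hypothetical sketch reusing `SSet`, `Constr` and `annotate` from that model):

```ocaml
let rec expand_head' defs t =
  match t with
  | Constr (p, args, ann) when not (SSet.mem p !ann) -> (
      match List.assoc_opt p defs with
      | None -> t
      | Some body_of ->
          (* the proposed extra step: also mark the expanded node itself,
             so that a physically cyclic type can never be expanded twice
             through the same node *)
          ann := SSet.add p !ann;
          let body = body_of args in
          annotate ~stop:args !ann body;
          expand_head' defs body)
  | t -> t
```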

@gasche (Member) commented Jun 28, 2021

This makes sense, thanks! If I understand correctly, it's also equivalent to simply recording a list/set of the type_exprs we were called on so far, separately from the annotated-callstacks information.

The "callstack" information for a type expression is introduced at the point where this type expression first appear as a subterm. This new information grows differently, it is introduced at the place where the type expression is called -- appears in head position. (The fact that the update flow is different for both aspects suggests that separating them in two datastructures is sensible.)

@stedolan (Contributor)

Thinking about it, it looks like the data structures used here could be simplified. This implementation annotates each type_expr node with a call stack, but most of the information in the annotation can never be used. The only use is to check whether the annotation on a type of the form Tconstr(p, _, _) contains p.

So, it looks like the appropriate data structure might be a table mapping type_exprs to booleans, indicating whether they may be expanded further.
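
One way to read this on the toy model above: the set of already-expanded heads is known at the moment a node is created, so the single bit that will ever be consulted can even be precomputed at creation time (a hypothetical sketch, storing the bit in the node rather than in a side table):

```ocaml
type node = { head : string; args : node list; mutable expandable : bool }

(* [expanding] is the set of heads already expanded on the current path;
   the node is expandable iff its own head is not among them *)
let make ~expanding head args =
  { head; args; expandable = not (List.mem head expanding) }
```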

@gasche (Member) commented Jun 29, 2021

Would this be faster, though? If one expansion introduces N new type nodes in the worst case, and the length of the expansion sequence is L, then for each expansion step the existing implementation will perform an O(N) computation to annotate the expansion, and a log(L) check of the cycle condition, giving O(L.(N + log L)) complexity. With your proposal, if I understand correctly, we would perform a log(L) membership test for each new (Tconstr) node, so we get O(L.N.log L) complexity.

(Note: this is not what the paragraph above is about, but I think that performance in practice is in fact quite important for get_unboxed_type_representation that, unlike our still-to-come work on explicit constructor unboxing, is used fairly heavily within the type-checker and compiler. We will have to measure things and reason about "fast path" behavior.)

@stedolan (Contributor)

I was thinking more about simplicity of data structures than performance! I think it's usually best to first find the simplest data structure for the problem, and come up with optimised versions afterwards.

If we are thinking about optimisation, I think there are other changes that would make a bigger difference. get_unboxed_type_representation is called reasonably often, but very few of those calls actually expand an unboxed type. The type system already guarantees that no looping is possible using Ctype.expand_head_opt, so we could greatly speed things up by delaying construction of any annotations (callstacks or booleans) until we actually expand an unboxed type.
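
A sketch of that structure, where `head_is_unboxed` and `expand_with_annotations` are hypothetical stand-ins for the real checks:

```ocaml
let get_unboxed_repr env ty =
  (* cheap, cached, already guaranteed to terminate *)
  let ty = Ctype.expand_head_opt env ty in
  if not (head_is_unboxed env ty) then ty (* common case: no book-keeping *)
  else expand_with_annotations env ty     (* rare case: checked expansion *)
```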

It might also be worth considering caching, perhaps by integrating with the abbrev mechanism already present in Tconstr nodes. (However, since we already cache the Ctype.expand_head_opt part, and I expect that the number of unboxed expansions is small, this might not make much difference).

@gasche (Member) commented Jun 30, 2021

I need to convince myself that the change you propose makes it as easy, or easier, to think about the system (in particular its termination properties). In terms of implementation complexity, I think that it is comparable (we are changing the place where a check is made).

Re. performance, I agree with your remarks and in fact I was thinking in the same direction:

  1. avoid any book-keeping in the fast path
  2. consider potential benefits of having a second cached reference in Tconstr (one for the expansion of abbreviations, one for the expansion of abbreviations and unboxed definitions)

I think that we want to have (1) as part of this PR, and will probably give up on (2) because it's more work, more invasive, and we are unlikely to see a noticeable performance difference.

We had an actually-noticeable performance concern related to get_unboxed_type_representation in the past, namely #8776. The problem was not with the computation itself, but the fact that it introduced calls to correct_levels in many parts of the system. This example makes me slightly nervous, but I agree that the complexity of unfolding itself is unlikely to be noticeable.

For our work on head shapes (from which the present PR is derived as a "baby" version), we have tried using Ctype.expand_head_opt but currently are not (we had issues with type nodes changing and breaking the annotation book-keeping); we simply expand the manifest by hand in the abstract case. We also need to understand this better. We want to have a better testsuite first, and our plan is to start by adding a -dtypedecl-unboxed-repr option that displays unboxed representations at [@@unboxed]-type-declaration time, to make expect-tests pleasant and more informative. (Currently one has to play with [@@immediate], which gives poor observability of the unboxed-type-repr computation.) This is tedious to implement, so we have been looking at other parts of our TODO list for now.

@nchataing (Contributor, Author)

By fiddling a bit with GADTs, we were able to find an interesting example where the current algorithm is incomplete (it cuts the cycle before a precise result can be reached). Consider the computation of the unboxed head representation of int t for the following type t:

```ocaml
type _ t =
  | A : bool t -> int t [@unboxed]
  | B : string t -> bool t [@unboxed]
  | C : string -> string t [@unboxed]
```

Some notes first:

  • the [@unboxed] attribute makes it possible to mark individual constructors as unboxed (here we unbox all three): this is part of the larger picture for constructor unboxing :-)
  • we suppose we have a "smart algorithm" for GADTs: when we compute for the type arg t, we focus only on the subset of constructors whose return type is compatible with arg.

The desired outcome when we call unboxed_type_representation on int t should be string, the only possible value form being A (B (C s)). However, the current cycle-detection algorithm rejects it because we don't allow expanding t multiple times: we would need to unbox A and B in succession.

We have an idea to fix this: at each step we store not only the head type path, but also the set of compatible constructors that were considered for this instance. We reject with a cycle only when we stumble upon the same type head with the same constructor set. The algorithm would get the correct answer with this modified criterion: int t : t [A] -> t [B] -> t [C] -> string.
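
A hypothetical sketch of this refined stack (the names are invented for illustration):

```ocaml
module CSet = Set.Make (String)

(* one stack frame per expansion step: the head path together with the
   set of compatible constructors considered for that instance *)
type frame = { path : string; constrs : CSet.t }

(* a cycle is reported only when the same head reappears with the same
   constructor set *)
let is_cycle stack path constrs =
  List.exists
    (fun fr -> fr.path = path && CSet.equal fr.constrs constrs)
    stack
```

On the example above, the three instances of t push the frames (t, {A}), (t, {B}), (t, {C}); no frame repeats, so the expansion reaches string.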

Remarks:

  1. We haven't implemented "constructor unboxing" yet.
  2. When we do, we don't plan to support GADTs in this "smart" way in the first implementation.
  3. We may never implement the refined termination check, so we see a future for the incomplete algorithm. We thought it could be interesting to mention it already.

@gasche (Member) commented Jul 3, 2021

@stedolan wrote:

> I found the termination argument in the paper a bit hard to follow. Here's a version of the same argument that made things click for me:
>
> Since there are only finitely many type definitions in scope, an infinite expansion must expand a single type definition t infinitely many times. Each such operation is triggered by a Tconstr(t, ...) node which is not annotated with t, and this node does not appear in the expansion. Since all newly created nodes will be annotated with t, the number of such nodes is decreasing after the first expansion.
>
> (In other words, the number of expandable instances of type t can grow only until the first expansion of t. From then on, the set shrinks, providing a decreasing measure).

This argument is not completely correct, because while t may not occur in "new" nodes created by the expansion, it may occur in "old" nodes coming from the parameters, and those can get duplicated. Consider for example ('a t) t for type 'a t = ('a, 'a, 'a) u: the head-expansion of t will strictly increase the number of occurrences of t. (The expansions are "parallel" and at most one of them will end up in head position, because an "old" subexpression can never end up as the child of a "new" subexpression in an expansion.)

I thought of a simpler and more general termination argument, feedback warmly welcome.

Consider an n-ary tree whose nodes (at any position in the tree) can be "expanded", giving a new tree, where the expansion of a node turns its subtree into a different subtree which contains:

  • some "new nodes" that were not part of the previous subtree,
  • and arbitrarily (but finitely) many copies of some "old" subtrees that were descendants (transitive children) of the expanded node

OCaml types equipped with the expansion of abbreviations and unboxed types are examples of such trees.

Remark: allowing expansion in arbitrary positions (rather than just head position) does not seem necessary for get_unboxed_type_repr, but it makes sense for computing head shapes for constructor unboxing, as it allows, for example, reducing the parameter type of each unboxed constructor in a variant type declaration. (Here the syntactic trees considered are a mix of type expressions and instantiated type declarations.)

Remark: when a node ('a t) expands, one may think that the only possible "old" subtree is the subtree immediately under (t), but in fact with type constraints and unboxed GADTs we can have arbitrary descendants; consider:

```ocaml
type _ t = Foo : ('a * 'a) -> (int -> 'a) t [@@unboxed]
```

Claim: if nodes are equipped with a well-founded order such that a "new" node after expansion is strictly smaller than its expanded node, then the tree expansion process is strongly normalizing.

(Induction argument: if all the descendant subtrees of a node are strongly normalizing, and all nodes strictly below it in the node order are strongly normalizing, then it is strongly normalizing.)

Our algorithm annotates tree nodes with a "call stack", with new nodes having a larger call stack, and nodes becoming neutral / un-expandable if their head constructor is in their call stack. Callstack extension forms a well-founded order, as callstacks are bounded in size (at most the number of type declarations). As a consequence of the claim, expansion is strongly normalizing.

@stedolan (Contributor) commented Jul 5, 2021

> This argument is not completely correct, because while t may not occur in "new" nodes created by the expansion, it may occur in "old" nodes coming from the parameters, and those can get duplicated. Consider for example ('a t) t for type 'a t = ('a, 'a, 'a) u: the head-expansion of t will strictly increase the number of occurrences of t. (The expansions are "parallel" and at most one of them will end up in head position, because an "old" subexpression can never end up as the child of a "new" subexpression in an expansion.)

My argument was about the representation used in the implementation, where type expressions are graphs, not trees, and expansion preserves sharing. I don't think expansion can ever duplicate a type parameter, and I think annotations are applied to a type_expr node by identity rather than by position in the type's expanded tree.

> Claim: if nodes are equipped with a well-founded order such that a "new" node after expansion is strictly smaller than its expanded node, then the tree expansion process is strongly normalizing.

This is believable but not obviously true to me! Do you have a more fleshed-out version of this argument? (In particular, the fact that an expansion may generate no new nodes seems to make things tricky).

It reminds me of Goodstein's theorem and the hydra game.

@gasche (Member) commented Jul 5, 2021

> My argument was about the representation used in the implementation, where type expressions are graphs, not trees, and expansion preserves sharing.

Ah! It may indeed be easier to control termination on graphs (or explicit substitutions); I need to think about it. If there were a sensible proof that also worked with (unshared) trees, that would still have my preference. (In particular, it makes it easier to explain to other people/communities.)

> Do you have a more fleshed-out version of this argument? (In particular, the fact that an expansion may generate no new nodes seems to make things tricky).

Let me try. I summarized the argument here:

> (Induction argument: if all the descendant subtrees of a node are strongly normalizing, and all nodes strictly below it in the node order are strongly normalizing, then it is strongly normalizing.)

The proof of this "induction argument" is by induction on the tree. (Note: I'm not considering cyclic graphs here at all, but I believe the argument can be adapted to incorporate the cyclic-graph-detection countermeasure.) Consider an expandable node in the tree. I defined the possible expansions of a node/subtree as a new subtree whose nodes are either copies of a strict subtree of the expanded node, or "new nodes" that (by assumption) are strictly below it in the node order. By the assumptions of the induction argument, all nodes in the expanded subtree are strongly normalizing, which means that the expanded subtree is strongly normalizing, so the expandable node is itself strongly normalizing (any expansion step gives a strongly normalizing expanded tree).

(I don't see the difficulty with the case where the node expands only to one of its descendants, instead of creating new nodes; I assumed that its descendants were all strongly normalizing.)

Corollary: any tree is strongly normalizing. The proof goes from the terminal nodes of the tree (which have no children, and therefore are strongly normalizing) rootwards to the root of the tree.

@stedolan (Contributor) commented Jul 5, 2021

Thanks for the explanation! I more or less believe this now. I also agree that it would be nice to have a version of the argument that worked on trees.

To nitpick the proof a bit, your base case doesn't quite work:

> The proof goes from the terminal nodes of the tree (which have no children, and therefore are strongly normalizing) rootwards to the root of the tree.

The proof at the terminal nodes is not trivial. You need to show that the property is true not just of their subtrees (easy, vacuously true), but also of all nodes strictly below them in the well-founded order. The nodes below terminal nodes need not be terminal: indeed, you can have an arbitrarily large node below a terminal node (e.g. type t = <some big type>).

So I don't think you can do this just by induction on the tree, but I suspect the argument you provided works if used with a better induction principle (something like a nested induction first on the well-founded order and then on trees?)

@gasche (Member) commented Jul 5, 2021

You are correct, the "corollary" part does not work as-is. I thought that this was a simple nested induction, but I didn't find an easy way to nest the inductions. In fact, the idea of trying to do tree induction is fragile in general (if a tree T expands through a subnode S rather than a head expansion, it's hard to control the effect of the expansion on the whole tree.) Below is a different argument, considering reduction sequences directly. What do you think?

First a remark: when I say that there is a well-founded ordering of nodes, I mean that there exists a well-founded measure on the nodes that decreases for the new nodes of an expansion. Technically it's a well-quasi-order on nodes, as it is not antisymmetric.

We can get a (well-quasi-ordered) measure on trees, rather than nodes, by considering for each tree the set of measures of its nodes, using the standard lifting of well-quasi-orders to sets: A <= B if for any a in A there is a b in B such that a <= b. (As is standard, we define the strict ordering a < b as a <= b and not (b <= a).)
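
In code form, this lifting reads (a small sketch, representing sets as lists):

```ocaml
(* A <= B iff every element of A is dominated by some element of B *)
let set_le le a b = List.for_all (fun x -> List.exists (le x) b) a
```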

When a tree T reduces to T' by an expansion at some node, then it is the case that T > T' only when the expansion creates new nodes, otherwise we have T >= T'. I understand now why you were wondering about expansions that do not create new nodes!

Strong normalization proof: Consider a reduction sequence of trees. If a reduction from T to T' does not create new nodes, then the expanded (sub)node expands to one of its descendant subtrees: we have T' <= T and size(T') < size(T). After a finite number of such reductions, there must be an expansion from T to T' that creates a new node, so we have T' < T (and no relation between their sizes). In other words, any reduction sequence is finite by well-quasi-ordering of the lexicographic product of our tree measure and tree size.

@stedolan (Contributor) commented Jul 7, 2021

I don't think that proof fully works either!

> After a finite number of such reductions, there must be an expansion from T to T' that creates a new node, so we have T' < T (and no relation between their sizes).

Every node in T' is either new (and therefore below the expanded node in node order), or part of T. This is enough to establish T' <= T. But to get T' < T we also need not (T <= T'), which means that we must find some node a in T for which there is no b in T' with a <= b. I don't think this exists in general.

Concretely, suppose that T consists of a node p with a child q where p < q, and we expand p to p' (where p' < p), keeping q. (e.g. via a definition like type 'a t = 'a s). We have T' <= T because {p', q} <= {p, q} because p' <= p and q <= q. However, we also have T <= T' because {p, q} <= {p', q} because p <= q and q <= q, so we don't get T' < T.

@gasche (Member) commented Jul 7, 2021

Ouch! Back to the drawing board...

@gasche (Member) commented Jul 8, 2021

I'm back with a powered-up variant of my first proof attempt (considering the set of node measures as a tree measure). The proof is not really/only mine, I complained about this problem to my partner over lunchtime and she suggested this. (If we could invite Nicolas and Stephen over for dinner we would solve this real quick!)

The idea is to measure a node within a tree by its "node-path measure", which is the multiset of measures of the nodes on its path to the root (including the node itself). Then we measure a tree by the multiset of the node-path measures of its nodes.

Multisets of well-quasi-ordered elements are well-quasi-ordered by the multiset order. (The order is traced back to Dershowitz–Manna in 1979 and Huet-Oppen in 1980. It's fitting to use a notion co-invented by Gérard Huet for an OCaml problem.) It is the extension of the set ordering that is stable under multiset addition: A < B if A = A' + C and B = B' + C for some multiset C such that A' < B' as sets, that is, for any a in A' there exists b in B' such that a < b. For example (with the usual ordering on natural numbers) we have {1, 2, 2} < {1, 3}, with C := {1}, A' := {2, 2} and B' := {3}.
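
As a sanity check: for a total order on elements, the Dershowitz–Manna multiset order coincides with lexicographic comparison of the multisets sorted in decreasing order, which makes the example above easy to test (a sketch relying on that standard fact):

```ocaml
let multiset_compare a b =
  let sorted_desc l = List.sort (fun x y -> compare y x) l in
  compare (sorted_desc a) (sorted_desc b)

let () = assert (multiset_compare [1; 2; 2] [1; 3] < 0) (* {1,2,2} < {1,3} *)
```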

Now if T expands to T', we want to prove that T' < T in this tree measure (the multiset of node-path measures). We take C to be the multiset of node-path measures of all nodes that are not inside the expanded subtree, B' the measure of the pre-expansion subtree, and A' the measure of the post-expansion subtree. For any node in the post-expansion subtree (A'), we must find a node in the pre-expansion subtree (B') with a strictly larger node-path measure.

In general a path to a node in the expanded subtree looks like p . pn . po, where p is a path from the root to the ancestor of the expanded node, pn is a path formed of "new" nodes created by the expansion, and po is a path of "old" nodes that were present in a strict sub-subtree of the pre-expansion subtree (all three paths may be empty). We map it to a path of the form p . n . p' . po, where p is the same path unchanged by expansion, n is the expanded node, and p' corresponds to a part of the subtree that was erased by expansion.

For example (using simple lists as trees of one-child nodes):

  • if 3 -> 2 -> 1 -> 5 -> 1 expands at 2 to 3 -> 0 -> 5 -> 1 (0 is a "new node" strictly less than the expanded node 2), the expanded-tree path [3, 0, 5] can be decomposed as [3] . [0] . [5], and we map it back to the pre-expansion path [3, 2, 1, 5] decomposed as [3] . 2 . [1] . [5].
  • if 3 -> 2 -> 1 -> 5 -> 1 expands to 3 -> 5 -> 1 (no new node created), [3, 5] is decomposed as [3] . [] . [5] and mapped to [3] . 2 . [1] . [5] again.

We have measure(p . n . p' . po) > measure(p . pn . po), which is our goal; this is equivalent to measure(n . p') > measure(pn), which is direct as all nodes of pn are strictly below n ("new nodes" of an expansion are strictly smaller than the expanded node).

Note that the argument works even if there is duplication of subtrees. For example, if 3 -> 2 -> 5 expands to 3 -> 0 -> {5, 5} (the node 0 has two children nodes of weight 5), we have {[3], [3, 2], [3, 2, 5]} > {[3], [3, 0], [3, 0, 5], [3, 0, 5]} as we should.

In @stedolan's previous example, p -> q expanding to p' -> q (with q > p > p'), we have {[p], [p, q]} > {[p'], [p', q]}: unlike the set-based version, we don't have [p, q] <= [p', q] even when q > p.

Bonus remark: from a medium distance this looks related to the Kruskal tree theorem, which defines an ordering on trees: T1 <= T2 if T1 can be "embedded" into T2 and the embedding is measure-increasing on nodes -- the theorem is that this order is a well-quasi-order if the node order is. Unfortunately the "embedding" must be injective on nodes, while our mapping from T' to T is not injective when subtrees of T were duplicated in T', so we cannot use this theorem. (Kruskal's theorem is one of those theorems requiring powerful ordinals to prove, so there was indeed a relation to hydra games.)

@gasche (Member) commented Jul 28, 2021

@stedolan would you by chance have an opinion on the last version of the proof?

@stedolan (Contributor)

I read it, went off to read some more about wqos, then got distracted. I've now actually read and understood the proof, and it looks correct to me!

For anyone else following along, here's my understanding of the proof: in the Dershowitz-Manna ordering, you can make a multiset smaller by removing an element and replacing it with any finite collection of elements, as long as each new element is smaller than the one you removed. This ordering is well-founded if the ordering on elements is.

This process describes both the action of expansion on individual paths and on trees viewed as multisets of paths, so expansion is decreasing in the well-founded order of multisets of multisets of node labels.

Pointless nitpick: the references I could find described the multiset order as well-founded. Do you know whether it is a wqo as well? I think it probably is, but I couldn't find a proof of this anywhere. (The argument here does not depend on wqo vs. wf, this is just for my own curiosity)

@gasche (Member) commented Jul 29, 2021

Thanks!

Regarding wo/wqo, I'm no order theoretician, but my intuition is that a well-quasi-order structure on a set A, wqo(A), is isomorphic to the (dependent) pair of a well-ordered set R and a measure function A -> R. (It's clear that the measure defines a wqo by defining a <= b as measure(a) <= measure(b); conversely, take R to be the set of equivalence classes for the antisymmetry relation, a ~ b iff a <= b /\ b <= a.) Then it looks easy, from a distance, to turn well-ordering combinators into well-quasi-ordering combinators by using this isomorphism.

@stedolan (Contributor)

I think we may be using different definitions for things! Here's what I mean by various terms:

Suppose A is a set equipped with a preorder (reflexive/transitive relation) <=, and we define a < b as a <= b and not (b <= a). Then the ordering is:

  • well-founded if < has no infinite descending chains
  • a well-order if it is well-founded and total (either a <= b or b <= a for any a,b)
  • a well-quasi-order (wqo) if it is well-founded and has no infinite antichains
    (that is, any infinite sequence contains two elements related by <=)

The wqo condition is a weaker version of being a total order: with just two elements, you're not guaranteed that a <= b or b <= a, but with infinitely many elements you're guaranteed at least one related pair. Componentwise ordering on ℕ * ℕ is wqo but not a well-order, and equality on ℕ is well-founded but not a wqo.

After re-reading earlier comments, I think what you're calling a wqo might be what I call a well-founded preorder. The Dershowitz-Manna construction builds a well-founded preorder on multisets out of a well-founded preorder on elements, but I'm not sure that it preserves the no-infinite-antichains property, thus by the definition here I'm not sure it maps wqos to wqos. (However, for the argument here well-foundedness is enough, so whether or not it is actually a wqo doesn't affect the proof.)

@gasche (Member) commented Jul 29, 2021

I used "well-ordered" by mistake above, I meant "with a well-founded order". (We never work with total orders.)

I used quasi-order and preorder interchangeably (I didn't know of quasi-order before this PR, but found the suggestion on wikipedia), so to me they meant "not necessarily antisymmetric", so indeed that corresponds to your notion of well-founded preorder. (It looks like this was a mistake, with a difference between wqo and wpo coming from subtle differences between descending chains and antichains; oh well...)

@gasche (Member) commented Jul 29, 2021

My general impression is that the termination question is now fully sorted out. I think that several important questions remain for this PR:

  1. This approach is more principled, but also has a higher implementation complexity. Do we want to go down that road? (There are other analyses that we may want to implement in the future using the same approach, and sharing some termination-control code.)
  2. Assuming people decide that we want to go for this approach instead of fuel, there are implementation details that still require some more work on our part and more discussion, in particular in terms of performance implications: we want to make sure that the overhead of the function remains negligible when it does not perform any expansion, and that the efficient-expansion machinery is used correctly (including subtle interplay with the termination-annotation-on-type-nodes mechanism).
  3. As Stephen pointed out, we need to change the code slightly to correctly deal with cyclic types.

Points (2) and (3) are prerequisites to reach a "mergeable state" for this PR. But I think that (1) is the more important question, and I would expect that it can already be discussed and possibly decided, before @nchataing and myself invest non-trivial amounts of additional work in (2) and (3). Do we believe that getting rid of fuel is worth the complexity of a new termination-control mechanism?

I wonder if people who already looked at this PR (@lpw25, @stedolan) have an opinion, and others as well. (Maybe it was not strategic from this feedback-attracting perspective to discuss at length the termination proof first, but oh well, it was fun to crowd-implement a proof instead of a program for once.)

@lpw25 (Contributor) commented Aug 10, 2021

Personally, I don't think removing fuel is worth the effort that has been expended on creating and justifying this implementation, since I suspect that in practice no one will ever actually observe the difference. However, the code itself is pretty simple and doesn't seem hard to maintain, so we might as well have it now it's written.

Review comment on the diff:

```ocaml
type t =
  | Unavailable
  | This of type_expr
  | Only_on_64_bits of type_expr

let check_annotated ty callstacks =
```
@lpw25 (Contributor) commented Aug 10, 2021:

Is this function just checking that extend_callstacks works, using logic almost identical to extend_callstacks? If so, I'm not sure it is gaining you anything.

Reply (Member):

Yes, this function is used for debugging only. It proved tricky to get an implementation right because of the reliance on type node identity, interleaved with expansions (in particular the call to expand_head_opt, which transforms the type graph). It will be possible to remove it if people agree on a final implementation.

@gasche (Member) commented Sep 30, 2021

We discussed this PR at a maintainer meeting today. Here is what I understood of the consensus:

  • in principle people are okay with using this approach instead of fuel, both for get_unboxed_type_representation and our upcoming constructor-unboxing work
  • we need to check first that the proposed change does not degrade performance (in particular, the function should remain very lightweight in the case where no unboxing-expansion is needed at all)

@gasche (Member) commented Feb 17, 2023

Some news on this topic -- rather exploratory in nature, I'm afraid.

  1. I have written the termination proof in full detail, but in French (!), in https://hal.inria.fr/hal-03510931. This is Section 4, starting on page 10 of the PDF file. Someone who does not read French could still probably follow the proof.
  2. In the process of porting the proof to English, I decided to ask types-list about previous references for such a result. (At its core the result is general enough that I expected other people to have proved it before.) See the types-list thread here. The short summary is that, yes, the result is already known, and in fact there exist simple proofs of decidability of termination. (But see below for more details.)

The termination algorithm we presented above is "online", in the sense that it monitors the computation (the unfolding of definitions) and is guaranteed to stop it after a finite number of steps if the unfolding process would loop indefinitely.

In contrast, the simple termination proofs I am receiving pointers to (for example this one) have a rather different computational content: they perform a more "global" analysis/computation on the whole set of recursive definitions at once to decide which type definitions "introduce non-termination". (They still need to unfold definitions / perform beta-reductions, so they are "dynamic" in nature as well.) There are two issues with this for the question at hand:

  • If we tried to do this on-demand naively, the computational cost could be high: traversing various sets of recursive definitions for each call to get_unboxed_type_representation. We could narrow the set of definitions to study to only those relevant to the type expression at hand, but it would still be fairly expensive.
  • A more reasonable approach would be to design a compositional "summary" of this information that could be cached for each type definition, exactly like variance, immediacy or separability. I think, but I am not sure, that the property of interest is: "What is a minimal set of your parameters such that, if they are instantiated by normalizing type expressions, your definition is also normalizing?". Computing this at declaration time could be quite easy, but this approach is not stable under substitution of abstract types by type definitions, so it would need to be recomputed on functor expansion, which is precisely the sort of issue we are trying to get away from in the context of immediacy (and head shapes).

I am surprised by the complexity gap I perceive between the "properties of type definition", for which termination is rather simple (the subtlety is in defining the property we want), and "online monitoring during expansion", for which termination required some hard thinking.
