
Shape improvements #10808

Closed
wants to merge 10 commits into from

Conversation

@gasche
Member

gasche commented Dec 3, 2021

This is an alternative to #10796 -- a change to the way "shapes" are computed to avoid time or space blowups in the compiler.

In essence, "shapes" are lambda-terms representing the module structure of OCaml programs, and Merlin has to reduce shapes to normal forms to produce an effective "jump to definition" tool. The design chosen by @voodoos and @trefis is to perform some reductions of shapes in the compiler, and some in Merlin; but in the original PR, #10718, this was done too naively, and it would blow up on programs using modules and functors heavily, such as Irmin (see report #10779).

The current shape-reduction engine is very naive (it performs substitutions!). @trefis's proposal in #10796 is a series of elaborate optimizations to avoid some of the downsides of naive reduction -- for example, grouping n-ary applications together to perform a single substitution. The approach in the present PR is to implement a better reduction strategy from the start, to avoid the need for such optimizations.

With this PR, we perform strong reduction on shapes (that is, as much reduction as we can do without accessing shapes of other compilation units), in negligible time and memory. (I only measured performance on irmin.cmt, reported as a worst case.) #10796 ends up producing a 95Mio irmin.cmt file in 2s; we produce a 2.5Mio cmt file in 0.5s, the same as before shapes were introduced.

The PR has not been tested outside the compiler codebase + Irmin. I hope that @trefis or @voodoos could possibly test that it works well for Merlin, and/or use their cmt-measurement tools to check that there is no issue with other .cmt files.

Notes:

  • The previous work on shapes already implemented some form of strong reduction (reducing under lambdas), by doing some reductions during type-checking (deeply inside terms).
  • Implementing strong reduction properly (using an environment instead of substitutions) gives good space behavior, but irmin.cmt still sees a lot of redundant computations (I think they come from projections corresponding to module-signature coercions). I had to implement memoization to get good time behavior; a generic sketch of the memoization scheme is included after these notes. I also tried to add hash-consing on top of memoization, but did not observe any gain on irmin.cmt.
  • Note that memoization reduces the in-memory size of structures by sharing identical subresults, but the -dshape output can still blow up.
  • I kept two of @trefis's commits from Module shapes for easier tooling #10718; we checked that they are not needed for the performance of this PR, but they still produce better / more readable -dshape outputs.
  • I believe that with more clever strong-reduction strategies (substitution only for "useful redexes"), we may not need memoization and could remove the redundancy of the -dshape output as well.
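
For reference, here is a generic sketch of the memoization scheme used by the reducer (illustrative only, not the actual Shape code; the real implementation memoizes on the (strategy, env, t) triple, and presumably needs some care about how environments are hashed):

let memoize f =
  (* [f] is written in open-recursion style: it receives the memoized
     version of itself to use for recursive calls *)
  let table = Hashtbl.create 42 in
  let rec memo_f key =
    match Hashtbl.find_opt table key with
    | Some result -> result
    | None ->
      let result = f memo_f key in
      Hashtbl.add table key result;
      result
  in
  memo_f

(* hypothetical usage, keyed on the (strategy, env, t) triple:
   let reduce_memo = memoize (fun self (strategy, env, t) -> ...) *)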

@xavierleroy
Contributor

Neat idea! I admit I didn't read the code, but: which strategy / abstract machine do you use for strong reduction?

@gasche
Member Author

gasche commented Dec 3, 2021

It's a home-grown strategy that I got to by trying to minimally adapt a simple (weak) call-by-value evaluator to strong reduction:

The core looks like this (simplified):

type strategy = Weak | Strong

let rec reduce env t =
  reduce_ Strong env t

and reduce_ strategy env t =
  (* the full implementation memoizes on (strategy, env, t) *)
  let reduce env t = reduce_ strategy env t in
  match t with
  | Abs(var, body) ->
    begin match strategy with
    | Weak -> t
    | Strong ->
      (* reduce under the abstraction, with the bound variable
         mapped to itself (a neutral term) *)
      let env = bind env var (Var var) in
      let body = reduce env body in
      Abs(var, body)
    end
  | App(f, arg) ->
    let f = reduce_ Weak env f in
    let arg = reduce_ Strong env arg in
    begin match f with
    | Abs(var, body) ->
      (* we only add Strong normal forms to the environment *)
      reduce (bind env var arg) body
    | _ -> App(f, arg)
    end
  | Var id ->
    (* already in Strong normal form *)
    Ident.find_same id env

The use of two strategies (as opposed to using the Strong mode everywhere) avoids traversing the body of each lambda-abstraction twice, first when computing reduce env f and second when computing reduce (bind env var arg) body.

reduce_ Weak does not exactly perform weak reduction: it will still strongly reduce function arguments. If I remember your work with Benjamin Grégoire correctly, you perform standard weak reduction and then iteratively reduce under abstraction bodies; but I thought that this may duplicate reduction work by substituting terms in weak normal form (rather than strong normal form) during beta-reduction.

Ideally I think one would have an abstract machine that only performs "needed" substitutions: I would prefer for the normal form of (fun x -> { foo = x.foo; bar = x.bar; }) (f(a)(b)(c)) (terms of the form { foo = x.foo; bar = x.bar } come from the translation of module coercions (X : sig type foo type bar end)) to be let x = f(a)(b)(c) in { foo = x.foo; bar = x.bar } rather than the current { foo = f(a)(b)(c).foo; bar = f(a)(b)(c).bar }. But I didn't think of a simple way to implement this (I guess I should ask @Ekdohibs...), and the current naive approach works well: the two occurrences of f(a)(b)(c) in the normal form are in fact shared in memory, so the -dshape output is large but the .cmt is small.
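
(To make the sharing point concrete with a self-contained toy example, unrelated to the actual Shape code: OCaml's Marshal preserves sharing within a value, so a structure whose printed form is exponential can still serialize compactly.)

type tree = Leaf | Node of tree * tree

let rec shared n =
  if n = 0 then Leaf
  else let t = shared (n - 1) in Node (t, t)    (* both children are the same block *)

let rec copied n =
  if n = 0 then Leaf
  else Node (copied (n - 1), copied (n - 1))    (* two distinct copies at each level *)

let () =
  let size v = String.length (Marshal.to_string v []) in
  (* the shared tree marshals in O(n) bytes, the copied one in O(2^n) bytes,
     even though both print identically *)
  Printf.printf "shared: %d bytes, copied: %d bytes\n"
    (size (shared 18)) (size (copied 18))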

(Another thing to try would be to not do strong reduction at all, stick with weak reduction. My intuition tells me that the normal-form size blowup problems that we see are due to neutral terms occurring in strong reduction; in the example above, if we wait to reduce until the f(a)(b)(c) are known and reduce to a structure, there is no duplication anymore. The question is whether those strong reductions save some work that Merlin would have to do; I think it would be fine, but again the current approach works well enough that I didn't try.)

irmin.cmt back to 2.5Mio

This changes the test output for shapes that are not printed in normal
form anymore. (Following commits go back to the original outputs.)
The previous approach to shape reduction, which used the Shape.app
smart constructor during module type-checking, would perform shape
reduction under functor abstractions -- strong reductions.

Strong reductions are the case that blows up on some interesting
examples, including Irmin: under functor abstractions, shapes of
module types may not reduce to structures (but to neutral terms
containing free variables), and the projections
  {f = S.f; g = S.g; ...}
that correspond to the translation of module signature coercions
  (M : sig type f type g end)
may generate arbitrarily many copies of the non-trivial shape S.

The approach in this commit is to reduce shapes for the whole
module (treating references to other compilation units as free
variables; this remains a "local" reduction), but very weakly: without
going under functor abstractions, and in fact without reducing functor
applications either.

In particular, the production of irmin.cmt, a known-bad case, does not
blow up.

We added a new example in the testsuite that produces an
exponentially-large shape, but can still be computed in a short amount
of time.

Note: existing shape tests do *not* pass with this commit, as we
reduce less aggressively than before. Follow-up commits are going to
implement a stronger reduction strategy.
In the cases where we see a shape size blowup, repeated substitutions
can result in a memory usage blowup. When using an environment instead,
memory usage to build irmin.cmt remains constant (instead of growing
continuously to gigabytes of live RAM), even under stronger reduction
strategies.
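
For context, a minimal sketch of what "repeated substitutions" means on the plain lambda-calculus (illustrative only, not the compiler's Shape code): each beta step rebuilds a copy of the function body, so nested redexes can duplicate large subterms, which is exactly what an environment-based evaluator avoids.

type term = Var of string | Abs of string * term | App of term * term

(* naive capture-unaware substitution: copies the whole body at each beta step
   (a real implementation must also avoid variable capture) *)
let rec subst x v = function
  | Var y -> if y = x then v else Var y
  | Abs (y, _) as t when y = x -> t
  | Abs (y, body) -> Abs (y, subst x v body)
  | App (f, a) -> App (subst x v f, subst x v a)

(* weak reduction by repeated substitution *)
let rec reduce t =
  match t with
  | App (f, a) ->
    let a' = reduce a in
    begin match reduce f with
    | Abs (x, body) -> reduce (subst x a' body)
    | f' -> App (f', a')
    end
  | _ -> t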
@xavierleroy added this to the 4.14.0 milestone Dec 6, 2021
@xavierleroy
Contributor

The Grégoire-Leroy strategy is (possibly) interesting in conjunction with a compiled implementation of weak symbolic evaluation, but I agree that it is useless for an interpreter, and that you don't want to do any compilation here.

Yes, it would be good to have @Ekdohibs's expert opinion, and perhaps also that of your team members in Saclay who work on exactly this kind of thing :-)

Naive question: the normal form of a lambda-term can be very very much bigger than the original term, so why is it appropriate to store shapes that are in normal form?

@gasche
Member Author

gasche commented Dec 6, 2021

I have been hacking on more stuff in the direction of the present PR (including fixing a bug, oops), and I have yet to get the commit history in a "fully satisfying" state. I will finish this before I bang on more doors for reviews. (But of course volunteers are always welcome.)

I thought indeed of using one of Beniamino Accattoli's abstract machines for strong reduction, but abstract machines (especially when they involve explicit substitutions) tend to require fairly invasive changes when starting from a standard big-step evaluator, so I haven't crossed that bridge yet.

@gasche
Member Author

gasche commented Dec 6, 2021

Naive question: the normal form of a lambda-term can be very very much bigger than the original term, so why is it appropriate to store shapes that are in normal form?

Currently there are known cases where shape computation seems to behave badly for some module-heavy programs (Irmin). If we start from the premise that shape computation is a computationally hard problem, and that shapes can benefit all tools consuming -bin-annot information, then doing the reduction (well) in the compiler makes sense.

I am not sure that this is in fact correct: I suspect that with more massaging (see the next PR :-), shape reduction should be instantaneous for all OCaml programs (and normal forms fairly small), so both strategies would be equally reasonable.

@Ekdohibs
Contributor

Ekdohibs commented Dec 6, 2021

From what I see, the potential problem with reduce Weak reducing function arguments strongly is that these arguments may be reduced to strong normal form, but that strong normal form may be unused (the argument itself could even be unused!). I suspect this is not the case for standard OCaml programs, but if a change is desired to ensure no useless computations are done, the interpreter would need to be changed from strong call-by-value to strong call-by-need, with both the (weak) function arguments and the strong forms of those arguments being computed lazily. (I can show an implementation of this on the standard lambda-calculus if you want.)

@gasche
Member Author

gasche commented Dec 6, 2021

I can show an implementation of this with standard lambda-calculus if you want.

Sure :-) I must say that the idea hadn't crossed my mind that I would want to solve a practical problem using a strong call-by-need evaluation strategy, but it sounds like fun.

(I believe that the "useless work" of evaluating then-unused subterms is not a problem for shapes in OCaml programs, because people tend not to create complex module expressions that are mostly useless. So the call-by-need aspect is probably not worth the complexity. But I'm still curious to see the code.)

@Ekdohibs
Contributor

Ekdohibs commented Dec 7, 2021

I have several versions of it:

  • A full version able to handle lambda-calculus + constructors and fix
  • A simpler version able to handle only the lambda-calculus
  • And an even simpler version which handles laziness using OCaml's lazy, thus avoiding the need for a deep/shallow reduction flag, at the cost of the output structure being partially lazy: it needs to be fully evaluated if we really want to compute the normal form (this is the force_deep function at the end). A rough sketch of this variant is given below.
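
(Not the code referenced above, but for readers of this thread, here is a minimal sketch of what such a Lazy-based variant can look like on the plain lambda-calculus: function arguments are evaluated at most once thanks to Lazy, and force_deep turns the partially lazy result into a full normal form.)

type term = Var of string | Abs of string * term | App of term * term

type value =
  | VClos of string * term * env             (* function body not yet normalized *)
  | VNeutral of string * value Lazy.t list   (* x applied to still-suspended arguments *)
and env = (string * value Lazy.t) list

let fresh = let r = ref 0 in fun x -> incr r; x ^ "/" ^ string_of_int !r

let rec eval env = function
  | Var x ->
    (match List.assoc_opt x env with
     | Some v -> Lazy.force v
     | None -> VNeutral (x, []))              (* free variable: a neutral term *)
  | Abs (x, body) -> VClos (x, body, env)
  | App (f, a) ->
    let va = lazy (eval env a) in             (* suspended, evaluated at most once *)
    (match eval env f with
     | VClos (x, body, env') -> eval ((x, va) :: env') body
     | VNeutral (h, args) -> VNeutral (h, args @ [va]))

(* force everything still suspended and read the result back as a term in normal form *)
let rec force_deep = function
  | VNeutral (h, args) ->
    List.fold_left (fun acc a -> App (acc, force_deep (Lazy.force a))) (Var h) args
  | VClos (x, body, env) ->
    let x' = fresh x in
    Abs (x', force_deep (eval ((x, lazy (VNeutral (x', []))) :: env) body))

let normal_form t = force_deep (eval [] t)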

Any reduction work not done by the compiler will have to be done by
Merlin. We believe that strong reduction (including under functors) is
preferable to weak reduction in the compiler, if we can avoid time and
space blowups.

At this point, producing irmin.cmt (a 2.5Mio .cmt file) takes 5 minutes;
this is too slow, but memory usage is fine.
This change has a radical effect on performance: it completely
eliminates the redundancies inherent in some examples, both in
time (avoiding duplicated computations) and
space (reusing physically identical results instead of copies of the
same results).

At this point, producing irmin.cmt (2.5Mio) takes 0.5s; this is fast
and small.
@gasche
Member Author

gasche commented Dec 16, 2021

I added documentation comments to my implementation and rebased the history (it's not perfect, but good enough).

I haven't tried to reuse @Ekdohibs's approach. It's surprisingly nice (in the last version, I think it would be nicer and clearer to present force_deep as a read-back function from normal forms to terms); I could give it a try.

@gasche
Member Author

gasche commented Dec 16, 2021

I proposed a version using strong call-by-need in #10825.

@gasche force-pushed the shape-improvements branch 2 times, most recently from a7bc83b to caecad8 on January 4, 2022 17:12
@gasche
Member Author

gasche commented Jan 4, 2022

I realized that the design of the strong call-by-need evaluator, using an explicit type of normal forms with a read_back : nf -> t function, can also be used to simplify the strong call-by-value evaluator (and get the two very close to each other).

I have updated the present PR to use this design. The code is simpler than before, and it is also smaller than the strong call-by-need evaluator, thanks to not having to work around the non-hashability of lazy thunks (there are no lazy thunks here).
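
For concreteness, here is the same skeleton as the Lazy-based sketch earlier in the thread, but in call-by-value style with an explicit type of normal forms and a read_back function (illustrative only; the actual Shape code also handles structures and projections, memoizes both functions, and differs in details such as when function bodies get normalized):

type term = Var of string | Abs of string * term | App of term * term

type nf =
  | NClos of string * term * env     (* a closure carrying its environment *)
  | NNeutral of string * nf list
and env = (string * nf) list

let fresh = let r = ref 0 in fun x -> incr r; x ^ "/" ^ string_of_int !r

let rec eval env = function
  | Var x ->
    (match List.assoc_opt x env with Some v -> v | None -> NNeutral (x, []))
  | Abs (x, body) -> NClos (x, body, env)
  | App (f, a) ->
    let va = eval env a in             (* call-by-value: arguments are evaluated eagerly *)
    (match eval env f with
     | NClos (x, body, env') -> eval ((x, va) :: env') body
     | NNeutral (h, args) -> NNeutral (h, args @ [va]))

(* read a normal form back into a term, normalizing closure bodies on the way *)
let rec read_back = function
  | NNeutral (h, args) ->
    List.fold_left (fun acc a -> App (acc, read_back a)) (Var h) args
  | NClos (x, body, env) ->
    let x' = fresh x in
    Abs (x', read_back (eval ((x, NNeutral (x', [])) :: env) body))

let normalize t = read_back (eval [] t)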

Given that we expect strong CBV and strong CBN to have the same performance profiles (as long as they are memoized) on this problem (with an advantage for CBN in corner cases), I think that this version could be considered better (shorter) than the other.

@Ekdohibs
Contributor

Ekdohibs commented Jan 4, 2022

It seems to me that with this type of normal forms, the normal forms of functions are always computed, while this would not be necessary for functions which are later applied to an argument. For functions with large normal forms which are only ever applied to arguments, the difference can be quite significant. Of course, this case may not happen in practice, so I am not sure whether the change is worth it.

@gasche
Member Author

gasche commented Jan 4, 2022

This is a good (technical) point. Could I fix it by using nf option ref (no nf Lazy.t!) to evaluate it only on demand within read_back? This feels very ad-hoc.

@Ekdohibs
Contributor

Ekdohibs commented Jan 4, 2022

That would indeed work! Also, you could completely avoid the use of Lazy.t in the strong call-by-need evaluator as well, by using the type (nf, t * env) result ref wherever you would have a lazy value: the code pointer used for the closure is always the same (reduce), and the important part is the value of the variables bound by this closure, i.e. a term and an environment. You could also add an NLazy of t * env constructor to the type nf, and simply use nf ref where a lazy value is needed, taking care to evaluate it.
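
A rough, generic sketch of that suggestion (names are hypothetical, not from either PR): a hand-rolled thunk is just a mutable cell holding either the computed result (Ok) or the suspended payload (Error), e.g. a term together with its environment.

(* create a suspended computation from its payload, e.g. a (term, environment) pair *)
let suspend payload = ref (Error payload)

(* force the cell: compute at most once, then cache the result (like Lazy.force) *)
let force reduce cell =
  match !cell with
  | Ok v -> v
  | Error payload ->
    let v = reduce payload in
    cell := Ok v;
    v

(* e.g. instantiated at the suggested type (nf, t * env) result ref, with
   [reduce] applied as [fun (t, env) -> reduce env t] *)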

@gasche
Member Author

gasche commented Jan 5, 2022

I'll discuss the call-by-value aspects here and your call-by-need remark in the other PR.

Thinking more about this, my opinion is that no change is required to this PR in reaction to your remark:

  1. The case of "useless computation" that you point out is not a new issue coming from the nf + read_back version of the strong call-by-value evaluator; it was already present with my previous design.

  2. My understanding of strong call-by-value is that this useless computation is part of the specification (not a problem with my implementations). Your suggestion goes in the direction of adding more "by-need" aspects to the reduction strategy. I would rather stick to standard systems (as "standard" as we can get in a strong setting...).

  3. I believe that this useless-computation case will not be a problem in practice in OCaml programs. This specific form of useless computation comes from a lambda-expression that is not returned as part of the normal form, only used for applications. But this is relatively uncommon for OCaml module shapes. It would naturally arise when a functor is defined locally, but not exported through the module interface.

(Finally: there is something that I don't fully understand with shapes, which is whether we can really erase any part of the input program. If a functor is only used locally and not exported by a compilation unit, we don't need to normalize its body to compute the compilation unit shape. But what happens when the user is editing the file locally, and asking information about names defined by this local functor? My intuition would be that we in fact need to normalize the shape of any sub-term that occurs in the program, as it can be edited by Merlin; but neither of the two PRs currently does this, nor does the original shape implementation.)

@trefis
Contributor

trefis commented Jan 5, 2022

If a functor is only used locally and not exported by a compilation unit, we don't need to normalize its body to compute the compilation unit shape. But what happens when the user is editing the file locally, and asking information about names defined by this local functor? My intuition would be that we in fact need to normalize the shape of any sub-term that occurs in the program, as it can be edited by Merlin; but neither of the two PRs currently does this, nor does the original shape implementation

Merlin will do that, but not by going through the compilation unit shape. The way Merlin intends to use shapes is as follows: get the Path.t under the user's cursor, call Env.shape_of_path, then call reduce on the shape it just got. The reduce function used will load shapes for external compilation units, but it will also perform the appropriate reductions on any local shapes.

Note: having shapes stored in the environment makes it really easy to implement this procedure, because Merlin has easy access to the environment at the cursor position. It would be a lot more annoying if shapes were computed after the fact from the typedtree (and so weren't stored in the environment). I plan to answer your other questions here and in the other PRs soon, but you can view this as a sort of preliminary reply to at least one of those questions.

@Ekdohibs
Contributor

Ekdohibs commented Jan 5, 2022

Concerning my understanding of strong CBV, it is indeed not necessary to have laziness on the normal form of functions -- I think what I had in mind was rather a weak call-by-value / strong call-by-need hybrid strategy. The main point was originally to have the property that a program terminating in weak CBV with a strong value (like a variable) would terminate as well in this strategy, unlike in strong CBV (for instance, the use of a Y combinator for recursion is enough to break termination in a strong setting). However, I realized it does not work either: a term such as (fun y -> x) (x (fun z -> [non-terminating term])), where x is an open variable (bound by an enclosing lambda), terminates in weak CBV (with the value x), but will not terminate with most definitions of strong CBV, as x (fun z -> ...) needs to be reduced to a value, so the strong normal form under the function will be needed.

@gasche
Member Author

gasche commented Jan 5, 2022

But currently the shapes stored in the environment (let's call them sub-shapes) are not in normal form. (With either of my PRs they are not reduced at all; with your original implementation they were only partly normalized.) So in theory there is a risk of computation blowup there, the same as when we proposed to not normalize shapes at all in the compiler. And the memoization in my PRs is only local to a given reduce call, so computation is not shared across those sub-shapes. We could:

  • decide that this is fine in practice and change nothing
  • try to memoize shape reduction globally (by also indexing on the global environment?)
  • strongly normalize all sub-shapes
  • not reduce shapes at all in the compiler, and leave the work to Merlin

@voodoos
Contributor

voodoos commented Jan 6, 2022

I think the first option is a good and pragmatic choice:

  • decide that this is fine in practice and change nothing

@trefis's benchmarks on the strong call-by-need version show that, in practice, having to reduce local shapes does not cause noticeable latency on locate queries (he has not run them on the CBV version yet, but results should be close).

The current compromise of having the compiler reduce whole compilation-unit shapes and Merlin only local shapes, coupled with an efficient reduction algorithm (memoized CBV or CBN), seems like a reasonable choice performance-wise.

However, I just tried the latest version of this PR (strong CBV), as I wanted to have a last round of benchmarks comparing it to the strong-CBN one, and it hangs indefinitely while building Irmin.

But the latest CBN version performs very well: total build time on Irmin is 48.572s ±0.138, instead of 48.505s ±0.208 without any shapes. My take is that we should focus on reviewing the CBN version, as it works right now and performs quite well. The only drawback is the increased complexity of the implementation.

We use a proper type of normal forms, where functions are represented
by closures carrying an environment, and a read-back function from
normal forms to terms. This simplifies the
environment-capture shenanigans of the previous implementation.
@gasche
Member Author

gasche commented Jan 6, 2022

@voodoos thanks!

I didn't reproduce the issue when I tried locally to rebuild irmin.cmt, but I think there was an issue with my test setup, apologies.

I think the issue was a silly mistake in how read_back is memoized; which should now be fixed (I updated the PR).

@voodoos
Contributor

voodoos commented Jan 6, 2022

@voodoos thanks!

I didn't reproduce the issue when I tried locally to build irmin.cmt, but I think there was an issue with my test setup, apologies!

I think the issue was a silly mistake in how read_back is memoized; which should now be fixed (I updated the PR).

OK, I'll run my benchmarks again with the latest versions of both PRs.

@voodoos
Contributor

voodoos commented Jan 6, 2022

@gasche I think it still hangs?
Running dune build @install --verbose, it stops when building irmin_git.cmo/x.

@gasche
Member Author

gasche commented Jan 6, 2022

Ah, I was testing on irmin.cmt, thanks.

@Octachron
Member

Closed in favor of #10825.

@Octachron closed this Jan 20, 2022