New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: tail recursion modulo constructors #181
Conversation
For the record, I have discussed this feature extensively with @def-lkb , and I think it is a nice thing to have opt-in. I am tired of having to explain to beginners that the nice elegant way to write On the other hand TRMC can be problematic (I do think Hansei is an interesting use of the language, and thanks Oleg for pointing out that destination-passing-style List.map breaks it), so it is important that it is only opt-in. Finally, there are also some important use-cases where TRMC does not suffice to obtain good stack usage (it is not clear how to implement |
This is a great proposal. I'm not sure I buy @gasche's argument about beginners, though. If TRMC is not automatic, having to tell beginners to enable it explicitly is not natural and doesn't achieve the goal of least-surprise. Also, it would be quite useful to benchmark various versions of List.map. Last time I checked, a manual encoding of TRMC for this case leads to significantly worse performances than the naive non tailrec version. It would be interesting to compare the TRMC version with the "defunctionalized CPS" version (i.e. using an accumulator and a List.rev), because that one could also be mechanized. (There are several ways to achieve good performance for common sizes and still have a version that works on long list (do a non-tailrec version for the first N elements; unroll the loop) -- but that's another story.) |
Le mercredi, 6 mai 2015 à 17:13, gasche a écrit :
It's always funny how people pull out the beginners argument when they find it handy to supposedly make a point. You brought nothing for the beginners since you now have to explain to them both what tail-recursion and tail-recursion modulo constructors is and that you need to add an attribute to get the latter, I have seen more beginner friendly introductions… (And as thinking about beginners when you design is usually a bad idea, always design for the working programmer. API and PL designers should really read more hci books) Best, Daniel |
I do like this idea. However, I am concerned about the addition of more mutations of immutable things since these are more costly in multicore (if the mutated block is on the major heap then you must allocate the pointed-to value on the major heap as well). I am also concerned about the interaction with delimited continuations (which -- spoiler alert -- may also be relevant to multicore). It is possible that this interaction can be fixed. Essentially copying a stack must also cause any TRMC values being built to be copied (I believe Gabriel discussed such a scheme on the caml-list). I think that this should probably be worked out properly before considering TRMC for inclusion. |
It sounds like this only applies to self calls. Can it be generalised to support arbitrary tail calls? |
@yallop this applies to all recursive functions defined in the let-bindings parents of the call-site. let rec f = …
and g =
let rec g' = … x :: f () in
g' …
in
… Here, that call to f from g' is rewritable. Note that the drawback is that this might generate an amount of code exponential to the number of nested lets. |
@lpw25 will multicore be purely a runtime or will it also require compiling programs differently? If the latter, then this optimization can be turned off. Perhaps ideally, both code paths can be included, and the runtime gets to choose which one it wants by setting a global, or via a preprocessing phase much like dynamic linking? |
Le mercredi, 6 mai 2015 à 18:16, Yotam Barnoy a écrit :
Well no, it's an all or nothing thing. Otherwise you may blow the stack depending on the runtime and you really don't want that. Daniel |
@dbuenzli Right, well, I think it's better to view this optimization not as a surefire way to fix stack overflows, but as a hint to the compiler. Even without stack overflow protection, this optimization can give you a performance boost, so it's still worthwhile. Otherwise, this really does get messy for multicore, which is coming up... any day now? |
Personally I'm first interested in writing robust and correct programs and part of this task is making sure that you do not blow the stack whenever your data becomes arbitrarily large. |
I think this is wrong. Most tricks to avoid blowing the stack (CPS, extra accumulator, mutation) tend to produce code much slower than the non-tailrec version. |
My bad. I guess it's only a performance optimization as compared to reversing the reverse-mapped list, which in itself is done to avoid stack overflow. |
This doesn't match my observations, but my benchmarks were not made with latest version, so take what I am about to say with a pinch of salt. Generally, code with and without trmc were performing similarly.
The unrolled version was faster than Core_list.map one, quite significantly on large lists (when Core switches to the array representation). |
I would be interested to see such cases, and how they would behave when the partially built list does not fit in the minor heap during the entire List.map (either because the list is very long, or just because the mapped function allocates). |
Ok, I fixed a bug I introduced while rebasing and now the worst case is 30% slower with TRMC with |
After discussion with @lpw25, in order to make TRMC behaves well multicore gc and delim-cc, we decided to restrict TRMC rewriting to recursive calls occurring at the last position of the block being built. That is for |
let rec map1[@trmc] f = function [] -> [] | hd :: tl -> f hd :: map1 f tl;;
let rec map2[@trmc] f = function [] -> [] | hd :: tl -> let r = f hd in r :: map2 f tl;; ? (i.e. that the trmc attribute can change the evaluation order)
let rec map f = function [] -> [] | hd :: tl -> f hd :: (map f tl [@trmc]) or: let rec map f = function [] -> [] | hd :: tl -> f hd :: (map[@trmc] f tl) This would be more in line with the [@tailcall] attribute, and makes it more explicit where users expect a call to be turned into a tail one. |
I'm still interested to see concrete cases where the naive version trmc's map is faster than the non-trmc version, for the same evaluation order. I ran the following benchmark: let rec map_trmc [@trmc] f = function
| [] -> []
| hd :: tl -> let r = f hd in r :: map_trmc f tl
let rec map f = function
| [] -> []
| hd :: tl -> let r = f hd in r :: map f tl
let n = 10000000
let test len =
let l = ref [] in
for i = 1 to len do l := i :: !l done;
let l = !l in
Gc.compact ();
let t0 = Unix.gettimeofday () in
for j = 1 to n / len do ignore (map succ l) done;
let x1 = Unix.gettimeofday () -. t0 in
Gc.compact ();
let t0 = Unix.gettimeofday () in
for j = 1 to n / len do ignore (map_trmc succ l) done;
let x2 = Unix.gettimeofday () -. t0 in
Printf.printf "len = %08i: ratio = % 6.02f%%\n%!" len (100. *. x2 /. x1)
let () =
test 10;
test 100;
test 1000;
test 10000;
test 100000 and I get:
changing the implementation to: let rec map_trmc [@trmc] f = function
| [] -> []
| hd :: tl -> f hd :: map_trmc f tl
let rec map f = function
| [] -> []
| hd :: tl -> f hd :: map f tl give very different results:
but here the semantics is different (the trmc version always evaluate left-to-right). Now, using: let rec map f l = List.rev (List.rev_map f l) I get:
|
Yes, trmc force the trmc-call (right most) to be evaluated last.
Nice, I also find that better and more predictible. Unless people express other opinions, I think I will apply your suggestion. |
About your benchmark: I likely made the mistake of having wrong evaluation order in my tests. In a second run, I unrolled the map function a few times and the difference was much larger than between TRMC and non-TRMC versions. And this time I took care of having the appropriate evaluation order. (Which makes me think that this rewriting might conflict with flambda unrolling) |
Is TRMC a global or local transform? My previous mental model was of TRMC as whole-function transform, and I was just writing a message explaining that I dislike your proposal for this reason. Thinking about it more, it looks like (TRMC indeed requires a whole-function change but) the possibilty for each recursive call to be TRMC-ed is independent from the rest of the recursive function body. If that is really the case, then maybe it should indeed be at the level of the call. This would also have the advantage of easily allowing the user to specify which case should be trmc-ed if there are two recursive calls under the constructors, which can be helpful when the recursion depth is uneven. (But as noted by @def-lkb this can already done by using (I think in this case it would be both fairly reasonable and intuitive to allow both positions.) |
I like this idea, can we pass all attributes to function declarations downto lambda layer in a modular way instead of the ad-hoc passing down, now it seems if we want to add one optimization, we need change lambda interface, this is already the case with [@taillcall], this will quickly makes lambda layer bloat |
@def-lkb you should add tests. |
We discussed it at the developer meeting today, and the consensus was that there were too many changes in for 4.03, so this could be postponed. Also, the idea of only ever mutating the last field (as a way to create a possibility of a workaround to support multi-shot continuations) proved unpopular. I don't see it suggested in the PR (and I don't understand the full details of it: if the last field is not a self-call but some other is, then no TRMC is performed on this use-site?). On the other hand, I do think that supporting multi-shot continuations was a valid move (for HANSEI for example); we may need to think harder about (1) less ad-hoc designs for the multi-shot case or (2) whether a clear specification exists that encompasses both choices of behaviour, to at least be able to give a stable specification that allows changing our minds about this specific design detail. |
@let-def What is the plan for this pull request? |
@mshinwell Unless there is a consensus that this is better implemented in flambda than lambda (potentially better optimizing but no longer benefiting bytecode interpreter), I will port to latest version. |
@let-def in #2199 (comment) you announce that you rebased this PR on trunk. Would you update the branch so that the PR itself remains up-to-date? |
Any news about this PR? |
On today's developer meeting it was decided that some version of this will be integrated into the compiler, in principle in time for 4.10. |
Have we considered calling it something like |
cd
opam switch create ocaml-trmc --empty
opam pin add ocaml-variants.4.10.0+trmc ./ocaml/
eval $(opam env) Result: OCaml version 4.10.0+dev0-2019-04-23
# let rec map f = function
| [] -> []
| x :: xs ->
let y = f x in
y :: map f xs
[@@trmc] ;;
val map : ('a -> 'b) -> 'a list -> 'b list = <fun>
# map ((+) 1) [1;2;3;4];;
- : int list = [2; 3; 4; 5] OPAM switch and installed packages: # switch compiler description
default ocaml-system.4.05.0 default
* ocaml-trmc ocaml-trmc # Packages matching: installed
# Name # Installed # Synopsis
base-bigarray base
base-threads base
base-unix base
ocaml 4.10.0 The OCaml compiler (virtual package)
ocaml-config 1 OCaml Switch Configuration
ocaml-variants 4.10.0+trmc pinned to version 4.10.0+trmc at git+file:///home/kromanov1/ocaml#trmc-2019 |
He considered using something like |
@let-def: gentle ping. I've been wishing for this feature again (I'm thinking of sending a |
@gasche Thank you very much for keeping track of this PR. Various list traversal functions naturally appear in our work from time to time. Just recently I needed And, of course, non-tail recursive P.S. |
@let-def Ping? |
@pmetzger I should have time to look at that next month. |
@let-def Thank you! |
@let-def I am rebasing your branch from time to time - everything works well so far (rebased version in my fork/trmc-2019). The OPAM switch recipe above also works (checked today). |
I just rebased on current trunk again - https://github.com/ksromanov/ocaml/tree/trmc-2019-Nov Unfortunately there were significant changes in drivers, so, may be I messed up with warning defaults. Unfortunately, since several files are compiled with And the last - ksromanov@9d1101b fixes a small bug - warning 70 is issued even when function is applied partially (there is no call => no tail call optimization). I also checked that trmc-enabled compiler can be installed as OPAM switch (with a small hack) and compile dune/lwt/react and some other packages. I also checked that it can compile small test with |
This pull request introduces tail-recursion modulo constructor, which allows to write
a version
List.map
that is both tail-recursive and natural.Using TRMC
Getting the benefits of TRMC is opt-in. Since it turns stack allocation into mutation and heap-allocation, it might have an impact on the program profile.
However, because of the nature of TRMC mutations, they almost never cross heaps and stay in the fast-path (I don't have recent benchmarks yet, but the performance cost, if any is negligible).
Attributes
A few syntaxes have been tried, the two most satisfying ones are:
The first one applies only to global bindings (item attributes are not yet allowed on local-let).
The second one required updating the parser, as attributes were not allowed on the identifier of function binding, and proved quite tricky. It is the one implemented in current revision.
Finally, the
-force-trmc
flags enable automatic rewriting of calls to trmc style.TRMC cases
TRMC applies to all tail-positions consisting of value constructors, possibly nested, applied to the result of a recursive call to a "trmc candidate", that is a recursive function being defined which have the
[@trmc]
attribute specified above.Examples:
y :: map f xs
from above,y :: z :: map f xs
would qualify alsoe.g.
Node (l,r) -> Node (tree_map f l, tree_map f r)
, one of the call willbe rewritten in TRMC style but which one is not specified.
In the last case, use an explicit let-binding to enforce the order. TRMC rewritting won't cross let-bindings, so
will disable TRMC.
will force TRMC on the right subtree.
Also, you shouldn't rely on the evaluation order in the expression in tail-position, as trmc rewritting might change the ordering (trmc call evaluated last). But since afaik evaluation order of arguments is not strongly specified in OCaml, it should already be the case.
Warnings
Three new warnings are introduced:
[@trmc]
attribute on a function which doesn't satisfy trmc requirements[@trmc]
attribute on a function which is never applied in trmc position[@trmc]
attribute (useful to detect trmc opportunities in existing code base).However, in the last case, making this a warning is not satisfying, as plenty of codes with
-warn-error A
will break.Cons
There are a few cases where observable behavior will change or break.
When using delim-cc or hansei. Since the continuation captures the stack and not the heap, and trmc move values to the heap and out of the stack really early, one will be able to observe values getting mutated. TRMC shouldn't be used in such cases.
It is unclear how TRMC will behave with multicore GCs and alternative runtimes (like js_of_ocaml, although it is more likely to be fine). The mutations introduced will touch values which are not expected to be mutated.
As far as OCaml semantics and the classic OCaml runtime are concerned, this is fine since these values are not observable. But different runtimes might have different expectations.
let rec l = 1 :: l in map succ l
might prove to be a bit memory hungry instead of failing quickly.