A tail recursive List.flatten #2199
Conversation
Use the non-tail recursive (and faster for small lists) implementation on small lists, and switch to a tail recursive implementation afterwards.
stdlib/list.ml (outdated):

```
match l with
  [] -> []
| l1 :: r ->
    let new_len = len + length l1 in
```
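The PR's strategy can be sketched as follows. This is a minimal illustration, not the actual patch: the helper names and the threshold value are assumptions. The idea is to recurse normally up to a fixed depth, then switch to a constant-stack, accumulator-based traversal with one final reversal.

```ocaml
(* Hypothetical sketch of the hybrid strategy; names and the threshold
   value are illustrative assumptions, not the PR's actual code. *)
let max_recursion_threshold = 1000  (* assumed tuning constant *)

let flatten l =
  (* Constant-stack fallback: accumulate elements in reverse order,
     then reverse once at the end. *)
  let rec flatten_tail acc = function
    | [] -> List.rev acc
    | l1 :: r -> flatten_tail (List.rev_append l1 acc) r
  in
  (* Fast path: ordinary (non-tail) recursion, counting recursive calls,
     switching to the fallback once the depth limit is reached. *)
  let rec flatten_aux depth = function
    | [] -> []
    | l1 :: r ->
        if depth >= max_recursion_threshold
        then l1 @ flatten_tail [] r
        else l1 @ flatten_aux (depth + 1) r
  in
  flatten_aux 0 l
```

Note that `@` in the fast path is itself non-tail-recursive, which is exactly the concern raised below: a depth bound on the outer recursion alone does not bound the stack if the inner lists are long.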
Computing the length means that all lists will be traversed twice. You should keep track of the number of elements while traversing l1 during concatenation.
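The suggestion, as a sketch: thread the counter through the append itself, so that elements of `l1` are counted as they are copied, rather than in a separate `length` pass. The function name, threshold, and fallback here are hypothetical, not the reviewer's exact code.

```ocaml
(* Hypothetical sketch: count conses while copying, instead of a
   separate length pass. *)
let max_recursion_threshold = 1000  (* assumed cutoff *)

(* Copies l1 onto tail, counting each cons; once the count reaches the
   threshold, finish the copy in constant stack space instead. *)
let rec append_aux count l1 tail =
  match l1 with
  | [] -> tail
  | hd :: tl ->
      if count >= max_recursion_threshold
      then List.rev_append (List.rev l1) tail  (* constant-stack fallback *)
      else hd :: append_aux (count + 1) tl tail
```

`List.rev` and `List.rev_append` are both tail-recursive, so the fallback branch never grows the stack, and the fast path traverses each list exactly once.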
Naive question: is counting the number of elements during the traversal necessarily going to be faster? There will be one traversal instead of two, but each step will cost more.
I guess that makes sense if the cost of following list pointers dominates the cost of the other operations (because of possible cache misses)? Is that the correct intuition?
Thanks for the contribution. I agree it's a good idea to make at least the most common functions from List tail recursive. I'd have a preference for addressing all/most functions at once, but I'm not opposed to incremental improvements. That said, such changes can impact performance significantly, and since lists are so pervasive in OCaml code, one needs to be careful. So we really want to see some results on micro-benchmarks (with various list lengths) when such changes are proposed.
Actually, the threshold seems wrong to me: the depth of the call stack is going to be roughly
Indeed, it is necessary to make append tail recursive as well, and to count the number of recursive calls instead of the number of elements. I did not realise that, as the code shows.
I am not against porting all functions from List in this PR, if we agree on that.
Having minimal performance regression for small and medium input lists is of course something to aim for. Is it realistic to fine-tune the implementations for the different backends at once?
Is there a hope of writing a program transformation that turns the naive version into the painful-to-read-but-more-convenient one, and using it through an explicit annotation on List programs? A simple first iteration would be to detect exactly tail-recursion-modulo-cons patterns, that is, functions that are tail-recursive except for a final cons. Inputs are of the form:

```
let rec f args =
  C[t1 :: f args1, ..., tN :: f argsN]
   [f args'1, ..., f args'M]
   [u1, ..., uO]
```

where C is a multi-hole context, each hole is in tail position, and all expressions in tail position are covered by a hole. Outputs would be of the form, I guess:

```
let rec f args = f_nontail limit args
and f_nontail count args =
  if count = 0 then f_tail [] args
  else C[t1 :: f_nontail (count-1) args1, ..., tN :: f_nontail (count-1) argsN]
        [f_nontail count args'1, ..., f_nontail count args'M]
        [u1, ..., uO]
and f_tail acc args =
  C[f_tail (t1 :: acc) args1, ..., f_tail (tN :: acc) argsN]
   [f_tail acc args'1, ..., f_tail acc args'M]
   [List.rev_append acc u1, ..., List.rev_append acc uO]
```

This is a restricted form of the tail-recursion-modulo-constructor pattern implemented in #181 (cc @let-def), restricted to list functions, and without using any tricky runtime support to avoid the cost of the final argument reversal.
... plus the applications of append, plus the number of recursive calls.
@let-def Indeed, #181 would be a nicer solution, if the performance is uniform enough. In the meantime, I do think that switching to a dynamically protected version as the default, as suggested by this PR, may be OK, but we would need benchmarks, and to compare with #181.
```
  [] -> []
| l1 :: r ->
    append_aux 0 l1 (flatten_aux (i + 1) r)
```
Why do you split `max_recursion_threshold`? It seems to me that you should rather call `append_aux i l1 (flatten_aux (i + 1) r)`, since this is the same stack.
Of course the trouble with this whole approach is that, to be extensible, we would have to export `max_recursion_threshold` and the `_aux` version.
Yes.
The recursive call is neither a tail call nor a tail call modulo constructor, or am I missing something?
If you write flatten as:

```
let rec flatten = function
  | [] -> []
  | l :: r -> flatten_aux l r
and flatten_aux l r =
  match l with
  | hd :: tl -> hd :: flatten_aux tl r
  | [] -> flatten r
```

then all calls are tail calls or tail calls modulo constructor.
Isn't the "modulo cons" bit a problem? (Or does the OCaml compiler have a way to special-case that? I know it is possible.)
We are precisely discussing adding an optimization of this kind to the compiler (as an alternative to writing ugly code by hand), to be enabled on an opt-in basis, directed by an annotation.
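(A note for later readers: this opt-in, annotation-directed optimization eventually landed in OCaml 4.14 as the `[@tail_mod_cons]` attribute. The following is a sketch of flatten written to use it, assuming OCaml >= 4.14; it is a simplified single-function version, not the stdlib's actual code.)

```ocaml
(* Requires OCaml >= 4.14. The attribute asks the compiler to apply the
   tail-call-modulo-constructor transformation, so the cons below runs in
   constant stack space for arbitrarily long inputs. *)
let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | [] :: r -> flatten r
  | (hd :: tl) :: r -> hd :: flatten (tl :: r)
```

The `tl :: r` re-cons allocates one cell per element, which a mutually recursive helper (as in the two-function version quoted above) avoids; this single-function form just keeps the sketch short.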
Ah, I missed the point where the conversation turned, because I didn't read #181. One question: I'm curious why the optimization in that PR shouldn't always be on, given the utility of tail-recursion-modulo-cons/modulo-constructor optimization. It seems there's some detail of the effect of mutation on OCaml heap management that I don't quite understand. Regardless, having to annotate such things is a drag. Perhaps there's a rule that could automatically distinguish when to apply the optimization and when not to?
I ported the TRMC code to trunk. Early benchmarks are promising, but I would like to use Core_bench to remove some noise. (Results so far: the difference between the non-TRMC and TRMC versions is noise for List.map on small lists; TRMC gets faster for bigger lists; and in any case, unfolding the recursion a few times results in a major speedup.) On trunk: https://github.com/let-def/ocaml/tree/trmc-2019
I rebased your branch on top of current trunk (March 27) and managed to get it into opam 2.0. It successfully compiled:

```
opam switch create ocaml-trmc --empty
opam pin add ocaml-variants.4.09.0+trmc ./ocaml/
```

Then I created ~/ocaml.4.09.0, put the virtual package description there, and added it to opam with `opam pin add ocaml.4.09.0 ./ocaml.4.09.0/`.
@mookid Why did this get closed? |
List.flatten is not tail recursive, and is therefore subject to stack overflow.
A small illustrative test is the following:
On the trunk version this stops after ~20 iterations of the loop.
The proposed version starts like the current non-tail-recursive implementation, and changes strategy when the result list gets big.
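The test itself is not reproduced above. A hypothetical reconstruction of its shape, consistent with the "~20 iterations" remark (doubling a list of singleton lists until a naive, non-tail-recursive flatten overflows at roughly 2^20 outer elements on the runtimes of the time), might look like this; the names and the safety cap are assumptions, and modern runtimes with growable stacks may go further before raising:

```ocaml
(* Hypothetical reconstruction, not the PR's actual test. *)

(* Naive flatten: stack use is proportional to the outer list's length. *)
let rec naive_flatten = function
  | [] -> []
  | l :: r -> l @ naive_flatten r

(* Double the list of lists each iteration until naive_flatten dies with
   Stack_overflow; returns the number of completed iterations. *)
let iterations_until_overflow () =
  let rec loop i l =
    match naive_flatten l with
    | _ ->
        if i >= 25 then i  (* safety cap: runtime's limit was not reached *)
        else loop (i + 1) (List.rev_append l l)  (* tail-recursive doubling *)
    | exception Stack_overflow -> i
  in
  loop 0 [[1]]
```

The doubling uses `List.rev_append` rather than `@` so that the only deep, non-tail recursion in the loop is the `naive_flatten` call being measured.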