
A tail recursive List.flatten #2199

Closed
wants to merge 4 commits into from

Conversation

@mookid (Contributor) commented Dec 12, 2018

List.flatten is not tail-recursive, and is therefore subject to stack overflow on long inputs.

A small illustrative test is the following:

let n = ref 0 in
while true do
  ignore(flatten (List.init !n (fun i -> List.init (i*i*i*i) (fun _ -> 0))));
  Printf.printf "%i: ok%!\n" !n;
  incr n
done;;

On the trunk version this stops after ~20 iterations of the loop.

The proposed version starts like the current non-tail-recursive implementation, and changes strategy when the result list gets big.
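The two-mode strategy could be sketched as follows (a minimal sketch only: the name max_depth, its value, and the helper names are illustrative, not the actual identifiers from the patch):

```ocaml
(* Illustrative sketch of the two-mode strategy; [max_depth] and the
   helper names are assumptions, not the patch's actual code. *)
let max_depth = 1000

let flatten ll =
  (* tail-recursive fallback: accumulate reversed, reverse once at the end *)
  let rec flatten_tail acc = function
    | [] -> List.rev acc
    | l :: r -> flatten_tail (List.rev_append l acc) r
  in
  (* fast, non-tail-recursive path for the first [max_depth] lists *)
  let rec go d = function
    | [] -> []
    | l :: r ->
        if d >= max_depth then flatten_tail [] (l :: r)
        else l @ go (d + 1) r
  in
  go 0 ll

let () = assert (flatten [[1; 2]; [3]; []; [4]] = [1; 2; 3; 4])
```

The fast path keeps the usual constant-factor advantage on short inputs; the fallback trades one extra List.rev for bounded stack usage.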

Use the non-tail-recursive (and faster for small lists) implementation
on small lists, and switch to a tail-recursive implementation afterwards.
stdlib/list.ml (outdated diff excerpt):
match l with
[] -> []
| l1 :: r ->
let new_len = len + length l1 in
Review comment (Contributor):

Computing the length means that all lists will be traversed twice. You should keep track of the number of elements while traversing l1 during concatenation.
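The reviewer's suggestion could look something like this (a hypothetical sketch; count_append is an assumed name, not the patch's actual code): the concatenation itself returns how many elements it copied, so each sublist is traversed once.

```ocaml
(* Hypothetical sketch: append [l1] onto [tail] while threading a
   running element count [n], so no separate [length] pass is needed. *)
let rec count_append n l1 tail =
  match l1 with
  | [] -> (n, tail)
  | x :: r ->
      let n, rest = count_append (n + 1) r tail in
      (n, x :: rest)

let () =
  let n, l = count_append 0 [1; 2; 3] [4; 5] in
  assert (n = 3 && l = [1; 2; 3; 4; 5])
```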

Review comment (Member):

Naive question: is counting the number of elements during the traversal necessarily going to be faster? There is going to be one traversal instead of two, but it will cost more at each step.

I guess that makes sense if the cost of following list pointers dominates the cost of other operations (because of the possible cache misses?)? Is that the correct intuition?

@alainfrisch (Contributor):

Thanks for the contribution. I agree it's a good idea to make at least the most common functions from List support long lists, even if this complicates the implementation a bit, and possibly even if this complicates some optimizations (e.g. a version of List.map with those "two modes" could be more difficult to deal with properly in flambda). It would be good to hear in particular from flambda people to know if they are ok (cc @chambart @lpw25 @mshinwell).

I'd have a preference for addressing all/most functions at once, but I'm not opposed to incremental improvements.

That said, such changes can impact performance significantly, and since lists are so pervasive in OCaml code, one needs to be careful. So we really want to see some results on micro-benchmarks (with various list lengths) when such changes are proposed.

@garrigue (Contributor) commented Dec 13, 2018

Actually, the threshold seems wrong to me: the depth of the call stack is going to be roughly length l + length l1, not the total size of the concatenated lists.
Then why do that for flatten if we don't do that for append or map? Just because append is called more often?
Wouldn't it be more coherent to introduce rev_flatten as a separate function, emphasizing that this one is tail-recursive, like for rev_map and rev_append?

@mookid (Contributor, Author) commented Dec 13, 2018

Actually, the threshold seems wrong to me: the depth of the call stack is going to be roughly length l + length l1, not the total size of the concatenated lists.

Indeed, it is necessary to make append tail recursive as well, and to count the number of recursive calls instead of the number of elements. I did not realise that, as the code shows.
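Counting recursive calls rather than elements could look like this (a hypothetical sketch; max_depth and its value are illustrative): the non-tail path gives up after a fixed number of stack frames, whatever the element count.

```ocaml
(* Illustrative sketch: [append] bounded by call depth, not element
   count.  [max_depth] is an assumed name/value. *)
let max_depth = 1000

let append l1 l2 =
  let rec go d l1 =
    match l1 with
    | [] -> l2
    | x :: r ->
        if d >= max_depth then
          (* tail-recursive fallback for the rest of [l1] *)
          List.rev_append (List.rev l1) l2
        else x :: go (d + 1) r
  in
  go 0 l1

let () =
  assert (append [1; 2] [3] = [1; 2; 3]);
  (* no stack overflow on a long first argument *)
  assert (List.length (append (List.init 1_000_000 (fun i -> i)) []) = 1_000_000)
```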

Then why do that for flatten if we don't do that for append or map? Just because append is called more often?

Wouldn't it be more coherent to introduce rev_flatten as a separate function, emphasizing that this one is tail-recursive, like for rev_map and rev_append?

  • I am not sure it makes sense to expose that function. Are users likely to use rev_flatten? My intuition is that rev_map is sometimes used when users are interested in the set of computed values and don't care about the order. Of course I have nothing against exposing it.

  • The goal I would like to achieve is to remove as many non-tail-recursive (or rather, stack-unsafe) functions as possible from List.ml. I would especially like to get away from the current situation where there is a potentially crashing function with a natural definition (map) and an unusual definition with safe semantics (rev_map).

I'd have a preference for addressing all/most functions at once, but I'm not opposed to incremental improvements.

I am not against porting all functions from List in this PR, if we agree on that.

@mookid (Contributor, Author) commented Dec 13, 2018

Thanks for the contribution. I agree it's a good idea to make at least the most common functions from List support long lists, even if this complicates the implementation a bit, and possibly even if this complicates some optimizations (e.g. a version of List.map with those "two modes" could be more difficult to deal with properly in flambda). It would be good to hear in particular from flambda people to know if they are ok (cc @chambart @lpw25 @mshinwell).

Having minimal performance regression for small-medium input lists is of course something to aim for.

Is it realistic to fine-tune the implementations for the different backends all at once?

@gasche (Member) commented Dec 13, 2018

Is there a hope of writing a program transformation that turns the naive version into the painful-to-read-but-more-convenient-one, and use it through an explicit annotation on List programs?

A simple first iteration would be to detect exactly tail-recursion-modulo-cons patterns, that is functions that are tail-recursive except for a hd :: _ context applied around the recursive call.

Inputs are of the form

let rec f args =
  C[t1 :: f args1, ..., tN :: f argsN]
   [f args'1, ..., f args'M]
   [u1, ..., uO]

where C is a multi-hole context, where each hole is in tail-position and all expressions in tail-position are covered by a hole. (The uk are arbitrary and may contain non-tail-recursive calls to f).

Outputs would be of the form, I guess

let rec f args = f_nontail limit args
and f_nontail count args =
  if count = 0 then f_tail [] args
  else C[t1 :: f_nontail (count-1) args1, ..., tN :: f_nontail (count-1) argsN]
        [f_nontail count args'1, ..., f_nontail count args'M]
        [u1, ..., uO]
and f_tail acc args =
  C[f_tail (t1 :: acc) args1, ..., f_tail (tN :: acc) argsN]
   [f_tail acc args'1, ..., f_tail acc args'M]
   [List.rev_append acc u1, ..., List.rev_append acc uO]

This is a restricted form of the tail-recursion-modulo-constructor pattern implemented in #181 (cc @let-def), restricted to list functions, and without using any tricky runtime support to avoid the cost of the final argument reversal.
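Hand-instantiating the schema above on List.map would give something like the following (a sketch only; limit, its value, and the auxiliary names are illustrative):

```ocaml
(* Sketch: the f_nontail/f_tail schema instantiated for List.map.
   [limit] and the helper names are assumptions for illustration. *)
let limit = 1000

let map f l =
  let rec map_nontail count l =
    if count = 0 then map_tail [] l
    else
      match l with
      | [] -> []
      | x :: r -> f x :: map_nontail (count - 1) r
  and map_tail acc = function
    | [] -> List.rev acc
    | x :: r -> map_tail (f x :: acc) r
  in
  map_nontail limit l

let () =
  assert (map succ [1; 2; 3] = [2; 3; 4]);
  (* no stack overflow past the cutoff *)
  assert (List.length (map succ (List.init 1_000_000 (fun i -> i))) = 1_000_000)
```

The final List.rev in map_tail is exactly the "cost of the final argument reversal" that the TRMC runtime support would avoid.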

@let-def (Contributor) commented Dec 13, 2018

@gasche: Maybe we should reconsider #181? It has some restrictions, but it can be controlled by an attribute (opt-in only) and should solve this problem easily.

@garrigue (Contributor):

@let-def Indeed, #181 would be a nicer solution, if the performance is uniform enough.

In the meantime, I do think that rev_flatten is coherent with the rest of the library and perfectly makes sense. As for rev_map it is easy to recover the ordered behavior, and if you use it for non-determinism you may not care about the order.

Switching to dynamically protected version as default as suggested by this PR may be ok, but we would need benchmarks, and compare with #181.
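The rev_flatten being discussed could be sketched, by analogy with rev_map and rev_append, as (a sketch, not code from the PR):

```ocaml
(* Possible [rev_flatten]: tail-recursive, constant stack, result in
   reverse order, by analogy with [rev_map] / [rev_append]. *)
let rev_flatten ll =
  let rec go acc = function
    | [] -> acc
    | l :: r -> go (List.rev_append l acc) r
  in
  go [] ll

let () =
  assert (rev_flatten [[1; 2]; [3]; [4; 5]] = [5; 4; 3; 2; 1]);
  (* the ordered behaviour is recovered with a final [List.rev] *)
  assert (List.rev (rev_flatten [[1; 2]; [3]]) = [1; 2; 3])
```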

[] -> []
| l1 :: r ->
append_aux 0 l1 (flatten_aux (i + 1) r)

Review comment (Contributor):

Why do you split max_recursion_threshold?
It seems to me that you should rather call append_aux i l1 (flatten_aux (i + 1) r), since this is the same stack.

Of course the trouble with this whole approach is that to be extensible we would have to export max_recursion_threshold and the _aux version.

@gasche (Member) commented Dec 14, 2018

Yes, rev_flatten is fine, but the rest of the count-and-reverse logic for the non-rev functions adds a fair share of complexity to the codebase that we would rather do without.

@mookid (Contributor, Author) commented Dec 14, 2018

@gasche: Maybe we should reconsider #181? It has some restrictions, but it can be controlled by an attribute (opt-in only) and should solve this problem easily.

The recursive call

let rec flatten = .....
 in l @ flatten r

is neither a tail call nor a tail call modulo constructor, or am I missing something?

@alainfrisch (Contributor) commented Dec 14, 2018

If you write flatten as:

let rec flatten = function
  | [] -> []
  | l :: r -> flatten_aux l r

and flatten_aux l r =
  match l with
  | hd :: tl -> hd :: flatten_aux tl r
  | [] -> flatten r

then all calls are tail call or tail call modulo constructor.

@pmetzger (Member) commented Dec 15, 2018

then all calls are tail call or tail call modulo constructor

Isn't the "modulo cons" bit a problem? (Or does the OCaml compiler have a way to special case that? I know it is possible.)

@gasche (Member) commented Dec 15, 2018

We are precisely discussing adding an optimization of this kind to the compiler (as an alternative to writing ugly code by hand), to be enabled on an opt-in basis, directed by an annotation.

@pmetzger (Member):

Ah, I missed the point where the conversation turned, because I didn't read #181.

One question: I'm curious why the optimization in that PR shouldn't always be on, given the utility of tail recursion modulo cons (also called tail recursion modulo constructor). It seems there's some detail of the effect of mutation on OCaml heap management that I don't quite understand. Regardless, having to annotate such things is a drag. Perhaps there's a rule that could automatically distinguish when to do the optimization and when not to?

@let-def (Contributor) commented Dec 17, 2018

I ported the TRMC code to trunk.

Early benchmarks are promising, but I would like to use Core_bench to remove some noise (results: difference between non-trmc and trmc version is noise for List.map on small lists, trmc gets faster for bigger lists, but in any case, unfolding the recursion a few times results in a major speedup).
I didn't manage to install a custom compiler with opam 2.0, if anybody can help :).

On trunk: https://github.com/let-def/ocaml/tree/trmc-2019
On 4.07.1: https://github.com/let-def/ocaml/tree/trmc-4.07.1

@ksromanov (Contributor) commented Mar 27, 2019

@let-def

I didn't manage to install a custom compiler with opam 2.0, if anybody can help :).

I rebased your branch on top of current trunk (March 27) and managed to get it into opam 2.0. It successfully compiled dune:

  1. Install vanilla opam 2.0.2 from sources - https://github.com/ocaml/opam/releases/download/2.0.2/opam-full-2.0.2.tar.gz

  2. Following https://github.com/ocaml/opam/wiki/opam-2.0-FAQ#custom-ocaml-compiler
    (the OCaml repo is in $HOME/ocaml, with branch trmc-2019 checked out):

opam switch create ocaml-trmc --empty
opam pin add ocaml-variants.4.09.0+trmc ./ocaml/

opam will open an editor and ask you to type in the opam file - it should be https://gist.github.com/ksromanov/90f39094a7c0a5dbdd2abefe9ff32e6a

  3. Since there is no ocaml virtual package with version 4.09 yet, we need to add it:

mkdir ~/ocaml.4.09.0

Put the opam file there - it should be https://gist.github.com/ksromanov/b994adda95233ec0fb1e44d66cfe3f27

and add this virtual package to opam with

opam pin add ocaml.4.09.0 ./ocaml.4.09.0/

@mookid closed this Aug 10, 2019
@pmetzger (Member):

@mookid Why did this get closed?
