A tail recursive List.flatten #2199
Conversation
Use the non-tail recursive (and faster for small lists) implementation on small lists, and switch to a tail recursive implementation afterwards.
stdlib/list.ml (outdated):

```
match l with
  [] -> []
| l1 :: r ->
    let new_len = len + length l1 in
```
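The PR's strategy can be sketched as follows. This is a minimal illustration, not the actual patch: the helper names and the threshold value are assumptions. The idea is to recurse normally up to a fixed depth, then switch to a constant-stack, accumulator-based traversal with one final reversal.

```ocaml
(* Hypothetical sketch of the hybrid strategy; names and the threshold
   value are illustrative assumptions, not the PR's actual code. *)
let max_recursion_threshold = 1000  (* assumed tuning constant *)

let flatten l =
  (* Constant-stack fallback: accumulate elements in reverse order,
     then reverse once at the end. *)
  let rec flatten_tail acc = function
    | [] -> List.rev acc
    | l1 :: r -> flatten_tail (List.rev_append l1 acc) r
  in
  (* Fast path: ordinary (non-tail) recursion, counting recursive calls,
     switching to the fallback once the depth limit is reached. *)
  let rec flatten_aux depth = function
    | [] -> []
    | l1 :: r ->
        if depth >= max_recursion_threshold
        then l1 @ flatten_tail [] r
        else l1 @ flatten_aux (depth + 1) r
  in
  flatten_aux 0 l
```

Note that `@` in the fast path is itself non-tail-recursive, which is exactly the concern raised below: a depth bound on the outer recursion alone does not bound the stack if the inner lists are long.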
Computing the length means that all lists will be traversed twice. You should keep track of the number of elements while traversing l1 during concatenation.
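The suggestion, as a sketch: thread the counter through the append itself, so that elements of `l1` are counted as they are copied, rather than in a separate `length` pass. The function name, threshold, and fallback here are hypothetical, not the reviewer's exact code.

```ocaml
(* Hypothetical sketch: count conses while copying, instead of a
   separate length pass. *)
let max_recursion_threshold = 1000  (* assumed cutoff *)

(* Copies l1 onto tail, counting each cons; once the count reaches the
   threshold, finish the copy in constant stack space instead. *)
let rec append_aux count l1 tail =
  match l1 with
  | [] -> tail
  | hd :: tl ->
      if count >= max_recursion_threshold
      then List.rev_append (List.rev l1) tail  (* constant-stack fallback *)
      else hd :: append_aux (count + 1) tl tail
```

`List.rev` and `List.rev_append` are both tail-recursive, so the fallback branch never grows the stack, and the fast path traverses each list exactly once.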
Naive question: is counting the number of elements during the traversal necessarily going to be faster? There will be one traversal instead of two, but each step will cost more.
I guess that makes sense if the cost of following list pointers dominates the cost of the other operations (because of possible cache misses)? Is that the correct intuition?
Thanks for the contribution. I agree it's a good idea to make at least the most common functions from List tail recursive. I'd have a preference for addressing all/most functions at once, but I'm not opposed to incremental improvements. That said, such changes can impact performance significantly, and since lists are so pervasive in OCaml code, one needs to be careful. So we really want to see some results on micro-benchmarks (with various list lengths) when such changes are proposed.
Actually, the threshold seems wrong to me: the depth of the call stack is going to be roughly
Indeed, it is necessary to make append tail recursive as well, and to count the number of recursive calls instead of the number of elements. I did not realise that, as the code shows.
I am not against porting all functions from List in this PR, if we agree on that.
Having minimal performance regression for small and medium input lists is of course something to aim for. Is it realistic to fine-tune the implementations for the different backends at once?
Is there a hope of writing a program transformation that turns the naive version into the painful-to-read-but-more-convenient one, and using it through an explicit annotation on List programs? A simple first iteration would be to detect exactly tail-recursion-modulo-cons patterns, that is, functions that are tail-recursive except for a final cons. Inputs are of the form:

```
let rec f args =
  C[t1 :: f args1, ..., tN :: f argsN]
   [f args'1, ..., f args'M]
   [u1, ..., uO]
```

where C is a multi-hole context, each hole is in tail position, and all expressions in tail position are covered by a hole. Outputs would be of the form, I guess:

```
let rec f args = f_nontail limit args
and f_nontail count args =
  if count = 0 then f_tail [] args
  else C[t1 :: f_nontail (count-1) args1, ..., tN :: f_nontail (count-1) argsN]
        [f_nontail count args'1, ..., f_nontail count args'M]
        [u1, ..., uO]
and f_tail acc args =
  C[f_tail (t1 :: acc) args1, ..., f_tail (tN :: acc) argsN]
   [f_tail acc args'1, ..., f_tail acc args'M]
   [List.rev_append acc u1, ..., List.rev_append acc uO]
```

This is a restricted form of the tail-recursion-modulo-constructor pattern implemented in #181 (cc @let-def), restricted to list functions, and without using any tricky runtime support to avoid the cost of the final argument reversal.
... plus the applications of append, plus the number of recursive calls.
@let-def Indeed, #181 would be a nicer solution, if the performance is uniform enough. In the meantime, I do think that switching to a dynamically protected version as the default, as suggested by this PR, may be OK, but we would need benchmarks, and to compare with #181.
```
  [] -> []
| l1 :: r ->
    append_aux 0 l1 (flatten_aux (i + 1) r)
```
Why do you split `max_recursion_threshold`? It seems to me that you should rather call `append_aux i l1 (flatten_aux (i + 1) r)`, since this is the same stack.
Of course the trouble with this whole approach is that, to be extensible, we would have to export `max_recursion_threshold` and the `_aux` version.
Yes.
The recursive call is neither a tail call nor a tail call modulo constructor, or am I missing something?
If you write flatten as:

```
let rec flatten = function
  | [] -> []
  | l :: r -> flatten_aux l r
and flatten_aux l r =
  match l with
  | hd :: tl -> hd :: flatten_aux tl r
  | [] -> flatten r
```

then all calls are tail calls or tail calls modulo constructor.
Isn't the "modulo cons" bit a problem? (Or does the OCaml compiler have a way to special-case that? I know it is possible.)
We are precisely discussing adding an optimization of this kind to the compiler (as an alternative to writing ugly code by hand), to be enabled on an opt-in basis, directed by an annotation.
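(A note for later readers: this opt-in, annotation-directed optimization eventually landed in OCaml 4.14 as the `[@tail_mod_cons]` attribute. The following is a sketch of flatten written to use it, assuming OCaml >= 4.14; it is a simplified single-function version, not the stdlib's actual code.)

```ocaml
(* Requires OCaml >= 4.14. The attribute asks the compiler to apply the
   tail-call-modulo-constructor transformation, so the cons below runs in
   constant stack space for arbitrarily long inputs. *)
let[@tail_mod_cons] rec flatten = function
  | [] -> []
  | [] :: r -> flatten r
  | (hd :: tl) :: r -> hd :: flatten (tl :: r)
```

The `tl :: r` re-cons allocates one cell per element, which a mutually recursive helper (as in the two-function version quoted above) avoids; this single-function form just keeps the sketch short.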
Ah, I missed the point where the conversation turned, because I didn't read #181. One question: I'm curious why the optimization in that PR shouldn't always be on, given the utility of tail-recursion-modulo-cons/modulo-constructor optimization. It seems there's some detail of the effect of mutation on OCaml heap management that I don't quite understand. Regardless, having to annotate such things is a drag. Perhaps there's a rule that could automatically distinguish when to apply the optimization and when not to?
I ported the TRMC code to trunk. Early benchmarks are promising, but I would like to use Core_bench to remove some noise. (Results so far: the difference between the non-TRMC and TRMC versions is noise for List.map on small lists; TRMC gets faster for bigger lists; and in any case, unfolding the recursion a few times results in a major speedup.) On trunk: https://github.com/let-def/ocaml/tree/trmc-2019
I rebased your branch on top of current trunk (March 27) and managed to get it into opam 2.0. It successfully compiled:

```
opam switch create ocaml-trmc --empty
opam pin add ocaml-variants.4.09.0+trmc ./ocaml/
```

Then I created ~/ocaml.4.09.0, put the virtual package description there, and added it to opam with `opam pin add ocaml.4.09.0 ./ocaml.4.09.0/`.
@mookid Why did this get closed? |
List.flatten is not tail recursive, and is therefore subject to stack overflow.
A small illustrative test is the following:
On the trunk version this stops after ~20 iterations of the loop.
The proposed version starts like the current non-tail-recursive implementation, and changes strategy when the result list gets big.
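The test itself is not reproduced above. A hypothetical reconstruction of its shape, consistent with the "~20 iterations" remark (doubling a list of singleton lists until a naive, non-tail-recursive flatten overflows at roughly 2^20 outer elements on the runtimes of the time), might look like this; the names and the safety cap are assumptions, and modern runtimes with growable stacks may go further before raising:

```ocaml
(* Hypothetical reconstruction, not the PR's actual test. *)

(* Naive flatten: stack use is proportional to the outer list's length. *)
let rec naive_flatten = function
  | [] -> []
  | l :: r -> l @ naive_flatten r

(* Double the list of lists each iteration until naive_flatten dies with
   Stack_overflow; returns the number of completed iterations. *)
let iterations_until_overflow () =
  let rec loop i l =
    match naive_flatten l with
    | _ ->
        if i >= 25 then i  (* safety cap: runtime's limit was not reached *)
        else loop (i + 1) (List.rev_append l l)  (* tail-recursive doubling *)
    | exception Stack_overflow -> i
  in
  loop 0 [[1]]
```

The doubling uses `List.rev_append` rather than `@` so that the only deep, non-tail recursion in the loop is the `naive_flatten` call being measured.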