
Proposal: Take more opportunities to trigger TCO #3967

Open
radrow opened this issue Dec 14, 2020 · 19 comments · May be fixed by #3968

Comments

@radrow
Contributor

radrow commented Dec 14, 2020

Summary

You don't need every self reference to be in tail position to trigger TCO. Why not make use of this beautiful optimization whenever it is possible?

Motivation

data Pokemon
  = Charizard {lvl : Int}
  | Exeggcute {eggz : [Pokemon]}

No TCO:

levelupAll :: [Pokemon] -> [Pokemon] -> [Pokemon]
levelupAll ps acc = case ps of
  [] -> reverse acc
  (Charizard spec@{lvl: lvl} : rest) ->
    levelupAll rest ((Charizard spec{lvl: lvl + 1}):acc)
  (Exeggcute spec@{eggz: eggz} : rest) ->
    levelupAll rest ((Exeggcute spec{eggz: levelupAll eggz []}):acc)

TCO:

levelupAll :: [Pokemon] -> [Pokemon] -> [Pokemon]
levelupAll ps acc = case ps of
  [] -> reverse acc
  (Charizard spec@{lvl: lvl} : rest) ->
    levelupAll rest ((Charizard spec{lvl: lvl + 1}):acc)
  (Exeggcute spec@{eggz: eggz} : rest) ->
    levelupAll rest ((Exeggcute spec{eggz: levelupAllHereIExplicitlyBreakTheTCO eggz}):acc)

levelupAllHereIExplicitlyBreakTheTCO ps = levelupAll ps []

(modulo some syntax).

The current implementation disables TCO whenever there is a non-tail reference to the current function – not necessarily in a call; it can also be passing the function as an argument to some other function, or binding it in a value definition. This is unnecessarily restrictive and forces workarounds like the one presented above.

I don't know Haskell's policy on this at all, but in JS we suffer badly from the stack limit, so loopifying calls can be crucial in many cases – for instance in our erlscripten project, where we needed to implement the trick shown above in order to be able to compare long lists of Erlang terms: example

Proposal

Trigger TCO everywhere it is possible, regardless of whether there are other non-tail calls. I have already prepared a PR for this.

Examples

All of the following functions should be stack safe:

occasionalTCO1 :: Int -> Int
occasionalTCO1 0 = 1
occasionalTCO1 n =
  occasionalTCO1 (n - occasionalTCO1 0)

occasionalTCO2 :: Int -> Int
occasionalTCO2 0 = 1
occasionalTCO2 n =
  let x = occasionalTCO2 0
  in occasionalTCO2 (n - x)

occasionalTCO3 :: Int -> Int
occasionalTCO3 0 = 1
occasionalTCO3 n =
  if occasionalTCO3 0 == n
  then 1
  else occasionalTCO3 (n - occasionalTCO3 0)

occasionalTCO4 :: Int -> Int
occasionalTCO4 0 = 1
occasionalTCO4 n | 1 <- occasionalTCO4 0 =
  case occasionalTCO4 0 + n of
    2 -> 1
    x -> occasionalTCO4 (x - 2)
occasionalTCO4 _ = 1

occasionalTCO5 :: Int -> Int
occasionalTCO5 0 = 1
occasionalTCO5 n | n > 10 =
  occasionalTCO5 (n - 1)
occasionalTCO5 n =
  if n > 5
  then occasionalTCO5 $ n - 1
  else call occasionalTCO5 (n - 1)
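To make concrete what the proposal asks for, here is a hand-written JS sketch (not actual compiler output) of how occasionalTCO1 could be compiled: the tail call becomes a loop iteration, while the non-tail call stays an ordinary invocation.

```javascript
// Hand-written sketch of occasionalTCO1 under the proposed rule:
// the outer tail call is turned into a loop iteration, while the
// inner non-tail call remains a plain invocation.
function occasionalTCO1(n) {
  while (n !== 0) {
    // occasionalTCO1(0) is a non-tail call; it returns immediately,
    // so it only ever costs one extra stack frame.
    n = n - occasionalTCO1(0); // tail call -> next loop iteration
  }
  return 1;
}
```

With this shape, occasionalTCO1 is stack-safe for arbitrarily large n, which is the behavior the proposal wants all five examples to have.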
@rhendric
Member

The only reason I can think of to oppose this is that TCO introduces a little bit of overhead for each un-transformed invocation of the transformed function. So for a function that calls itself mostly in non-tail positions, but has one infrequently executed case of calling itself in tail position, applying TCO as we've implemented it could be a substantial deoptimization.

Of course, some extra per-call overhead, even for a heavily recursive function, may be much more tolerable than a function that can't run at all due to stack constraints. So I don't know if this is a particularly strong counterargument to your proposal. I do think that we should maybe take some time to consider if there's a more conservative position to stake out in between ‘every recursive call is in tail position’ and ‘any recursive call is in tail position’.

Remember also that you can always explicitly get tail recursion with something like purescript-tailrec, when overhead isn't a concern and it won't run at all otherwise.
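For readers unfamiliar with it, the idea behind purescript-tailrec can be sketched as a small hand-rolled trampoline. The real library exposes `tailRec` with `Loop`/`Done` constructors in PureScript; the JS below is only an illustrative analogue, not the library's actual runtime representation.

```javascript
// Illustrative JS analogue of the tailRec idea from purescript-tailrec:
// the step function returns either Loop (carrying the next argument) or
// Done (carrying the final result), and tailRec drives it in a while
// loop, so recursion depth never grows regardless of iteration count.
var Loop = function (a) { return { done: false, value: a }; };
var Done = function (b) { return { done: true, value: b }; };

function tailRec(step, start) {
  var result = step(start);
  while (!result.done) {
    result = step(result.value);
  }
  return result.value;
}

// usage: count down from a large number without touching the stack limit
var countDown = function (n) {
  return tailRec(function (k) {
    return k === 0 ? Done("done") : Loop(k - 1);
  }, n);
};
```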

@radrow
Contributor Author

radrow commented Dec 15, 2020

I see that this little overhead may be undesirable. However, in my opinion, the cost should be acceptable: the lack of TCO can break a program, whereas a bunch of additional ifs cannot.

The other idea that comes to my mind is to use the "all" rule by default and trigger "any" only when the user provides an annotation ({-# TAILREC fun #-}, {-# TAILCALL #-} fun (acc - 1)). This would make the code a bit uglier, but it would be more explicit and could prevent some surprises.

@rhendric
Member

The core team has generally been against controlling optimizations with pragmas; see for example discussion on #2345. So that's unlikely to be the way forward.

However, I was thinking that maybe we can get a similar effect by having the optimizer target specific tail-recursion-indicating functions, maybe even tailRec in purescript-tailrec, for wholesale rewriting when possible, similarly to how it rewrites some uses of the members of Monad Effect and friends. That would make it possible to explicitly request tail recursion using existing language constructs, and the semantics would be unchanged between the unoptimized and optimized versions, including how large the numbers can get. There's probably some design and negotiation that would need to happen with that, given how non-central purescript-tailrec is as a library (I think?).

@rhendric
Member

@hdgarrood, I'm convinced that this proposal is at least safe on an abstract, unlimited stack machine.

The argument (credit to @radrow's motivation section for the core idea) is that with the current optimizer, any function f with at least one tail call can trivially be converted into a form that will get TCO'd by replacing every non-tail use of f with f', where f' x = f x is defined somewhere outside of the definition of f. Assuming that TCO is currently safe, this substitution doesn't make it unsafe (provided that adding function calls to the stack is not a concern). Re-substituting f for f' inside the function after it's been TCO'd is also safe; even in imperative JavaScript, substituting one locally-declared and constant identifier for another should be referentially transparent. The proposal of allowing TCO on functions with some non-tail self uses is equivalent to taking those three safe steps (f -> f', existing TCO, f' -> f) in order.


That said, being safe is a necessary but not sufficient condition for endorsing the proposal. I'm still concerned that TCO can be a deoptimization in some circumstances. Consider, as a dumb example,

slowMult :: Int -> Int -> Int
slowMult a b = case b of
  1 -> a
  2 -> slowMult (a + a) 1
  _ -> a + slowMult a (b - 1)

With current PS, this function is not TCO'd, and slowMult 5 5000 correctly returns 25000. If TCO were to be allowed on this function, the compiled result would look like

var slowMult = function ($copy_a) {
    return function ($copy_b) {
        var $tco_var_a = $copy_a;
        var $tco_done = false;
        var $tco_result;
        function $tco_loop(a, b) {
            if (b === 1) {
                $tco_done = true;
                return a;
            };
            if (b === 2) {
                $tco_var_a = a + a | 0;
                $copy_b = 1;
                return;
            };
            $tco_done = true;
            return a + slowMult(a)(b - 1 | 0) | 0;
        };
        while (!$tco_done) {
            $tco_result = $tco_loop($tco_var_a, $copy_b);
        };
        return $tco_result;
    };
};

The problem is that the extra $tco_loop function in there effectively doubles the amount of stack needed relative to the non-TCO'd version. So slowMult 5 5000 breaks the stack with this implementation.

@natefaubion
Contributor

FWIW, this is pretty straightforward to work around.

example = go where
  go ... = go ...
  go ... = 1 + example ...

I've done this in purescript-run. It is unfortunate that there's no way to get the compiler to yell at you, but I think this would be solved by optimizing tailrec.
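In JS terms, the effect of this workaround on the slowMult example would look roughly like the following hand-written sketch (not compiler output): the tail-recursive cases loop, the non-tail case recurses, and each nesting level costs one stack frame instead of the two that the `$tco_loop` shape pays.

```javascript
// Hand-written sketch of what the `go` workaround buys for slowMult:
// tail cases become loop iterations and the non-tail case stays a
// plain recursive call, so a deep non-tail chain costs roughly one
// frame per level rather than two.
function slowMult(a, b) {
  while (true) {
    if (b === 1) return a;
    if (b === 2) { a = a + a; b = 1; continue; } // tail call -> loop
    return a + slowMult(a, b - 1);               // non-tail call recurses
  }
}
```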

@natefaubion
Contributor

My opinion on implicit TCO (in this compiler) is that the less of it, the better. It's unfortunately somewhere between an optimization and semantics. That is, if a compiler does not implement it, it may result in existing programs failing to run. Additionally, if a compiler/optimizer implements additional optimizations to make it more permissive (say local, mutually recursive groups), then it's likely that users will write libraries that only work given the more permissive optimization. This is why Clojure has an explicit loop/recur construct. Optimizing tailrec specifically is somewhat dodgy as well (like making Boolean operators lazy), but the typed interface makes it easier to specify. I'm not necessarily opposed to this proposal, but I would like for us to consider what the specification for TCO is, and whether or not this makes the specification simpler or more complex.

@rhendric
Member

One benefit of targeting tailrec would be if we can split the current optimization into an implicit-to-tailrec pass that can run on CoreFn, and a tailrec-to-loop pass that runs on CoreImp. If that works, then the CoreFn pass could be made more permissive on a user-by-user basis via external CoreFn transformations.

@natefaubion
Contributor

All backends operate on CoreFn, so I think that if we wanted consistent semantics for TCO, we would need to add a loop/recur constructor to CoreFn. Right now, all backends have to implement their own TCO optimization, so I'm hesitant to make it more permissive.

@rhendric
Member

Is there an advantage to extending CoreFn with loop/recur as opposed to using tailRec/Loop/Done? With a loop/recur construct, backends would have to change to support it; with tailRec/Loop/Done, a backend always has the option of doing nothing and the runtime implementation of tailRec will do the right thing (modulo stack size constraints).

@natefaubion
Contributor

natefaubion commented Dec 16, 2020

PureScript doesn't have a (specified) runtime, so I'm not sure how you'd turn implicit TCO into tailRec calls without it being in scope somehow.

@rhendric
Member

... right, yeah, a backend wouldn't be able to literally do nothing if tailRec isn't in scope. Silly of me. I guess every other instance of PureScript surface syntax desugaring to call some library function in CoreFn is triggered either by something else in that library or by some language token where the compiler can reasonably complain if the function isn't in scope, whereas it's a bit weirder to do that if you've written a tail-recursive function.

So it's a wash then? Extend CoreFn with more primitives or effectively elevate certain functions/constructors to primitive status as far as backends are concerned?

@radrow
Contributor Author

radrow commented Dec 17, 2020

That said, being safe is a necessary but not sufficient condition for endorsing the proposal. I'm still concerned that TCO can be a deoptimization in some circumstances. Consider, as a dumb example,

That example is quite convincing, thank you for it. Nevertheless, while I get that forcing TCO all the way may be undesirable, I am pretty convinced that self-referencing without a call should not break TCO (like in the initial Pokemon example). At the moment I really don't see any case where it could cause problems – especially considering that the function would actually be 100% tail recursive in terms of calls.

Speaking of the tailrec approach: since (as far as I can tell) PS's philosophy prevents implicitly bringing any libraries or internal functions into scope, using tailRec is not the way to go, imo. The idea of CoreFn primitives sounds much better to me, as it is fairly trivial to mock such a construct with an internal tailRec definition, so it won't be much of an issue for backends to either pass it through or implement it properly. Also, I don't see much point in forcing explicitness by adding primitives to the frontend, as it could break a lot of existing projects that rely on implicit TCO. And if somebody really wants explicitness, tailRec is happy to shine.

@rhendric
Member

self-referencing without a call should not break TCO (like in the initial Pokemon example).

Sorry, I'm not following. Where does the Pokemon example demonstrate self-referencing without a call? All the uses of levelupAll I see are calls.

For the rest of it, I think @natefaubion hit the nail on the head: in addition to the question of whether this is a good optimization for the JS backend, this is also a design decision about what our spec for TCO is, in terms of what program authors can rely upon and what backend authors are required to support, and possibly also what CoreFn transformers are able to do about it.

I'd say we're currently operating as if the spec is that functions written such that every self-reference is a tail call are guaranteed to compile to something that is not intrinsically limited to some finite depth of recursion (and whether that guarantee requires an explicit TCO transformation is a backend-specific concern). Expanding that guarantee places a burden on other backends, and expanding TCO support in the JS backend beyond that guarantee may be indistinguishable from expanding the guarantee in practice, if not formally. (I don't think that's a universal argument against doing nice things in the JS optimizer, to be clear. The case of tail-call optimization is special because it can change the set of inputs a program can accept, not just the speed at which it runs.)

So either we need to expand that guarantee with due consideration, which requires a conversation for which the core team won't have the capacity until after 0.14 is released at least, or we need to explore improvements which don't expand that guarantee, which is why a tailRec-based approach or similar looked appealing to me. (Or, third option, we come to an agreement that things are good enough as they are, given the full consequences of doing things differently and the availability of workarounds.)

@natefaubion
Contributor

Another thought, if you were to consider adding a loop/recur primitive (where recur must always appear in tail position for the current loop block), then what would the case in this proposal look like? I think it would essentially be exactly what's in #3967 (comment). That is, the TCO spec we've implemented is effectively loop/recur, which I think is about as simple as it gets. Without a proper looping primitive in CoreFn, I'm afraid I'm not very keen on this change. I think it's fine if the surface optimizer can turn more forms into a core loop primitive, but since all backends must currently maintain this implementation themselves, I think we should keep it as is.

@radrow
Contributor Author

radrow commented Dec 17, 2020

Where does the Pokemon example demonstrate self-referencing without a call?

Ah, sorry, that is not the case there. I thought I was mapping over the list with a self-reference.


The problem I have with the current implementation is that I find the semantics surprising and misleading. Perhaps I missed something in the documentation, but I implicitly assumed that it works the same as in Haskell, where

let {f x | x `mod` 2 == 0 = 2; f x = f (f (x - 1) + x)} in f 1

runs happily in constant memory. I was even thinking about getting rid of all implicit tail recursion in order to make it clear where it applies (with recur/loop primitives at the language level). That would be a breaking change, though.
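To make the recur/loop idea concrete, here is a hypothetical JS sketch of what such a primitive could desugar to (the names `loop`/`recur` are illustrative, not an existing PureScript construct): `recur` may only appear in tail position of the loop body, and the whole construct compiles to a plain while loop.

```javascript
// Hypothetical desugaring sketch of a Clojure-style loop/recur:
// `recur` packages the next iteration's arguments; `loop` re-invokes
// the body until it returns anything other than a recur marker.
var RECUR = {}; // unique sentinel tag
function recur() {
  return { tag: RECUR, args: Array.prototype.slice.call(arguments) };
}
function loop(body) {
  var args = Array.prototype.slice.call(arguments, 1);
  for (;;) {
    var r = body.apply(null, args);
    if (r !== null && typeof r === "object" && r.tag === RECUR) {
      args = r.args; // tail position: iterate instead of growing the stack
    } else {
      return r;
    }
  }
}

// usage: a stack-safe accumulating sum
var sumTo = function (n) {
  return loop(function (acc, k) {
    return k === 0 ? acc : recur(acc + k, k - 1);
  }, 0, n);
};
```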

Speaking of the specification, to me the best idea is to add primitives to the core and let the backends decide what to do with them, adjusting to their targets while keeping the constraint on stack usage. The semantics would be clear and the optimization would have a defined place. TCO is crucial for pure FP programs, so I would really go for treating it specially.

I'd also like to hear your opinion on the self-reference-without-a-call case. It also seems reasonable to accept self-calls in the arguments of legitimate tail calls (as in the Haskell example above), because they can't sneak in as easily as in @rhendric's example.

@natefaubion
Contributor

I would also be interested in adding a CoreFn primitive like join points, since it's more general than just loops (eg, pattern matching optimizations). But it may be trickier for backends to implement this.

@rhendric
Member

I implicitly assumed that it works the same as in Haskell

Haskell and PureScript have fundamentally different evaluation strategies, and that's the very first item on the Differences from Haskell doc, so that's not a terribly reasonable assumption IMO. I'm sure I don't need to provide other examples of programs that will happily run in Haskell but crash in PureScript due to their different evaluation strategies.

I'd also like to hear your opinion on the self reference without a call case.

The class of problems demonstrated by slowMult covers any program in which an outer function (that is otherwise tail-recursive) is called during its own evaluation. Whether that call appears lexically inside the function definition is immaterial; I can replace a + slowMult a (b - 1) with a + call slowMult a (b - 1) and still have a program that succeeds on more inputs without TCO than with it. And of course it doesn't matter if it's call or map or any other function receiving the reference, if that function could directly or indirectly call the reference. In principle, I'd be less concerned about a self-reference that is known not to be invoked during the function's lifetime (say, is returned as part of a data structure or a closure), but:

  1. Is that a common enough use case to be of value to you?
  2. What heuristic would you propose to identify such cases, given that we ought not to expect the optimizer to solve the halting problem?
  3. @natefaubion's question about whether this makes the specification simpler or more complex becomes extremely relevant.

Re CoreFn, I fear that's become a bit of a distraction. The proposed benefit was that advanced users could rewrite CoreFn expressions to use TCO in more places than the default, and get a clean surface syntax, a permissive TCO policy, and a compiled result without any extra overhead (above what TCO adds) all at once without changing anything too fundamental about how TCO works for everyone else. The thing is, if you're willing to go to the trouble of writing a CoreFn rewriter, right now you can get the first two of those things by rewriting CoreFn to use any of the available workarounds (your TCOBreak pattern, @natefaubion's similar but more idiomatic outer = go where go ... pattern, or an explicit tailRec function that you write yourself or depend on purescript-tailrec for). The only thing a new primitive would get you is making the end result have less overhead, but even without that, we aren't talking ‘blow up the stack’ overhead, just performance. And if you aren't going to undertake rewriting CoreFn, then no primitive at that level will help your situation without some other change.

@natefaubion
Contributor

Re CoreFn, I fear that's become a bit of a distraction.

Right, this isn't really the best place for that sort of discussion. I only brought it up, because that would make me more willing to accept different or more permissive TCO strategies, since it can be done independent of various backends. Absent that, I'm not in favor of changing the TCO strategy.

@gorbak25

gorbak25 commented Dec 18, 2020

That said, being safe is a necessary but not sufficient condition for endorsing the proposal. I'm still concerned that TCO can be a deoptimization in some circumstances. Consider, as a dumb example,

This argument only applies until #3183 gets fixed – in many cases we could skip the additional function call. If we never emitted the additional function, it would be beneficial to trigger TCO as often as possible.
