Proposal: Take more opportunities to trigger TCO #3967
Comments
The only reason I can think of to oppose this is that TCO introduces a little bit of overhead for each un-transformed invocation of the transformed function. So for a function that calls itself mostly in non-tail positions, but has one infrequently executed case of calling itself in tail position, applying TCO as we've implemented it could be a substantial deoptimization. Of course, some extra per-call overhead, even for a heavily recursive function, may be much more tolerable than a function that can't run at all due to stack constraints. So I don't know if this is a particularly strong counterargument to your proposal. I do think that we should maybe take some time to consider whether there's a more conservative position to stake out in between 'every recursive call is in tail position' and 'any recursive call is in tail position'. Remember also that you can always explicitly get tail recursion with something like `tailRec`.
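The explicit route mentioned here is the trampoline style provided by `purescript-tailrec`. As an illustration only, here is a hand-rolled JavaScript sketch of the same idea; the names `tailRec` and `sumTo` and the object encoding are mine, not the library's actual compiled output:

```javascript
// Hand-rolled trampoline modeled on Control.Monad.Rec.Class.tailRec.
// A step function returns either { loop: nextState } or { done: result };
// the driver iterates in constant stack space, so no compiler TCO is needed.
function tailRec(step, init) {
  let state = init;
  for (;;) {
    const out = step(state);
    if ('done' in out) return out.done;
    state = out.loop;
  }
}

// Example: sum 1..n without growing the stack.
function sumTo(n) {
  return tailRec(
    ({ i, acc }) => (i > n ? { done: acc } : { loop: { i: i + 1, acc: acc + i } }),
    { i: 1, acc: 0 }
  );
}
// sumTo(1000000) === 500000500000, with no risk of stack overflow
```

In the PureScript original, the `{ loop: ... }`/`{ done: ... }` objects correspond to the `Loop`/`Done` constructors of the library's `Step` type.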
I see that this little overhead may be undesirable. However, in my opinion, the cost should be acceptable: the lack of TCO can break the program, while a bunch of additional ifs cannot. The other idea that comes to my mind is to use the "all" rule by default and trigger "any" only if the user provides some annotation.
The core team has generally been against controlling optimizations with pragmas; see for example the discussion on #2345. So that's unlikely to be the way forward. However, I was thinking that maybe we can get a similar effect by having the optimizer target specific tail-recursion-indicating functions, maybe even `tailRec` itself.
@hdgarrood, I'm convinced that this proposal is at least safe on an abstract, unlimited-stack machine. The argument (credit to @radrow's motivation section for the core idea) is that any function the extended optimizer would transform computes the same result after the transformation as before it.

That said, being safe is a necessary but not sufficient condition for endorsing the proposal. I'm still concerned that TCO can be a deoptimization in some circumstances. Consider, as a dumb example:

```purescript
slowMult :: Int -> Int -> Int
slowMult a b = case b of
  1 -> a
  2 -> slowMult (a + a) 1
  _ -> a + slowMult a (b - 1)
```

With current PS, this function is not TCO'd; under the proposal, it would compile to something like:

```js
var slowMult = function ($copy_a) {
    return function ($copy_b) {
        var $tco_var_a = $copy_a;
        var $tco_done = false;
        var $tco_result;
        function $tco_loop(a, b) {
            if (b === 1) {
                $tco_done = true;
                return a;
            };
            if (b === 2) {
                $tco_var_a = a + a | 0;
                $copy_b = 1;
                return;
            };
            $tco_done = true;
            return a + slowMult(a)(b - 1 | 0) | 0;
        };
        while (!$tco_done) {
            $tco_result = $tco_loop($tco_var_a, $copy_b);
        };
        return $tco_result;
    };
};
```

The problem is that the extra loop machinery now imposes overhead on every invocation, including the common non-tail branch, not just the one branch that benefits from it.
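Since both forms are plain JavaScript, their equivalence is easy to check directly. Below, `slowMultDirect` is my reconstruction of the untransformed output (not code from the thread); `slowMult` is the transformed version quoted above. The overhead concern is about the loop scaffolding that every call now pays for, not about correctness:

```javascript
// Untransformed shape (my reconstruction): plain curried recursion.
var slowMultDirect = function (a) {
    return function (b) {
        if (b === 1) { return a; }
        if (b === 2) { return slowMultDirect(a + a)(1); }
        return a + slowMultDirect(a)(b - 1);
    };
};

// Transformed shape, as quoted above: every call allocates the loop
// machinery ($tco_done, $tco_result, $tco_loop) even when no tail call runs.
var slowMult = function ($copy_a) {
    return function ($copy_b) {
        var $tco_var_a = $copy_a;
        var $tco_done = false;
        var $tco_result;
        function $tco_loop(a, b) {
            if (b === 1) { $tco_done = true; return a; }
            if (b === 2) { $tco_var_a = a + a | 0; $copy_b = 1; return; }
            $tco_done = true;
            return a + slowMult(a)(b - 1 | 0) | 0;
        }
        while (!$tco_done) {
            $tco_result = $tco_loop($tco_var_a, $copy_b);
        }
        return $tco_result;
    };
};
// Both compute a * b for b >= 1, e.g. slowMult(3)(4) === 12
```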
FWIW, this is pretty straightforward to work around.
I've done this in my own code.
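One common shape for such a workaround — sketched here in JavaScript with invented names, not code from the thread — is to move the recursion into a local helper whose every self-call is a tail call, so that the existing "all self-calls in tail position" rule applies to the helper:

```javascript
// Workaround sketch: slowMult itself no longer self-references at all;
// go is tail recursive in every branch, so the current TCO rule can
// (in the PureScript analogue) turn it into a loop.
function slowMult(a, b) {
  function go(acc, aa, bb) {
    if (bb === 1) { return acc + aa; }
    if (bb === 2) { return go(acc, aa + aa, 1); }
    return go(acc + aa, aa, bb - 1);
  }
  return go(0, a, b);
}
// slowMult(3, 4) === 12, same result as the original definition
```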
My opinion on implicit TCO (in this compiler) is that the less of it, the better. It sits, unfortunately, somewhere between an optimization and semantics: if a compiler does not implement it, existing programs may fail to run. Additionally, if a compiler/optimizer implements additional optimizations to make it more permissive (say, local mutually recursive groups), then it's likely that users will write libraries that only work given the more permissive optimization. This is why Clojure has an explicit loop/recur construct. Optimizing an explicit construct like that avoids the problem.
One benefit of targeting `tailRec` is that backends wouldn't need their own TCO implementations for it; the library's runtime implementation already provides the stack safety.
All backends operate on CoreFn, so I think that if we wanted consistent semantics for TCO, we would need to add a loop/recur constructor to CoreFn. Right now, all backends have to implement their own TCO optimization, so I'm hesitant to make it more permissive.
Is there an advantage to extending CoreFn with loop/recur as opposed to using tailRec/Loop/Done? With a loop/recur construct, backends would have to change to support it; with tailRec/Loop/Done, a backend always has the option of doing nothing, and the runtime implementation of `tailRec` provides the stack safety.
PureScript doesn't have a (specified) runtime, so I'm not sure how you'd turn implicit TCO into tailRec calls without it being in scope somehow.
... right, yeah, a backend wouldn't be able to literally do nothing if tailRec isn't in scope. Silly of me. I guess every other instance of PureScript surface syntax desugaring to call some library function in CoreFn is triggered either by something else in that library or by some language token where the compiler can reasonably complain if the function isn't in scope, whereas it's a bit weirder to do that if you've written a tail-recursive function. So it's a wash then? Extend CoreFn with more primitives or effectively elevate certain functions/constructors to primitive status as far as backends are concerned?
That example is quite convincing, thank you for it. Nevertheless, while I get that forcing TCO all the way may be undesirable, I am pretty convinced that a self-reference without a call should not break TCO (like in the initial Pokemon example). At this moment I really don't see any case where it could cause problems – especially considering that the function would actually be 100% tail recursive.
Sorry, I'm not following. Where does the Pokemon example demonstrate self-referencing without a call? All the uses of the function that I can see there are calls.

For the rest of it, I think @natefaubion hit the nail on the head: in addition to the question of whether this is a good optimization for the JS backend, this is also a design decision about what our spec for TCO is, in terms of what program authors can rely upon and what backend authors are required to support, and possibly also what CoreFn transformers are able to do about it. I'd say we're currently operating as if the spec is that functions written such that every self-reference is a tail call are guaranteed to compile to something that is not intrinsically limited to some finite depth of recursion (and whether that guarantee requires an explicit TCO transformation is a backend-specific concern). Expanding that guarantee places a burden on other backends, and expanding TCO support in the JS backend beyond that guarantee may be indistinguishable from expanding the guarantee in practice, if not formally. (I don't think that's a universal argument against doing nice things in the JS optimizer, to be clear. The case of tail-call optimization is special because it can change the set of inputs a program can accept, not just the speed at which it runs.)

So either we need to expand that guarantee with due consideration, which requires a conversation for which the core team won't have the capacity until after 0.14 is released at least, or we need to explore improvements which don't expand that guarantee, which is why a `tailRec`-targeting optimization appeals to me.
Another thought, if you were to consider adding a loop/recur primitive (where recur must always appear in tail position for the current loop block), then what would the case in this proposal look like? I think it would essentially be exactly what's in #3967 (comment). That is, the TCO spec we've implemented is effectively loop/recur, which I think is about as simple as it gets. Without a proper looping primitive in CoreFn, I'm afraid I'm not very keen on this change. I think it's fine if the surface optimizer can turn more forms into a core loop primitive, but since all backends must currently maintain this implementation themselves, I think we should keep it as is.
Ah, sorry, this is not the case there; I thought I was mapping over the list with a self-reference. The problem I have with the current implementation is that I found the semantics surprising and misleading. Perhaps I missed something in the documentation, but I implicitly assumed that it works the same as in Haskell, where the equivalent code runs happily in constant memory. I was even thinking about getting rid of all of the implicit tail recursion in order to make it clear where it applies (with explicit `tailRec`, for example).

Speaking about the specification, for me the best idea is to add primitives to the core and let the backends decide what to do with them, adjusting to their targets while keeping the constraint on stack usage. The semantics would be clear and the optimization would have its defined place. TCO is crucial for pure FP programs, so I would really go for treating it specially.

I'd also like to hear your opinion on the self-reference-without-a-call case. Also, it seems reasonable to accept self-calls in the arguments of legitimate tail calls (as in the Haskell example above), because they won't sneak in as easily as in @rhendric's example.
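The Haskell code being alluded to here didn't survive extraction. Its general point — lazy evaluation lets definitions that aren't tail recursive traverse unbounded data in constant space — can be mimicked in JavaScript with generators. This is my own illustration, not code from the thread:

```javascript
// Lazy structures let non-tail-style definitions run in constant space:
// elements are produced on demand and dropped once consumed, which is
// roughly what the Haskell snippet relied on.
function* countFrom(i) {          // infinite stream i, i+1, i+2, ...
  for (;;) yield i++;
}
function* mapGen(f, xs) {         // lazy map: nothing is computed up front
  for (const x of xs) yield f(x);
}
function nth(xs, n) {             // force n+1 elements, keep only the last
  let v;
  for (const x of xs) { v = x; if (n-- === 0) break; }
  return v;
}
// nth(mapGen(x => x * 2, countFrom(0)), 1000000) === 2000000,
// using constant memory and constant stack depth
```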
I would also be interested in adding a CoreFn primitive like join points, since it's more general than just loops (e.g., pattern-matching optimizations). But it may be trickier for backends to implement.
Haskell and PureScript have fundamentally different evaluation strategies, and that's the very first item on the Differences from Haskell doc, so that's not a terribly reasonable assumption IMO. I'm sure I don't need to provide other examples of programs that will happily run in Haskell but crash in PureScript due to their different evaluation strategies.
The class of problems demonstrated by your example is real, though.
Re CoreFn, I fear that's become a bit of a distraction. The proposed benefit was that advanced users could rewrite CoreFn expressions to use TCO in more places than the default, and get a clean surface syntax, a permissive TCO policy, and a compiled result without any extra overhead (above what TCO adds) all at once, without changing anything too fundamental about how TCO works for everyone else. The thing is, if you're willing to go to the trouble of writing a CoreFn rewriter, you can already get the first two of those things by rewriting CoreFn to use any of the available workarounds.
Right, this isn't really the best place for that sort of discussion. I only brought it up because that would make me more willing to accept different or more permissive TCO strategies, since it could be done independently of the various backends. Absent that, I'm not in favor of changing the TCO strategy.
This argument only applies until #3183 gets fixed – in many cases we could then skip an additional function call. If we never emitted an additional function, it would be beneficial to trigger TCO as often as possible.
Summary
You don't need every self-reference to be in tail position to trigger TCO. Why not make use of this beautiful optimization whenever it is possible?
Motivation
No TCO:
TCO:
(modulo some syntax).
The current implementation disables TCO whenever there is a non-tail reference to the current function – not necessarily in a call: it can also be passing the function as an argument to some other function, or using it in a value definition. This is ridiculous and requires the workarounds presented above.
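As a hypothetical JavaScript illustration (an invented example, not from the issue) of a self-reference that is not a call: below, the only self-call is in tail position, yet the bare mention of `countdown` passed to `onStep` is exactly the kind of reference that disables TCO under the current rule.

```javascript
// countdown recurses only in tail position; passing `countdown` to onStep
// is a self-reference without a call, which currently disables TCO even
// though loopifying the tail call would still be safe.
function countdown(n, onStep) {
  if (n === 0) { return 'done'; }
  onStep(countdown);               // reference, not a call
  return countdown(n - 1, onStep); // the only self-call, in tail position
}
// countdown(3, f => {}) === 'done'
```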
I don't know Haskell's policy on this at all, but in JS we suffer badly from the stack limit, so loopifying calls can be crucial in many cases – for instance in our erlscripten project, where we need to implement the mentioned trick in order to be able to compare long lists of Erlang terms: example
Proposal
Trigger TCO everywhere it is possible, regardless of whether there are other non-tail references. I have already prepared a PR for this.
Examples
All of the following functions should be stack safe: