Poor compilation of dependent pattern-matching in bytecode #18933
Labels
- kind: performance — Improvements to performance and efficiency.
- kind: wish — Feature or enhancement requests.
- part: VM — Virtual machine.
Is your feature request related to a problem?
A commonly encountered idiom in Coq is a dependent pattern match whose result is applied to an argument `H`. The way it is compiled to bytecode is rather inefficient. First, `H` is pushed on the stack, then a continuation label is pushed too. Then, the pattern matching is performed: every branch allocates a closure and fills it with the needed free variables, then pops the label and jumps to it. Finally, the closure is applied to the element that was pushed on the stack. To summarize, during evaluation, a closure is allocated and immediately consumed.
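The sequence above can be sketched as follows. This is a small OCaml model of the described evaluation steps, not Coq's actual VM code; the function names and branch bodies are made up for illustration:

```ocaml
(* Model of the current compilation scheme: the argument H is saved,
   the match selects a branch, the branch allocates a closure over its
   free variables, and that closure is immediately applied to H. *)
let eval_current scrutinee h =
  let stack = ref [] in
  let push v = stack := v :: !stack in
  let pop () =
    match !stack with v :: s -> stack := s; v | [] -> assert false in
  push h;                         (* step 1: H is pushed on the stack  *)
  (* step 2: a continuation label is pushed (modeled implicitly here) *)
  let closure =
    match scrutinee with          (* step 3: the pattern match runs;   *)
    | 0 -> (fun x -> x)           (* each branch allocates a closure   *)
    | n -> (fun x -> x + n)       (* capturing its free variables (n)  *)
  in
  (* step 4: the label is popped and jumped to (modeled as returning) *)
  closure (pop ())                (* step 5: the closure is applied to H *)
```

Every call thus pays for a closure allocation that is discarded immediately after one application.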
Proposed solution
Ideally, no closure would be allocated and applied, since the branches can be directly executed, which would bring a large speedup.
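As a rough sketch of the difference (again an OCaml model with made-up branch bodies, not actual VM code): the current scheme allocates a closure for the executed branch, while the proposed one would run the branch body directly.

```ocaml
(* Current scheme: the selected branch yields a closure that is
   consumed immediately. *)
let eval_with_closure tag h =
  let k = match tag with
    | 0 -> (fun x -> x)          (* closure allocated here...         *)
    | n -> (fun x -> x + n)
  in
  k h                            (* ...and immediately applied to H   *)

(* Proposed scheme: the branch body executes directly on the saved
   argument, so no intermediate closure is allocated. *)
let eval_direct tag h =
  match tag with
  | 0 -> h
  | n -> h + n
```

Both functions compute the same results; only the allocation behavior differs, which is where the speedup would come from.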
There is a snag with open terms, though. Indeed, during strong normalization, the branches need to actually return closures. In other words, for normalization, we still need the bytecode as it currently is, which means that different bytecode would be executed during evaluation and during normalization. Moreover, there would be no sharing between the two bytecodes, since the one for evaluation would access the free variables directly on the stack (`ACC` opcode), while the one for normalization would access them from the closure environment (`ENVACC` opcode). So, this would cause some bytecode duplication (and an exponential blowup in the case of nested matches). It is not obvious whether a well-thought-out prefix that copies the closure environment onto the stack would make it possible to delegate part of the normalization bytecode to the evaluation bytecode.
Alternative solutions
No response
Additional context
This is remotely related to #14109.