-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modifications to the native compilation of exception handlers #1482
Conversation
@@ -171,6 +171,13 @@ let emit_Llabel fallthrough lbl = | |||
if not fallthrough && !fastcode_flag then D.align 4; | |||
def_label lbl | |||
|
|||
let load_label_addr s arg = | |||
(* CR mshinwell: this needs more testing *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this patch supposed to still have CRs in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be checked on the various targets (and target variants) to make sure that the correct instruction sequence is being generated.
I'm in favor of changing the linearization of try...with constructs so that no call instruction is used and the body is in fallthrough-unconditional branch position. I'm absolutely not convinced that the current, structured representation of try...with should be replaced by an unstructured representation based on push-handler and pop-handler. You're losing structure, which is rarely if ever a good thing, and you're not gaining anything as far as I can see. |
The restrictions of the try-with structure can be shown of the following example: let x =
if (condition) then
begin (before1); 1 end
else begin (before2); 2 end
in
try
if x = 1 then (after1) else (after2)
with exn -> (handler) The version of flambda that is being worked on can propagate the information on x and rewrite this into:
(assuming x is not used anywhere else). However, we acknowledge that the use of explicit pushtrap and poptrap instructions is not ideal in otherwise structured intermediate representations. try (body) with exn -> (handler) becomes (with
With the patch that puts body before handler in linearized code, the extra jumps are actually fallthrough and can be removed, so no overhead remains in the assembly. Would such a solution be more acceptable to you ? |
2e26e84
to
61b366b
Compare
61b366b
to
99efa61
Compare
I've pushed an updated version that roughly implements the proposal in my previous comment.
The first choice is a simple matter of convenience: it means less modifications to the types, and doesn't prevent checking consistency earlier if one wishes so. A few comments on the commits themselves:
We're still not planning to submit the two initial changes independently, but otherwise every commit except the main one (and Changes, I guess) can be dropped if needed. |
Benchmarks are finally up on the flambda benchmarks site here. You can find a specific comparison of this pull request with the trunk commit it's based on here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in amd64,i386,arm64,arm look fine to me. I don't know enough about the other targets to be confident.
I wonder about the choice of temporary registers for use in the code emitted for Lraise in amd64 (r11) and i386 (ebx). Is there any particular reason for choosing these registers specifically, and if so - perhaps it should be documented.
A few years later, what is the status of this PR? Are we still interested in going in this direction? (From a distance this sounds interesting, but I can't tell if the new design addresses the criticisms of Xavier on the first proposal.) Personally I find the discussion hard to read as there was, if I understand correctly, a big change to the design mid-fly. I would be in favor of closing this PR, and asking @lthls to reopen a new PR that presents the final design directly. |
The final design is in use on the Flambda 2 branch. I also have a branch that should contain only the patches related to the compilation of exceptions, but unless somebody outside the Flambda team is interested in reviewing it now it will get re-submitted when we start upstreaming the relevant Flambda 2 patches. |
Overview
This pull request is extracted from the ongoing flambda branch, and contains two main features:
performance in the exception case
Removal of Trywith constructs
This pull request replaces the
Utrywith
construct from Clambda, and replaces it with a more generalizedUcatch
construct, andUpushtrap
andUpoptrap
to denote entering and exiting a try-with body.The pushtrap/poptrap approach was already used late in the compilation chain, the reason for trying to push it as early as Clambda is that the ongoing work on flambda has shown the necessity to break the trywith blocks to do some optimizations, and reconstructing trywith blocks at the end of the middle-end is not always possible.
It would have been feasible to retain both the trywith and traps based constructs, but we feel this would only lead to more maintaining load (as some paths in the backend are only taken when flambda is enabled or disabled), and translation of a trywith construct to catch and traps is fairly straightforward (with a special case for exit/staticfail from within a trywith block to outside it; the logic that was present in the
Iexit
case inlinearize.ml
has been moved earlier to compensate).To address the safety problems that such a change implies, three invariants checking modules have been added:
Cmm_invariants
checks a few continuations-related invariants (on both catch/exit continuations and trap continuations) on the cmm representation. This was already proposed in a separate pull request (Cmm invariants #1400), but this pull request extends the invariants with trap-related ones and makes the invariant checks mandatory.Trap_analysis
annotates mach instructions with trap stack information. Since no pushtrap/poptrap instructions will be added by the backend, it also checks consistency on trap stacks at exit points. This should address the problems noted in an earlier discussion of the feature about the exception edges of the control flow graph not being explicit enough.Linear_invariants
checks again consistency of trap depths on the linear representation, in particular that extra exit instructions added during the transformation from mach are consistent with the rest of the program.Changes to the compilation of catch constructs
When compiling a catch handler to linear instructions, the previous choice was to put the handler(s) first, then the actual body, with an extra jump statement to the body code added before the handlers.
This pull request inverts the order, so writes the body first, then the handlers, and of course adds an extra jump at the end of the body to the following code.
This was motivated by a suspicion that the use of
call
andreturn
x86 instructions to respectively branch into the body and go back to the handler interfered with the prediction of returns that was done in the processor, causing branch misses. With this patch, going to the body is a simple fallthrough, going to the handler in theraise
case is apop
followed by ajmp
, and additional work is done at the beginning of the body to compute and store the address of the handler on the stack.To give an idea of how much time this could save, we ran the following micro-benchmark:
through the following commands:
$ ./ocamlopt.opt -o microbench microbench.ml
$ perf stat ./microbench 1000000000
and we got the following results:
On trunk:
With this patch:
The huge gains in execution time can be linked to the big difference in branch-misses, with roughly one branch-miss for every raise in the program.
Tests done on more complex programs are of course much less impressive but still show improvement, and more serious benchmarking is being done using the flambda benchmarks infrastructure. Results will be linked to in this thread once they are available.
In addition to the benchmarking, testing of the patched emitters has been done on all of the supported architectures. More precisely, amd64, arm, arm64, and power were tested on real hardware (by running the whole testsuite), i386 was tested on amd64 by configuring the compiler to produce 32-bits code, and s390x was tested through qemu (some of the lib-threads tests did not finish, likely due to the qemu scheduler, but after disabling them other tests passed fine).
About separating the patches
While the two changes were described separately, they are submitted in a single pull request because they correspond to the exceptions-related changes in the flambda branch that are not dependent on the middle-end. We would definitely prefer to merge them together, rather than separately, and we have not tested the impact of one part without the other.