-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check for minor GC on all loop iterations #11572
Comments
These checks are necessary for proper signal handling (in OCaml 4) and proper operation of the multicore GC (in OCaml 5). You'll be glad(?) to learn that OCaml 5 adds more of these checks on selected tail calls so that no code can loop forever without checking for GC and signal requests. The cost is relatively low (a compare and a correctly-predicted conditional branch), so we're comfortable having one check per loop iteration. The only case where the checks could be omitted safely is a counter "for" loop when the compiler knows that the number of iterations is small. I don't think this is worth pursuing. |
Thanks for the explanation. That makes sense. Wow; every loop iteration now has overhead of a memory read (presumably cached), comparison, and conditional jump. I think an annotation should be added for marking locations where these tests are not wanted. |
Actually this "poll" operation is only inserted if there isn't already a poll or an allocation in the loop body. The performance degradation would only affect very tight loops. Do you have some actual code that is strongly affected by the change? |
Well-predicted branches are extremely cheap in modern hardware. (In your example I second the request for actual affected code. (An optimizing compiler could consider loop unrolling, etc.) |
However, the generated code for the loop seems suboptimal, indeed this changes the branch prediction in the absence of predictor history: the "back-edge" is predicted as not-taken rather than taken. One way to avoid this would be to move the polling code from the bottom to the top of the loop (and maybe add a one-use jump to skip polling in the first iteration). The poll-at-top vs. poll-at-bottom strategies were thoroughly discussed at #10039 (comment) and results of benchmarks were used to choose poll-at-bottom (#10039 (comment)). However this was decided using instruction count, which does not account for branch prediction, and this did not test a version with an additional (well-predicted) jmp to skip polling before the first iteration. |
Really? In the example, the "back edge" is |
What counts is that |
Sorry, I confused myself: the forward jump |
Good points. It is hard to detect much difference with the added polling instructions, at least on AMD64. |
The optimizing compiler seems to generate a check for minor GC on every loop iteration. This happens for while, for, let rec. No matter how simple the loop. See examples below. This was tested on Ocaml 4.14.0 on MacOS. These seem to be unnecessary... maybe I'm missing something?
`
let rec_loop ofs =
let rec loop ofs =
loop ofs
in
loop ofs
;;
let while_loop ofs =
while true do ()
done
;;
let for_loop len =
for ofs = 0 to len do ()
done
;;
`
Resulting assembly looks like the following (below is for the for_loop function).
loop.zip
The text was updated successfully, but these errors were encountered: