-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8322630: Remove ICStubs and related safepoints #17495
Conversation
👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into |
/contributor add @TheRealMDoerr |
/contributor add @shipilev |
@fisk |
/contributor add @offamitkumar |
@fisk |
/contributor add @robehn |
@fisk |
@fisk |
/contributor add avoitylov |
@fisk |
Thanks to the OpenJDK port maintainers for picking this up! All added to contributors, hopefully. |
@@ -211,8 +211,10 @@ | |||
nonstatic_field(ArrayKlass, _dimension, int) \ | |||
volatile_nonstatic_field(ArrayKlass, _higher_dimension, ObjArrayKlass*) \ | |||
volatile_nonstatic_field(ArrayKlass, _lower_dimension, ArrayKlass*) \ | |||
nonstatic_field(CompiledICHolder, _holder_metadata, Metadata*) \ | |||
nonstatic_field(CompiledICHolder, _holder_klass, Klass*) \ | |||
volatile_nonstatic_field(CompiledICData, _speculated_method, Method*) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please duplicate these CompiledICData
declarations in vmStructs_jvmci.cpp
so that they can be used by Graal to ascertain whether ICStubs are in use (Graal is still supporting multi JDK versions).
ARM32 passed relevant tests after the update. Thanks @fisk ! |
Webrevs
|
Test results are good on all SAP supported platforms (CPU: x86_64, aarch64, PPC64; OS: linux, Windows, AIX). Performance looks also good. |
I believe this was the last major piece in task we started over 5y ago of removing runtime safepoint and latencies. Risc-v passes my testing. (vf2 (t1) + qemu (t1-2), ran twice, once on v3 branch and once this pr) @RealFYang can you please review RV changes? |
Fwiw, the change makes class unloading regress significantly in a class unloading stress test (unloading 60k classes), seemingly tripling the time it takes for the "Purge Unlinked NMethods" phase (~20ms -> ~60ms). This may not be a problem for the concurrent gcs, but can be for the STW ones. (Overall max G1 remark pause times went from 160ms to 220ms, regular Remark pauses which do class unloading from 120ms to 160ms). |
If the level of ~1 ms per 1000 unloaded classes worth of latency issue is crucial for G1, then maybe it's time for G1 to at least run purging concurrently? |
Do we know what makes that slower? Is it the glibc overhead of |
@tschatzl what program did you run, so I can reproduce? |
I think the The actual application I'm using is specjbb2015 that basically executes that ccstress application as a java agent in parallel to jbb2015 just for extra load. I can certainly give you that. Tomorrow I will also spend some time investigating this change a bit more (also on aarch64 - I think other factors may easily outweigh this difference), but I think that @TheRealMDoerr is correct about the C heap allocator just being very slow. There are ideas how to handle this, one of them being moving this (and other) |
Co-authored-by: Thomas Schatzl <59967451+tschatzl@users.noreply.github.com>
Thanks for the reviews @tschatzl and @dean-long. I have pushed changes reflecting your feedback. |
@@ -1571,6 +1551,8 @@ nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm, | |||
__ clinit_barrier(rscratch2, rscratch1, &L_skip_barrier); | |||
__ far_jump(RuntimeAddress(SharedRuntime::get_handle_wrong_method_stub())); | |||
|
|||
// Verified entry point must be aligned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment should come around line 1539.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will fix it.
__ align(CodeEntryAlignment); | ||
|
||
// Verified entry point | ||
__ bind(verified); | ||
int vep_offset = __ pc() - start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add back
// Verified entry point
above this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
If I just look at the new code, everything looks very reasonable. I tried to compare the new code to the old code, but quickly gave up. Could you explain why opt_virtual can now be a direct call and CompiledIC is now only for virtual? It seems like we could have done that even with the old code. Also, why don't we have to check for method->is_old() anymore? |
} | ||
void CompiledDirectCall::set(const methodHandle& callee_method) { | ||
CompiledMethod* code = callee_method->code(); | ||
CompiledMethod* caller = CodeCache::find_compiled(instruction_address()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing the slow find_compiled
unconditionally, I think it would be better to check is_interpreted_call
first.
// Verify that inline caches pointing to bad nmethods are clean | ||
if (!nm->is_in_use() || (nm->method()->code() != nm)) { | ||
if (!nm->is_in_use() || nm->is_unloading()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain this change. It's not obvious.
@@ -101,7 +101,7 @@ class XCompiledICProtectionBehaviour : public CompiledICProtectionBehaviour { | |||
} | |||
|
|||
virtual bool is_safe(CompiledMethod* method) { | |||
if (SafepointSynchronize::is_at_safepoint()) { | |||
if (SafepointSynchronize::is_at_safepoint() || method->is_unloading()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain why is_unloading() is needed now.
@@ -2079,65 +1771,41 @@ JRT_LEAF(void, SharedRuntime::fixup_callers_callsite(Method* method, address cal | |||
MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, JavaThread::current())); | |||
|
|||
CodeBlob* cb = CodeCache::find_blob(caller_pc); | |||
if (cb == nullptr || !cb->is_compiled() || callee->is_unloading()) { | |||
if (cb == nullptr || !cb->is_compiled() || !callee->is_in_use() || callee->is_unloading()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for is_in_use() is just an optimization, right?
} | ||
|
||
CompiledDirectCall* callsite = CompiledDirectCall::before(return_pc); | ||
callsite->set_to_clean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever way to get rid of should_fixup_call_destination(), something I've wanted to do!
:)
Sure! So I think at some point we had static calls and virtual calls, with virtual calls using CompiledIC and static calls using CompiledStaticCall. I think there must have been some confusion at some point when introducing opt_virtual. They are virtual calls, so it might have seem "natural" to jam them into CompiledIC, and have CompiledIC deal with them. But since they in their code shapes are a lot more similar to static calls, with a direct call and a stub for interpreter entry, it never was a great fit. To deal with it we had to 1) ensure the data is always null as there is no data, because it isn't really an inline cache, 2) check for it everywhere to not spin up ICStubs when transitioning, because it isn't really an inline cache, and then when manipulating the callsite, have some some native call wrapper abstraction with virtual calls that would convert requests to update the inline cache to update the corresponding direct call and stub instead for optimized virtual calls, and get out of the inline cache world. To me this always seemed a bit backwards. So in my new implementation I instead accept that opt_virtual calls really are more like the static calls (generated code is identical), and have pretty much nothing to do with inline caches, despite being virtual calls, and made them both use a common DirectCall abstraction instead, that fit both of them, as they are both direct calls. Could we have cleaned that up earlier? Yes, probably. But it was pretty ingrained and "interesting" to change with incremental changes. I figured this was my chance to do this right since I'm rewriting the CompiledIC file pretty much from scratch as you noticed. I hope you agree with this decision.
The checks for is_old were there because when performing an inline cache transition, we had situations where we would need an ICStub, and after running out of ICStubs we would have to request a safepoint to refill ICStubs. That safepoint could sneak in a class redefinition operation, rendering the methods invalid, or at least is_old(). Since I removed ICStubs and don't need any ICStubs to transition inline caches, we also get any safepoints in this code. And therefore we can't get any class redefinition, and hence can remove all the is_old() checks, which are now effectively dead code. |
Thanks for the review @dean-long! I updated the comments as requested. |
@fisk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 25 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thanks for the review, @dean-long! I pushed a trivial merge conflict with the next round of NULL -> nullptr changes. Running final round of tests now |
Tier1-7 test results look good. |
/integrate |
Going to push as commit 84965ea.
Your commit was automatically rebased without conflicts. |
Thanks for the review @tschatzl! |
ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs.
The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.
With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler.
I have tested the changes from tier1-7, and run through full aurora performance tests.
Progress
Issue
Reviewers
Contributors
<mdoerr@openjdk.org>
<shade@openjdk.org>
<amitkumar@openjdk.org>
<rehn@openjdk.org>
<avoitylov@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495
$ git checkout pull/17495
Update a local copy of the PR:
$ git checkout pull/17495
$ git pull https://git.openjdk.org/jdk.git pull/17495/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17495
View PR using the GUI difftool:
$ git pr show -t 17495
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17495.diff
Webrev
Link to Webrev Comment