Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8322630: Remove ICStubs and related safepoints #17495

Closed
wants to merge 19 commits into from

Conversation

fisk
Copy link
Contributor

@fisk fisk commented Jan 19, 2024

ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs.

The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.

With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler.

I have tested the changes from tier1-7, and run through full aurora performance tests.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8322630: Remove ICStubs and related safepoints (Enhancement - P4)

Reviewers

Contributors

  • Martin Doerr <mdoerr@openjdk.org>
  • Aleksey Shipilev <shade@openjdk.org>
  • Amit Kumar <amitkumar@openjdk.org>
  • Robbin Ehn <rehn@openjdk.org>
  • Aleksei Voitylov <avoitylov@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495
$ git checkout pull/17495

Update a local copy of the PR:
$ git checkout pull/17495
$ git pull https://git.openjdk.org/jdk.git pull/17495/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17495

View PR using the GUI difftool:
$ git pr show -t 17495

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17495.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 19, 2024

👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@fisk
Copy link
Contributor Author

fisk commented Jan 19, 2024

/contributor add @TheRealMDoerr

@fisk
Copy link
Contributor Author

fisk commented Jan 19, 2024

/contributor add @shipilev

@openjdk
Copy link

openjdk bot commented Jan 19, 2024

@fisk
Contributor Martin Doerr <mdoerr@openjdk.org> successfully added.

@fisk
Copy link
Contributor Author

fisk commented Jan 19, 2024

/contributor add @offamitkumar

@openjdk
Copy link

openjdk bot commented Jan 19, 2024

@fisk
Contributor Aleksey Shipilev <shade@openjdk.org> successfully added.

@fisk
Copy link
Contributor Author

fisk commented Jan 19, 2024

/contributor add @robehn

@openjdk
Copy link

openjdk bot commented Jan 19, 2024

@fisk
Contributor Amit Kumar <amitkumar@openjdk.org> successfully added.

@openjdk
Copy link

openjdk bot commented Jan 19, 2024

@fisk
Contributor Robbin Ehn <rehn@openjdk.org> successfully added.

@openjdk
Copy link

openjdk bot commented Jan 19, 2024

@fisk The following labels will be automatically applied to this pull request:

  • graal
  • hotspot
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Jan 19, 2024
@fisk
Copy link
Contributor Author

fisk commented Jan 19, 2024

/contributor add avoitylov

@openjdk
Copy link

openjdk bot commented Jan 19, 2024

@fisk
Contributor Aleksei Voitylov <avoitylov@openjdk.org> successfully added.

@fisk
Copy link
Contributor Author

fisk commented Jan 19, 2024

Thanks to the OpenJDK port maintainers for picking this up! All added to contributors, hopefully.

@@ -211,8 +211,10 @@
nonstatic_field(ArrayKlass, _dimension, int) \
volatile_nonstatic_field(ArrayKlass, _higher_dimension, ObjArrayKlass*) \
volatile_nonstatic_field(ArrayKlass, _lower_dimension, ArrayKlass*) \
nonstatic_field(CompiledICHolder, _holder_metadata, Metadata*) \
nonstatic_field(CompiledICHolder, _holder_klass, Klass*) \
volatile_nonstatic_field(CompiledICData, _speculated_method, Method*) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please duplicate these CompiledICData declarations in vmStructs_jvmci.cpp so that they can be used by Graal to ascertain whether ICStubs are in use (Graal is still supporting multi JDK versions).

@voitylov
Copy link

ARM32 passed relevant tests after the update. Thanks @fisk !

@fisk fisk marked this pull request as ready for review January 19, 2024 14:55
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 19, 2024
@mlbridge
Copy link

mlbridge bot commented Jan 19, 2024

@TheRealMDoerr
Copy link
Contributor

Test results are good on all SAP supported platforms (CPU: x86_64, aarch64, PPC64; OS: linux, Windows, AIX). Performance looks also good.

@robehn
Copy link
Contributor

robehn commented Jan 22, 2024

I believe this was the last major piece in task we started over 5y ago of removing runtime safepoint and latencies.
Or as some might say, we finally have a runtime good enough to run ZGC ;) (and Shenandoah).
Thank you @fisk for completing this milestone!

Risc-v passes my testing. (vf2 (t1) + qemu (t1-2), ran twice, once on v3 branch and once this pr)

@RealFYang can you please review RV changes?

@tschatzl
Copy link
Contributor

tschatzl commented Jan 23, 2024

Fwiw, the change makes class unloading regress significantly in a class unloading stress test (unloading 60k classes), seemingly tripling the time it takes for the "Purge Unlinked NMethods" phase (~20ms -> ~60ms).

This may not be a problem for the concurrent gcs, but can be for the STW ones.

(Overall max G1 remark pause times went from 160ms to 220ms, regular Remark pauses which do class unloading from 120ms to 160ms).

@fisk
Copy link
Contributor Author

fisk commented Jan 23, 2024

Fwiw, the change makes class unloading regress significantly in a class unloading stress test (unloading 60k classes), seemingly tripling the time it takes for the "Purge Unlinked NMethods" phase (~20ms -> ~60ms).

This may not be a problem for the concurrent gcs, but can be for the STW ones.

(Overall max G1 remark pause times went from 160ms to 220ms, regular Remark pauses which do class unloading from 120ms to 160ms).

If the level of ~1 ms per 1000 unloaded classes worth of latency issue is crucial for G1, then maybe it's time for G1 to at least run purging concurrently?

@TheRealMDoerr
Copy link
Contributor

Do we know what makes that slower? Is it the glibc overhead of delete ic->data();?

@fisk
Copy link
Contributor Author

fisk commented Jan 24, 2024

@tschatzl what program did you run, so I can reproduce?

@tschatzl
Copy link
Contributor

I think the ccstress application attached to https://bugs.openjdk.org/browse/JDK-8315503 should show the issue (not tested - today I've been busy completing a JDK 22 bugfix).

The actual application I'm using is specjbb2015 that basically executes that ccstress application as a java agent in parallel to jbb2015 just for extra load. I can certainly give you that.

Tomorrow I will also spend some time investigating this change a bit more (also on aarch64 - I think other factors may easily outweigh this difference), but I think that @TheRealMDoerr is correct about the C heap allocator just being very slow.
Glibc being slow is a "known" issue, other delete calls in class unloading already take a significant chunk of that phase, so there is already some interest (from me) to do something about them.

There are ideas how to handle this, one of them being moving this (and other) delete calls into some concurrent phase somehow (in addition to making metaspace purging concurrent). That would obviously only help G1 though, so maybe there is some better option.

fisk and others added 2 commits February 9, 2024 10:02
Co-authored-by: Thomas Schatzl <59967451+tschatzl@users.noreply.github.com>
@fisk
Copy link
Contributor Author

fisk commented Feb 9, 2024

Thanks for the reviews @tschatzl and @dean-long. I have pushed changes reflecting your feedback.

@@ -1571,6 +1551,8 @@ nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
__ clinit_barrier(rscratch2, rscratch1, &L_skip_barrier);
__ far_jump(RuntimeAddress(SharedRuntime::get_handle_wrong_method_stub()));

// Verified entry point must be aligned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should come around line 1539.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will fix it.

__ align(CodeEntryAlignment);

// Verified entry point
__ bind(verified);
int vep_offset = __ pc() - start;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add back
// Verified entry point
above this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@dean-long
Copy link
Member

If I just look at the new code, everything looks very reasonable. I tried to compare the new code to the old code, but quickly gave up.

Could you explain why opt_virtual can now be a direct call and CompiledIC is now only for virtual? It seems like we could have done that even with the old code.

Also, why don't we have to check for method->is_old() anymore?

}
void CompiledDirectCall::set(const methodHandle& callee_method) {
CompiledMethod* code = callee_method->code();
CompiledMethod* caller = CodeCache::find_compiled(instruction_address());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing the slow find_compiled unconditionally, I think it would be better to check is_interpreted_call first.

// Verify that inline caches pointing to bad nmethods are clean
if (!nm->is_in_use() || (nm->method()->code() != nm)) {
if (!nm->is_in_use() || nm->is_unloading()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain this change. It's not obvious.

@@ -101,7 +101,7 @@ class XCompiledICProtectionBehaviour : public CompiledICProtectionBehaviour {
}

virtual bool is_safe(CompiledMethod* method) {
if (SafepointSynchronize::is_at_safepoint()) {
if (SafepointSynchronize::is_at_safepoint() || method->is_unloading()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why is_unloading() is needed now.

@@ -2079,65 +1771,41 @@ JRT_LEAF(void, SharedRuntime::fixup_callers_callsite(Method* method, address cal
MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, JavaThread::current()));

CodeBlob* cb = CodeCache::find_blob(caller_pc);
if (cb == nullptr || !cb->is_compiled() || callee->is_unloading()) {
if (cb == nullptr || !cb->is_compiled() || !callee->is_in_use() || callee->is_unloading()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for is_in_use() is just an optimization, right?

}

CompiledDirectCall* callsite = CompiledDirectCall::before(return_pc);
callsite->set_to_clean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever way to get rid of should_fixup_call_destination(), something I've wanted to do!

@fisk
Copy link
Contributor Author

fisk commented Feb 12, 2024

If I just look at the new code, everything looks very reasonable. I tried to compare the new code to the old code, but quickly gave up.

:)

Could you explain why opt_virtual can now be a direct call and CompiledIC is now only for virtual? It seems like we could have done that even with the old code.

Sure! So I think at some point we had static calls and virtual calls, with virtual calls using CompiledIC and static calls using CompiledStaticCall. I think there must have been some confusion at some point when introducing opt_virtual. They are virtual calls, so it might have seem "natural" to jam them into CompiledIC, and have CompiledIC deal with them. But since they in their code shapes are a lot more similar to static calls, with a direct call and a stub for interpreter entry, it never was a great fit. To deal with it we had to 1) ensure the data is always null as there is no data, because it isn't really an inline cache, 2) check for it everywhere to not spin up ICStubs when transitioning, because it isn't really an inline cache, and then when manipulating the callsite, have some some native call wrapper abstraction with virtual calls that would convert requests to update the inline cache to update the corresponding direct call and stub instead for optimized virtual calls, and get out of the inline cache world.

To me this always seemed a bit backwards. So in my new implementation I instead accept that opt_virtual calls really are more like the static calls (generated code is identical), and have pretty much nothing to do with inline caches, despite being virtual calls, and made them both use a common DirectCall abstraction instead, that fit both of them, as they are both direct calls.

Could we have cleaned that up earlier? Yes, probably. But it was pretty ingrained and "interesting" to change with incremental changes. I figured this was my chance to do this right since I'm rewriting the CompiledIC file pretty much from scratch as you noticed. I hope you agree with this decision.

Also, why don't we have to check for method->is_old() anymore?

The checks for is_old were there because when performing an inline cache transition, we had situations where we would need an ICStub, and after running out of ICStubs we would have to request a safepoint to refill ICStubs. That safepoint could sneak in a class redefinition operation, rendering the methods invalid, or at least is_old(). Since I removed ICStubs and don't need any ICStubs to transition inline caches, we also get any safepoints in this code. And therefore we can't get any class redefinition, and hence can remove all the is_old() checks, which are now effectively dead code.

@fisk
Copy link
Contributor Author

fisk commented Feb 12, 2024

Thanks for the review @dean-long! I updated the comments as requested.

@openjdk
Copy link

openjdk bot commented Feb 13, 2024

@fisk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8322630: Remove ICStubs and related safepoints

Co-authored-by: Martin Doerr <mdoerr@openjdk.org>
Co-authored-by: Aleksey Shipilev <shade@openjdk.org>
Co-authored-by: Amit Kumar <amitkumar@openjdk.org>
Co-authored-by: Robbin Ehn <rehn@openjdk.org>
Co-authored-by: Aleksei Voitylov <avoitylov@openjdk.org>
Reviewed-by: tschatzl, aboldtch, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 25 new commits pushed to the master branch:

  • 0c2def0: 8325653: Erroneous exhaustivity analysis for primitive patterns
  • d003996: 8325743: test/jdk/java/nio/channels/unixdomain/SocketOptions.java enhance user name output in error case
  • ea98de6: 8325449: [BACKOUT] use "dmb.ishst+dmb.ishld" for release barrier
  • 7f6bb71: 8319799: Recursive lightweight locking: x86 implementation
  • ea41932: 8325395: Missing copyright header in StackFilter.java
  • 8765b17: 8325800: Drop unused cups declaration from Oracle build configuration
  • 628cd8a: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files
  • 842b895: 8303891: Speed up Zip64SizeTest using a small ZIP64 file
  • 243fb46: 8325750: Fix spelling of ForceTranslateFailure help message
  • 74b90aa: 8325672: C2: allocate PhaseIdealLoop::_loop_or_ctrl from C->comp_arena()
  • ... and 15 more: https://git.openjdk.org/jdk/compare/62a4be03cfcb5dcae77358ff25fdc9e2e9660575...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Feb 13, 2024
@fisk
Copy link
Contributor Author

fisk commented Feb 13, 2024

Thanks for the review, @dean-long! I pushed a trivial merge conflict with the next round of NULL -> nullptr changes. Running final round of tests now

@fisk
Copy link
Contributor Author

fisk commented Feb 14, 2024

Tier1-7 test results look good.

@fisk
Copy link
Contributor Author

fisk commented Feb 14, 2024

/integrate

@openjdk
Copy link

openjdk bot commented Feb 14, 2024

Going to push as commit 84965ea.
Since your change was applied there have been 25 commits pushed to the master branch:

  • 0c2def0: 8325653: Erroneous exhaustivity analysis for primitive patterns
  • d003996: 8325743: test/jdk/java/nio/channels/unixdomain/SocketOptions.java enhance user name output in error case
  • ea98de6: 8325449: [BACKOUT] use "dmb.ishst+dmb.ishld" for release barrier
  • 7f6bb71: 8319799: Recursive lightweight locking: x86 implementation
  • ea41932: 8325395: Missing copyright header in StackFilter.java
  • 8765b17: 8325800: Drop unused cups declaration from Oracle build configuration
  • 628cd8a: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files
  • 842b895: 8303891: Speed up Zip64SizeTest using a small ZIP64 file
  • 243fb46: 8325750: Fix spelling of ForceTranslateFailure help message
  • 74b90aa: 8325672: C2: allocate PhaseIdealLoop::_loop_or_ctrl from C->comp_arena()
  • ... and 15 more: https://git.openjdk.org/jdk/compare/62a4be03cfcb5dcae77358ff25fdc9e2e9660575...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 14, 2024
@openjdk openjdk bot closed this Feb 14, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 14, 2024
@openjdk
Copy link

openjdk bot commented Feb 14, 2024

@fisk Pushed as commit 84965ea.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@fisk
Copy link
Contributor Author

fisk commented Feb 14, 2024

Thanks for the review @tschatzl!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org
9 participants