
Limit the number of Scheduler#disposeGracefully threads #3259

Merged
merged 10 commits into 3.4.x from 3258-DisposeAwaiterLimitedThreads on Nov 4, 2022

Conversation

@simonbasle (Member) commented Oct 28, 2022

This change introduces a DisposeAwaiterRunnable with a small pool of
threads dedicated to polling the termination status after a graceful
Scheduler shutdown.

Previously, one Thread would be created for each Scheduler that is
disposed gracefully. While we don't expect this to be an issue in most
production applications, this can lead to hitting native thread limits
faster. Notably, stress tests around graceful disposal create a lot of
schedulers for that purpose.

This change also ensures that the evictor executorServices of both the
BoundedElasticScheduler and ElasticScheduler are limited to at most 1
thread.
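
For the evictor part, the idea is simply that the periodic eviction task never needs more than one thread, along these lines (illustrative only, not the exact reactor-core code; names and periods are placeholders):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: the evictor runs a single periodic cleanup task,
// so a scheduled pool of exactly one thread is enough.
class EvictorExample {
    static ScheduledExecutorService newEvictor() {
        ScheduledExecutorService evictor =
                Executors.newScheduledThreadPool(1, r -> new Thread(r, "evictor"));
        // Periodically evict idle cached workers; the real task, thread name and
        // period differ in BoundedElasticScheduler/ElasticScheduler.
        evictor.scheduleAtFixedRate(() -> { /* evict idle workers */ },
                60, 60, TimeUnit.SECONDS);
        return evictor;
    }
}
```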

Finally, it attempts to improve the SchedulersStressTest to avoid the
OOMs as much as possible: block on disposeGracefully() calls, increase
the heap of forked JVMs for jcstress, and ultimately stop covering the
BoundedElasticScheduler in the stress test.
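
The "block on disposeGracefully()" adjustment amounts to putting a bounded wait on the returned Mono before the next iteration creates more schedulers, roughly like this (standalone sketch, not the jcstress harness itself):

```java
import java.time.Duration;

import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Sketch of the stress-test change: wait (with a bound) for graceful disposal
// to complete so each iteration does not leak still-terminating threads.
class GracefulDisposeExample {
    public static void main(String[] args) {
        Scheduler scheduler = Schedulers.newParallel("example", 4);
        scheduler.disposeGracefully()
                 .block(Duration.ofSeconds(5)); // bounded wait for termination
    }
}
```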

Fixes #3258.

This change modifies the `DisposeAwaiter` interface so that the graceful
shutdown termination status can be polled for, rather than awaited/blocked
on.

It introduces a DisposeAwaiterRunnable with a small pool of threads
dedicated to polling the termination status after a graceful Scheduler
shutdown.

Previously, one Thread would be created for each Scheduler that is
disposed gracefully. While we don't expect this to be an issue in most
production applications, this can lead to hitting native thread limits
faster. Notably, stress tests around graceful disposal create a lot of
schedulers for that purpose.
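
The contract change described above could look roughly like the following (hypothetical names; the actual DisposeAwaiter signatures in reactor-core may differ):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the change: a non-blocking status check replaces
// (or complements) the blocking await, so a shared poller can drive it.
interface DisposeAwaiterSketch<T> {

    // Before: block the calling thread until the resource terminates or times out.
    boolean await(T resource, long timeout, TimeUnit timeUnit) throws InterruptedException;

    // After: report the current termination status without blocking.
    boolean isTerminated(T resource);
}
```
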
@simonbasle (Member, Author) commented:
currently seeing another type of OOM issue:

  Messages:
    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:463)
        at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1237)
        at java.util.concurrent.ScheduledThreadPoolExecutor.<init>(ScheduledThreadPoolExecutor.java:447)
        at reactor.core.scheduler.ParallelScheduler.get(ParallelScheduler.java:80)
        at reactor.core.scheduler.ParallelScheduler.init(ParallelScheduler.java:109)
        at reactor.core.scheduler.SchedulersStressTest$ParallelSchedulerDisposeGracefullyAndDisposeStressTest.<init>(SchedulersStressTest.java:318)
        at reactor.core.scheduler.SchedulersStressTest_ParallelSchedulerDisposeGracefullyAndDisposeStressTest_jcstress.internalRun(SchedulersStressTest_ParallelSchedulerDisposeGracefullyAndDisposeStressTest_jcstress.java:118)
        at org.openjdk.jcstress.infra.runners.Runner.run(Runner.java:72)

@simonbasle (Member, Author) commented:
We couldn't get to the bottom of the OOMs on M1 machines, but this change still improves the situation significantly, most notably on CI.

The last commit removes the stress tests for BoundedElasticScheduler, which by nature puts more pressure on native thread creation. These can be reintroduced in a separate branch to experiment with improving the stress tests on a macOS M1 machine.

@simonbasle simonbasle added this to the 3.4.25 milestone Nov 4, 2022
@simonbasle simonbasle added the type/enhancement A general enhancement label Nov 4, 2022
@simonbasle simonbasle self-assigned this Nov 4, 2022
@simonbasle simonbasle marked this pull request as ready for review November 4, 2022 14:09
@simonbasle simonbasle requested a review from a team as a code owner November 4, 2022 14:09
@@ -174,6 +144,11 @@ public void arbiter(IIZ_Result r) {
// by r.r1 and r.r2, which should be equal.
boolean consistentState = r.r1 == r.r2;
r.r3 = consistentState && scheduler.isDisposed();
if (consistentState) {
Member commented on this hunk:
👍

@simonbasle simonbasle merged commit 9fe3241 into 3.4.x Nov 4, 2022
@simonbasle simonbasle deleted the 3258-DisposeAwaiterLimitedThreads branch November 4, 2022 16:50
@reactorbot commented:

@simonbasle this PR seems to have been merged on a maintenance branch, please ensure the change is merge-forwarded to intermediate maintenance branches and up to main 🙇

@simonbasle (Member, Author) commented:
Merged into 3.5.0 (with the wrong commit message) via commit 2a83001.

chemicL pushed a commit that referenced this pull request Mar 7, 2023
Labels: type/enhancement (A general enhancement)
Linked issue: Scheduler graceful dispose should share awaitTermination monitoring threads
3 participants