Out of memory after using Schedulers.enableMetrics() #2899

Closed
Novery opened this issue Jan 24, 2022 · 7 comments
Labels
for/team-attention This issue needs team attention or action for/user-attention This issue needs user attention (feedback, rework, etc...) status/need-investigation This needs more in-depth investigation

@Novery

Novery commented Jan 24, 2022

Out of memory after using Schedulers.enableMetrics()

Actual Behavior

When metrics are enabled, the number of registered meters keeps growing and never converges.

The output of /actuator/prometheus is shown below. In the name tag
name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-{$number}", the {$number} suffix keeps increasing and does not converge:

executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-5939",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-4381",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7102",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} 0.0
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-741",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7235",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} 0.0
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7368",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-1759",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-874",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-2823",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-2956",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-3051",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7106",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} 0.0
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-3184",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7239",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} 0.0
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-1625",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-5809",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-4382",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-5580",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-740",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-873",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-1758",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-2955",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-3185",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7238",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-3052",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-7105",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-2822",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-1624",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-5808",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-4383",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN
executor_active_threads{application="web-service",name="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)-4250",reactor_scheduler_id="boundedElastic(\"boundedElastic\",maxThreads=40,maxTaskQueuedPerThread=100000,ttl=60s)",} NaN

Steps to Reproduce

reactor.core.scheduler.SchedulerMetricDecorator generates a new, incrementing executorId every time it decorates an executor for the same Scheduler:

//we now want an executorId unique to a given scheduler
String executorId = schedulerId + "-" +
        executorDifferentiator.computeIfAbsent(scheduler, key -> new AtomicInteger(0))
                              .getAndIncrement();

That executorId is then used as the metrics name:

MetricsRemovingScheduledExecutorService() {
    super(ExecutorServiceMetrics.monitor(globalRegistry, service, executorId, tags));
}

BoundedElasticScheduler periodically evicts idle resources and recreates them later when they are needed again. The executorId is incremented every time a new resource is created.

This leads to an unbounded number of meters and eventually to an out-of-memory error.
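
The growth described above can be reproduced in isolation. The following is a minimal, JDK-only sketch, not Reactor's actual code: the class name `ExecutorIdGrowthDemo`, the `simulate` method, and the set used as a stand-in for a meter registry are all hypothetical. It also assumes that names of evicted executors stay registered, which the NaN entries in the output above seem to suggest.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, JDK-only model of the naming scheme; this is NOT Reactor's actual code.
class ExecutorIdGrowthDemo {

    // Returns the meter names that accumulate after `cycles` evict-and-recreate rounds.
    static Set<String> simulate(String schedulerId, int cycles) {
        // One counter per scheduler, playing the role of executorDifferentiator.
        Map<String, AtomicInteger> differentiator = new HashMap<>();
        // Hypothetical stand-in for a meter registry: the set of names ever registered.
        Set<String> registeredNames = new LinkedHashSet<>();

        for (int i = 0; i < cycles; i++) {
            // Each round models one new executor being created and decorated.
            String executorId = schedulerId + "-"
                    + differentiator.computeIfAbsent(schedulerId, key -> new AtomicInteger(0))
                                    .getAndIncrement();
            // Assumption: names of evicted executors are never removed.
            registeredNames.add(executorId);
        }
        return registeredNames;
    }

    public static void main(String[] args) {
        Set<String> names = simulate("boundedElastic", 10_000);
        System.out.println(names.size() + " distinct meter names for a single scheduler");
    }
}
```

In this model the number of names equals the number of executor creations, so it is bounded only by the uptime of the application rather than by maxThreads.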

Your Environment

  • Reactor version(s) used: 3.3.11.RELEASE
@simonbasle simonbasle added for/team-attention This issue needs team attention or action for/user-attention This issue needs user attention (feedback, rework, etc...) status/need-investigation This needs more in-depth investigation labels Jan 24, 2022
@simonbasle simonbasle added this to the 3.4.x Backlog milestone Jan 24, 2022
@simonbasle
Member

we'd need to somehow detect schedulers with potential for high/unbounded cardinality and figure out how to cater to that use case... but can we really figure out a generic satisfying solution?

the only thing that gets instrumented in Schedulers is their backing ScheduledExecutorService(s), if any. If we didn't differentiate between these, we'd potentially get inconsistent data when a single Scheduler is backed by multiple ScheduledExecutorServices, each with its own queue of tasks 🤦

on the other hand, we have Schedulers.elastic() and Schedulers.boundedElastic(), which create a lot of such services. Where do we draw the line here? 🤔
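
To make the trade-off concrete, here is a minimal, JDK-only sketch of one possible middle ground: a single gauge per scheduler that sums the queue sizes of all executors currently backing it. This is an illustration under stated assumptions, not something Reactor provides: `PerSchedulerQueueGauge`, `track`, `untrack`, and `demo` are hypothetical names, and plain `ThreadPoolExecutor` instances stand in for the scheduler's backing services.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one stable meter name per scheduler, aggregated over its executors.
class PerSchedulerQueueGauge {
    private final String schedulerId;
    private final List<ThreadPoolExecutor> executors = new CopyOnWriteArrayList<>();

    PerSchedulerQueueGauge(String schedulerId) { this.schedulerId = schedulerId; }

    void track(ThreadPoolExecutor executor) { executors.add(executor); }

    // Called when an idle executor is evicted, so nothing stale is left behind.
    void untrack(ThreadPoolExecutor executor) { executors.remove(executor); }

    // The name never changes, however many executors come and go.
    String meterName() { return schedulerId; }

    // Sum of the tasks waiting in every tracked executor's queue.
    int queuedTasks() {
        int total = 0;
        for (ThreadPoolExecutor executor : executors) {
            total += executor.getQueue().size();
        }
        return total;
    }

    private static ThreadPoolExecutor singleThread() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // Two single-threaded executors, each kept busy by a task that waits on a gate.
    static int demo() {
        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocked = () -> {
            try { gate.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
        ThreadPoolExecutor first = singleThread();
        ThreadPoolExecutor second = singleThread();
        PerSchedulerQueueGauge gauge = new PerSchedulerQueueGauge("boundedElastic");
        gauge.track(first);
        gauge.track(second);
        try {
            first.execute(blocked);   // taken directly by the only thread of `first`
            first.execute(blocked);   // queued
            first.execute(blocked);   // queued
            second.execute(blocked);  // taken directly by the only thread of `second`
            second.execute(blocked);  // queued
            return gauge.queuedTasks();
        } finally {
            gate.countDown();
            first.shutdown();
            second.shutdown();
        }
    }
}
```

The cost of this design is the one raised above: the per-executor breakdown is lost, so a single overloaded queue can hide behind an acceptable total.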

@simonbasle
Member

@Novery what kind of metrics are you interested in, in Schedulers? (as a minimum)
it looks like decorating our plumbing (the ScheduledExecutorServices behind most Scheduler workers) was not a great choice in the first place. With the upcoming 3.5.0, maybe there's an opportunity to fix that and focus on data easily accessible for all Schedulers (not only the ones that can have a pool of workers).

@Novery
Author

Novery commented Feb 1, 2022

> @Novery what kind of metrics are you interested in, in Schedulers? (as a minimum)
> it looks like decorating our plumbing (the ScheduledExecutorServices behind most Scheduler workers) was not a great choice in the first place. With the upcoming 3.5.0, maybe there's an opportunity to fix that and focus on data easily accessible for all Schedulers (not only the ones that can have a pool of workers).

I mainly look at executor_queued_tasks, executor_seconds_max, and executor_completed_tasks_total to understand the load of the parallel scheduler.

@simonbasle
Member

there's ongoing work to add a TimedScheduler wrapper in a new reactor-core-micrometer module in 3.5.0 (see #3109), which could provide a viable alternative
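
The general idea of a timing wrapper can be sketched with JDK classes only. The code below is not the code from #3109 and does not use the reactor-core-micrometer API: `TimedExecutorSketch`, its meter names, and `snapshot` are hypothetical. It only illustrates how wrapping task submission, instead of decorating each backing executor, keeps the set of meter names fixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a timing wrapper; NOT the TimedScheduler from reactor-core-micrometer.
class TimedExecutorSketch implements Executor {
    private final Executor delegate;
    private final String metricsPrefix;
    private final LongAdder submitted = new LongAdder();
    private final LongAdder completed = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    TimedExecutorSketch(Executor delegate, String metricsPrefix) {
        this.delegate = delegate;
        this.metricsPrefix = metricsPrefix;
    }

    @Override
    public void execute(Runnable task) {
        submitted.increment();
        delegate.execute(() -> {
            long start = System.nanoTime();
            try {
                task.run();
            } finally {
                // Record the longest task duration seen so far, then count the completion.
                maxNanos.accumulateAndGet(System.nanoTime() - start, Math::max);
                completed.increment();
            }
        });
    }

    // Always exactly three entries, whatever happens to the threads behind the delegate.
    Map<String, Long> snapshot() {
        Map<String, Long> meters = new LinkedHashMap<>();
        meters.put(metricsPrefix + ".tasks.submitted", submitted.sum());
        meters.put(metricsPrefix + ".tasks.completed", completed.sum());
        meters.put(metricsPrefix + ".tasks.max.nanos", maxNanos.get());
        return meters;
    }

    public static void main(String[] args) {
        // Runnable::run executes each task on the calling thread, which keeps the demo simple.
        TimedExecutorSketch timed = new TimedExecutorSketch(Runnable::run, "demo");
        for (int i = 0; i < 5; i++) {
            timed.execute(() -> { });
        }
        System.out.println(timed.snapshot());
    }
}
```

Because the counters belong to the wrapper and not to any particular thread or executor, evicting and recreating workers behind the delegate does not register anything new.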

@Novery
Author

Novery commented Jul 22, 2022

OK, thank you. I will keep watching.

@Novery Novery closed this as completed Jul 22, 2022
@simonbasle
Member

simonbasle commented Jul 26, 2022

@Novery any opinion on the direction of the aforementioned PR?
it's still bound to change a bit, especially since there's a failing test, but I do need feedback

see #3109

@Novery
Author

Novery commented Jul 27, 2022

> @Novery any opinion on the direction of the aforementioned PR ?
> it's bound to change a bit still, especially since there's a failing test, but I do need feedback
>
> see #3109

The tags now seem clean and controllable. The "executorId" is no longer used as the "executorServiceName". I think this can fix the current problem.
