Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique name tag per executor causes unbounded cardinality metrics #2182

Closed
chringwer opened this issue Jun 8, 2020 · 4 comments
Closed

Unique name tag per executor causes unbounded cardinality metrics #2182

chringwer opened this issue Jun 8, 2020 · 4 comments
Labels
status/declined We feel we shouldn't currently apply this change/suggestion type/enhancement A general enhancement warn/api-change Breaking change with compilation errors

Comments

@chringwer
Copy link

The way executor metrics are tagged using an ever increasing integer per executor seems to cause problems with elastic schedulers. At least when using prometheus as a monitoring backend we were facing OOMs due to the high cardinality of time series produced by assigning unique names to each new executor which is spawned when the "elastic" scheduler scales out.

//we now want an executorId unique to a given scheduler
String executorId = schedulerId + "-" +
executorDifferentiator.computeIfAbsent(scheduler, key -> new AtomicInteger(0))
.getAndIncrement();

Maybe this could be solved by assigning the executorId based on its "slot" in the scheduler so that the cardinality is bound by the scheduler's maxSize value?

@reactorbot reactorbot added the ❓need-triage This issue needs triage, hasn't been looked at by a team member yet label Jun 8, 2020
@simonbasle simonbasle added status/need-decision This needs a decision from the team type/bug A general bug type/enhancement A general enhancement and removed ❓need-triage This issue needs triage, hasn't been looked at by a team member yet labels Nov 2, 2020
@simonbasle
Copy link
Member

hey @chringwer I'm sorry this fell through the cracks.

one big issue with your idea is that it is not that simple, because the metrics decorator is installed through a public API based on BiFunction<Scheduler, ScheduledExecutorService, ScheduledExecutorService>, so it has no way of knowing if the executor has a slot or can "reuse" a name.

@simonbasle simonbasle added status/declined We feel we shouldn't currently apply this change/suggestion status/need-design This needs more in depth design work warn/api-change Breaking change with compilation errors and removed status/need-decision This needs a decision from the team type/bug A general bug status/need-design This needs more in depth design work labels Nov 2, 2020
@simonbasle
Copy link
Member

there are other caveats that are potential blockers:

  1. the elastic scheduler specifically DOESN'T have an upper bound to the number of live executors it can spawn, so this idea of slot wouldn't help here either
  2. once the ScheduledExecutorService is shut down, the meters are removed using the id, which is problematic if multiple live executors are associated to the same name
  3. executor decoration delegates to Micrometer, which decorates executors with a gauge. reusing the same name for several executors would result in metrics being produced only by the last decorated executor...

@lotusnow
Copy link

lotusnow commented Aug 8, 2022

Is there any solution or workaround for this?

@simonbasle
Copy link
Member

@lotusnow Schedulers.enableMetrics() is too global and cannot really be fixed to accommodate scheduler variants that create a lot of ExecutorServices (which is the root cause of this issue).

In upcoming 3.5.0 (the next major version scheduled around November this year), we're introducing a dedicated Scheduler wrapper that users will need to explicitly wrap around the Scheduler instances they want to instrument.
This is part of a new module that explicitly depends on Micrometer: reactor-core-micrometer v1.0.0.

See #3109, which will be made available in the soon-to-be-published 3.5.0-M5 milestone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/declined We feel we shouldn't currently apply this change/suggestion type/enhancement A general enhancement warn/api-change Breaking change with compilation errors
Projects
None yet
Development

No branches or pull requests

4 participants