vmauth per-user metrics can cause high memory usage in the long term #6247

Open
ebensom opened this issue May 9, 2024 · 2 comments
Labels: bug, vmauth

Comments

ebensom commented May 9, 2024

Describe the bug

vmauth supports dynamic config reloading. Over a long period, many users can be created and deleted in the config, each change followed by a reload. This is our scenario: we use vmauth together with vm-operator and dynamically manage clients of the VictoriaMetrics cluster for each connected system using VMUser CRs, with a fairly high rate of change (30-100 VMUser creations or deletions per day).

vmauth exposes a number of very useful per-user metrics (the highest-cardinality ones being, of course, those of type histogram/summary). However, when a user is removed from the config, the metrics corresponding to that user keep being exposed until restart. Over time, this causes vmauth to consume more and more memory for metrics, potentially leading to OOM situations. A pprof profile and a web graph are attached for troubleshooting (see the Screenshots field).

Is there a way to change the authconfig reloading behavior (in a backwards-compatible way, i.e. configurable via a compatibility flag) so that it detects differences between the currently loaded config and the new one, and prunes metrics for users not present in the new config, using either the DeleteLabelValues or DeletePartialMatch method on the corresponding metrics?
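To make the request concrete, here is a minimal sketch of the diff-and-prune idea on reload. It uses only the Go standard library; the `registry` and `userMetrics` types and the `reload` method are hypothetical names invented for illustration and are not vmauth's actual code or the metrics package API.

```go
package main

import (
	"fmt"
	"sort"
)

// userMetrics is a hypothetical stand-in for the per-user metrics kept in memory.
type userMetrics struct {
	requests uint64
}

// registry maps a user name to its metrics (illustrative only).
type registry struct {
	byUser map[string]*userMetrics
}

func newRegistry() *registry {
	return &registry{byUser: make(map[string]*userMetrics)}
}

// reload applies a new list of configured users. It creates metrics for
// users that are new and prunes metrics for users missing from the new
// config. It returns the pruned user names in sorted order.
func (r *registry) reload(users []string) []string {
	want := make(map[string]struct{}, len(users))
	for _, u := range users {
		want[u] = struct{}{}
		if _, ok := r.byUser[u]; !ok {
			r.byUser[u] = &userMetrics{}
		}
	}
	var pruned []string
	for u := range r.byUser {
		if _, ok := want[u]; !ok {
			// Dropping the last reference lets the GC reclaim the metrics.
			delete(r.byUser, u)
			pruned = append(pruned, u)
		}
	}
	sort.Strings(pruned)
	return pruned
}

func main() {
	r := newRegistry()
	r.reload([]string{"alice", "bob"})
	pruned := r.reload([]string{"bob", "carol"})
	fmt.Println(pruned, len(r.byUser)) // prints: [alice] 2
}
```

With something along these lines, memory held for per-user metrics would track the number of currently configured users rather than the number of users ever seen since startup.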

To Reproduce

  • start vmauth
  • in an infinite loop:
    • add and remove several users
    • trigger config reload

Version

1.100.1

Logs

No response

Screenshots

vmauth_mem.pprof.tar.gz
vmauth_mem_inuse_web.pdf

Used command-line flags

No response

Additional information

No response

@ebensom ebensom added the bug Something isn't working label May 9, 2024
@f41gh7 f41gh7 added the vmauth label May 9, 2024
f41gh7 (Contributor) commented May 9, 2024

Thanks for reporting. It's a bug in how the metrics package is used: summary metrics are not properly unregistered.

f41gh7 added a commit that referenced this issue May 10, 2024
It's needed to remove the Summary metric type from the global state of the metrics package.
The metrics package tracks each bucket of a summary and periodically swaps old buckets with new ones.

A simple set unregister is not enough to release the memory used by the Set.

#6247
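
The leak described in this commit message can be illustrated with a simplified, standard-library-only model. This is a sketch under assumptions, not the metrics package's real code: `rotating`, `set`, `unregisterShallow`, and `unregisterDeep` are hypothetical names. The point is that a summary referenced from package-level state stays reachable even after its set drops it.

```go
package main

import "fmt"

// summary is a hypothetical stand-in for a summary metric.
type summary struct{ name string }

// rotating models package-level state: every summary is also tracked here
// so that a background task could periodically swap its buckets.
var rotating = map[*summary]struct{}{}

// set models a group of metrics that is registered and unregistered as a whole.
type set struct{ summaries []*summary }

// newSummary creates a summary, adds it to the set and to the global state.
func (s *set) newSummary(name string) *summary {
	sm := &summary{name: name}
	s.summaries = append(s.summaries, sm)
	rotating[sm] = struct{}{}
	return sm
}

// unregisterShallow drops only the set's own references. The global map
// still points at the summaries, so they cannot be garbage collected.
func (s *set) unregisterShallow() { s.summaries = nil }

// unregisterDeep also removes the summaries from the global state.
func (s *set) unregisterDeep() {
	for _, sm := range s.summaries {
		delete(rotating, sm)
	}
	s.summaries = nil
}

func main() {
	a := &set{}
	a.newSummary("user_a_duration")
	a.unregisterShallow()
	fmt.Println(len(rotating)) // prints: 1 (the summary is still tracked)

	b := &set{}
	b.newSummary("user_b_duration")
	b.unregisterDeep()
	fmt.Println(len(rotating)) // prints: 1 (only the earlier leaked entry remains)
}
```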
@f41gh7 f41gh7 self-assigned this May 10, 2024
f41gh7 added a commit that referenced this issue May 13, 2024
hagen1778 pushed a commit that referenced this issue May 14, 2024
hagen1778 pushed a commit that referenced this issue May 14, 2024
hagen1778 (Collaborator) commented

#6252 has been merged and will be included in the next release.

hagen1778 pushed a commit that referenced this issue May 14, 2024
(cherry picked from commit 6a6e34a)