Reduce memory consumption #2725
base: develop
Conversation
This creates a copy of every Kubernetes object for every /metrics call. Signed-off-by: r2k1 <yokree@gmail.com>
Additional Notes: kube-state-metrics employs certain strategies to optimize memory usage:
Thanks @r2k1. We'll get this reviewed.
```diff
-	return cloneList
+	return c.indexer.List()
```
```go
// since the indexer returns the as-is pointer to the resource,
// we deep copy the resources such that callers don't corrupt the
// index
```
The comment explained why we don't do this. I know we've had problems with this in the past. The indexer.List() will return a slice containing pointers to the resources directly in the index. The reason we deep copy is to avoid corrupting the index or running into concurrent access problems (This is specifically problematic when you have the index attempting to update an existing resource's Labels while someone is reading the Labels). There are lots of cases where we are inspecting these objects live.
This can run fine in a cluster without ever hitting an issue, but there are edge cases which can create really hard-to-debug scenarios.
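As a side note, the aliasing hazard described above can be sketched in plain Go. This is a hypothetical, simplified illustration — the `Pod` type, `store` variable, and `listShared`/`listCopied` functions below are stand-ins for illustration, not opencost's or client-go's actual code:

```go
package main

import "fmt"

// Pod is a simplified stand-in for a Kubernetes object held in an indexer.
type Pod struct {
	Name   string
	Labels map[string]string
}

// DeepCopy returns an independent copy, mirroring the generated
// DeepCopy methods on real Kubernetes API types.
func (p *Pod) DeepCopy() *Pod {
	labels := make(map[string]string, len(p.Labels))
	for k, v := range p.Labels {
		labels[k] = v
	}
	return &Pod{Name: p.Name, Labels: labels}
}

// store mimics an index that hands out as-is pointers to its contents.
var store = []*Pod{{Name: "a", Labels: map[string]string{"app": "web"}}}

// listShared returns the raw pointers, like c.indexer.List().
func listShared() []*Pod { return store }

// listCopied returns deep copies, like the cloneList path.
func listCopied() []*Pod {
	out := make([]*Pod, 0, len(store))
	for _, p := range store {
		out = append(out, p.DeepCopy())
	}
	return out
}

func main() {
	// Mutating an element of the shared result corrupts the store...
	listShared()[0].Labels["app"] = "corrupted"
	fmt.Println(store[0].Labels["app"]) // corrupted

	// ...while mutating a deep copy leaves the store intact.
	store[0].Labels["app"] = "web"
	listCopied()[0].Labels["app"] = "changed"
	fmt.Println(store[0].Labels["app"]) // web
}
```

The same aliasing also underlies the concurrency concern: a reader iterating a shared `Labels` map while the index writes to it is a data race, whereas readers of deep copies never touch the index's memory.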
First of all, I will be the first to admit the following seems very reassuring:
I couldn't remember all the details, so I dug into our slack concerning the issues we were having at the time. It's totally possible that these issues are actually no longer a problem (this is from 4 years ago). We were getting this panic:
The function being called at the time was:
[6:58 PM] so, if you look at what we're reading, it's a selector.Match (edited)
[7:02 PM] // The guarantees of thread safety provided by List/Get are only valid if the caller ^ I clearly wasn't happy about it 😆

So, in short, it does seem like there are locks that handle specific field modifications (or completely replace the entire object) within the index, BUT at the time the indexer still had this problem. If they fixed this issue, I will be totally happy to sign off on this change :)
@r2k1 I've been very frustrated with the lack of flexibility in this API for a long time, and I totally support any and all motivation to improve it (or completely re-work it). Hopefully this bug is long gone in the indexer storage, and we can revert back to using it directly.

I just felt like it was important to let you know that I am absolutely interested in a solution here, and I hope I'm not coming across as gate-keepy. I really appreciate your input and help here :)
It could also be caused by the fact that opencost currently reads data from indexers directly rather than through informers (I'm not sure it was designed to be used that way). https://github.com/opencost/opencost/pull/2736/files But I think this line:

And a few similar ones. Maybe a Go linter could help avoid accidental mutation instead.
Opencost stores Kubernetes objects (Pod, ReplicaSet, Deployment, DaemonSet, etc.) with all their fields in memory.
For every request to the /metrics endpoint, a copy of these objects is created, which can lead to significant memory usage, particularly for large clusters.
I suspect this is to prevent accidental modifications, but how likely is that? I think there should be no reason to modify the k8s objects.
I ran some tests on a single-node cluster, creating 300 deployments. Max memory usage dropped from 125MB to 105MB after avoiding the extra copies. Accounting for general overhead, the savings are likely to be larger in big clusters.
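The per-request cost can be seen in isolation with a small allocation-counting sketch. This is a hypothetical illustration with simplified types (300 stand-in objects, mirroring the test above), not opencost's actual code path:

```go
package main

import (
	"fmt"
	"testing"
)

// Pod is a simplified stand-in for a Kubernetes object kept in memory.
type Pod struct {
	Name   string
	Labels map[string]string
}

// store holds 300 objects, mirroring the 300-deployment test.
var store = make([]*Pod, 300)

func init() {
	for i := range store {
		store[i] = &Pod{Name: fmt.Sprintf("pod-%d", i), Labels: map[string]string{"app": "web"}}
	}
}

// listCopied deep-copies every object, as the old /metrics path did.
func listCopied() []*Pod {
	out := make([]*Pod, len(store))
	for i, p := range store {
		labels := make(map[string]string, len(p.Labels))
		for k, v := range p.Labels {
			labels[k] = v
		}
		out[i] = &Pod{Name: p.Name, Labels: labels}
	}
	return out
}

// listShared returns the backing slice as-is, like this PR's approach.
func listShared() []*Pod { return store }

func main() {
	// testing.AllocsPerRun reports the average number of heap
	// allocations per invocation of the function.
	copied := testing.AllocsPerRun(100, func() { listCopied() })
	shared := testing.AllocsPerRun(100, func() { listShared() })
	fmt.Printf("allocs per list call: copied=%v shared=%v\n", copied, shared)
}
```

The copying path allocates a new slice plus a struct and a map per object on every scrape, while the shared path allocates nothing, which is consistent with the measured drop in memory usage.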
Related issue #2637