Reduce memory consumption #2725

Open

wants to merge 1 commit into base: develop

Conversation

@r2k1 (Contributor) commented May 1, 2024

Opencost stores Kubernetes objects (Pod, ReplicaSet, Deployment, DaemonSet, etc.) with all their fields in memory.

For every request to the /metrics endpoint, a copy of these objects is created, which can lead to significant memory usage, particularly for large clusters.

I suspect this is to prevent accidental modifications, but how likely is that? I think there should be no reason to modify the k8s objects.

I ran some tests on a single-node cluster, creating 300 deployments. Max memory usage dropped from 125MB to 105MB after avoiding the extra copies. Taking fixed overhead into account, the savings should be proportionally larger on big clusters.

Related issue #2637

This creates a copy of every Kubernetes object for every /metrics call.

Signed-off-by: r2k1 <yokree@gmail.com>

@r2k1 (Contributor, Author) commented May 1, 2024

Additional Notes:

kube-state-metrics employs certain strategies to optimize memory usage:

  • It uses a Reflector instead of an Informer, allowing a direct transformation from a Kubernetes object into the Prometheus text format, so the object is only held in memory momentarily during unmarshalling (a rough sketch follows below).
  • It has a custom method for serving metrics, since the Prometheus Gatherer can be inefficient.
    More details can be found here.
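
For illustration, a minimal sketch of the first point, assuming a single hypothetical kube_pod_info metric (the real kube-state-metrics generators are much richer): convert each object to its Prometheus text lines as soon as it arrives, so only the strings need to be retained.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podToMetrics is a hypothetical generator: it turns a *v1.Pod into its
// Prometheus exposition lines immediately, so the full object does not
// need to stay in memory once the string has been produced.
func podToMetrics(pod *v1.Pod) string {
	return fmt.Sprintf(
		"kube_pod_info{namespace=%q,pod=%q,node=%q} 1\n",
		pod.Namespace, pod.Name, pod.Spec.NodeName,
	)
}

func main() {
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "web-0", Namespace: "default"},
		Spec:       v1.PodSpec{NodeName: "node-a"},
	}
	// In kube-state-metrics this conversion runs inside a Reflector-backed
	// store's Add/Update, and only the generated strings are cached.
	fmt.Print(podToMetrics(pod))
}
```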

@AjayTripathy (Contributor)

Thanks @r2k1. We'll get this reviewed.

A Contributor commented on the changed lines, where

return cloneList

is replaced with a direct

return c.indexer.List()

// since the indexer returns the as-is pointer to the resource,
// we deep copy the resources such that callers don't corrupt the
// index

The comment explained why we don't do this. I know we've had problems with this in the past. The indexer.List() will return a slice containing pointers to the resources directly in the index. The reason we deep copy is to avoid corrupting the index or running into concurrent access problems (This is specifically problematic when you have the index attempting to update an existing resource's Labels while someone is reading the Labels). There are lots of cases where we are inspecting these objects live.

This can run fine in a cluster without ever hitting an issue, but there are edge cases which can create really hard-to-debug scenarios.
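
For context, here is a minimal sketch of the deep-copy-on-List pattern the quoted comment describes (the wrapper type and package are hypothetical, not the actual opencost code): indexer.List() hands back pointers into the index, so each item is deep copied before being returned to callers.

```go
package clustercache

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// podCache is a hypothetical wrapper around a client-go indexer, shown only
// to illustrate the deep-copy-on-List pattern discussed in this review.
type podCache struct {
	indexer cache.Indexer
}

// List deep copies every pod before returning it, so callers can never
// mutate objects that still live inside the index.
func (c *podCache) List() []*v1.Pod {
	items := c.indexer.List() // pointers straight into the index
	clones := make([]*v1.Pod, 0, len(items))
	for _, item := range items {
		if pod, ok := item.(*v1.Pod); ok {
			clones = append(clones, pod.DeepCopy())
		}
	}
	return clones
}
```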

@mbolt35 (Contributor) commented May 29, 2024

@r2k1

I suspect this is to prevent accidental modifications, but how likely is that? I think there should be no reason to modify the k8s objects.

First of all, I will be the first to admit the following seems very reassuring:

// List is completely threadsafe as long as you treat all items as immutable.

I couldn't remember all the details, so I dug into our Slack history concerning the issues we were having at the time. It's entirely possible that these issues are no longer a problem (this is from 4 years ago).


We were getting this panic:

goroutine 2751 [running]:
runtime.throw(0x211f776, 0x21)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc000020bb0 sp=0xc000020b80 pc=0x434a82
runtime.mapaccess1_faststr(0x1ee49e0, 0xc006e03710, 0xc0014ee77c, 0x3, 0x1)
        /usr/local/go/src/runtime/map_faststr.go:21 +0x43c fp=0xc000020c20 sp=0xc000020bb0 pc=0x412ebc
k8s.io/apimachinery/pkg/labels.Set.Get(0xc006e03710, 0xc0014ee77c, 0x3, 0xc001493601, 0x7)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913075812-e119e5e154b6/pkg/labels/labels.go:57 +0x4b fp=0xc000020c58 sp=0xc000020c20 pc=0x7f470b
k8s.io/apimachinery/pkg/labels.(*Requirement).Matches(0xc0014d29c0, 0x24f88a0, 0xc006e03710, 0x3c8fd600)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913075812-e119e5e154b6/pkg/labels/selector.go:198 +0xca fp=0xc000020d50 sp=0xc000020c58 pc=0x7f556a
k8s.io/apimachinery/pkg/labels.internalSelector.Matches(0xc0014d29c0, 0x1, 0x1, 0x24f88a0, 0xc006e03710, 0xc0477a9e00)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913075812-e119e5e154b6/pkg/labels/selector.go:342 +0x62 fp=0xc000020d88 sp=0xc000020d50 pc=0x7f69e2
k8s.io/apimachinery/pkg/labels.(*internalSelector).Matches(0xc0427a3e00, 0x24f88a0, 0xc006e03710, 0x0)
        <autogenerated>:1 +0x62 fp=0xc000020dc8 sp=0xc000020d88 pc=0x7fa252
github.com/kubecost/cost-model/pkg/costmodel.getPodDeployments(0x2549da0, 0xc00097a000, 0xc04778a000, 0x6a4, 0x900, 0xc00004402b, 0x11, 0x0, 0x0, 0x0)
        /app/cost-model/pkg/costmodel/costmodel.go:1472 +0x106 fp=0xc000020f00 sp=0xc000020dc8 pc=0x19cdde6
github.com/kubecost/cost-model/pkg/costmodel.(*CostModel).costDataRange.func21(0xc01960e1a0, 0xc007937840, 0xc04778a000, 0x6a4, 0x900, 0xc00004402b, 0x11, 0xc04135a038, 0xc009b06990, 0xc04135a040, ...)
        /app/cost-model/pkg/costmodel/costmodel.go:1964 +0xaa fp=0xc000020f80 sp=0xc000020f00 pc=0x19f915a
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000020f88 sp=0xc000020f80 pc=0x466c21
created by github.com/kubecost/cost-model/pkg/costmodel.(*CostModel).costDataRange
        /app/cost-model/pkg/costmodel/costmodel.go:1961 +0x2a7f

The function being called at the time was:

	go func() {
		defer wg.Done()

		podDeploymentsMapping, k8sErr = getPodDeployments(cm.Cache, podlist, clusterID)
		if k8sErr != nil {
			return
		}

		podStatefulsetsMapping, k8sErr = getPodStatefulsets(cm.Cache, podlist, clusterID)
		if k8sErr != nil {
			return
		}

		podServicesMapping, k8sErr = getPodServices(cm.Cache, podlist, clusterID)
		if k8sErr != nil {
			return
		}
		namespaceLabelsMapping, k8sErr = getNamespaceLabels(cm.Cache, clusterID)
		if k8sErr != nil {
			return
		}
	}()

[6:56 PM] podDeploymentsMapping, k8sErr = getPodDeployments(cm.Cache, podlist, clusterID)
[6:56 PM] is the specific line
[6:56 PM] I was a bit perplexed by what was throwing the panic though.. it was the go runtime
[6:56 PM] /usr/local/go/src/runtime/map_faststr.go:21 +0x43c fp=0xc000020c20 sp=0xc000020bb0 pc=0x412ebc
[6:57 PM] Easy enough to find: https://golang.org/src/runtime/map_faststr.go
[6:57 PM] line #21:

    20  	if h.flags&hashWriting != 0 {
    21  		throw("concurrent map read and map write")
    22  	}

[6:58 PM] so, if you look at what we're reading, it's a selector.Match (edited)
[6:58 PM] which is 100% read-only...
[6:58 PM] so this is where it gets interesting...
[6:59 PM] If you look at the specific map[string]string that is being concurrently r/w
[7:00 PM] it's part of a *v1.Pod that comes from the cluster cache. So immediately I started to piece this together. The specific case is that the cache index updates the label map for a pod at the exact moment we're reading it in our go routine
[7:01 PM] I had thought about this when I added the watch controller
[7:01 PM] but the reason I didn't add "handling" for it is because I stopped when I read the kubernetes cache indexer storage docs (edited)
[7:01 PM]

// ThreadSafeStore is an interface that allows concurrent access to a storage backend.
// TL;DR caveats: you must not modify anything returned by Get or List as it will break
// the indexing feature in addition to not being thread safe.
//
// The guarantees of thread safety provided by List/Get are only valid if the caller
// treats returned items as read-only. For example, a pointer inserted in the store
// through `Add` will be returned as is by `Get`. Multiple clients might invoke `Get`
// on the same key and modify the pointer in a non-thread-safe way. Also note that
// modifying objects stored by the indexers (if any) will *not* automatically lead
// to a re-index. So it's not a good idea to directly modify the objects returned by
// Get/List, in general.

[7:02 PM] // The guarantees of thread safety provided by List/Get are only valid if the caller
// treats returned items as read-only.
[7:02 PM] That is 100% wrong
The "smell" here is that most mutex-concealing "thread-safe" objects which can return collections
[7:04 PM] should either return 1) A copy, or 2) Supply an iteration operation API
[7:05 PM] to perform operations on each object index within the lock
[7:09 PM] anyways -- this sort of sucks because now the "fix" is to basically hold add/remove/updates in the queue via lock while we're doing a GetAll(), and have GetAll() deep copy the objects
[7:11 PM] I feel like that's just additional memory pressure

^ I clearly wasn't happy about it 😆
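
For reference, a minimal sketch of the second option from the transcript above (an iteration API instead of returning the collection); the type and method names are hypothetical, and the approach is only safe if writers also take the write lock for in-place updates.

```go
package clustercache

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// lockedPodStore is a hypothetical store that never hands out its internal
// collection; callers iterate through ForEach instead.
type lockedPodStore struct {
	mu   sync.RWMutex
	pods map[string]*v1.Pod
}

func newLockedPodStore() *lockedPodStore {
	return &lockedPodStore{pods: make(map[string]*v1.Pod)}
}

// Update replaces or inserts a pod under the write lock.
func (s *lockedPodStore) Update(key string, pod *v1.Pod) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pods[key] = pod
}

// ForEach runs fn for every pod while the read lock is held, so writers
// cannot swap or mutate entries mid-iteration and no per-call deep copy
// is needed. fn must not retain the pointer after it returns.
func (s *lockedPodStore) ForEach(fn func(pod *v1.Pod)) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, p := range s.pods {
		fn(p)
	}
}
```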


So, in short, it does seem like there are locks that handle specific field modifications (or replace the entire object) within the index, BUT at the time the map[string]string Labels (and likely other fields) were being modified in place, which broke the thread-safety guarantee.
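
To make that failure mode concrete, here is a minimal, hypothetical reproduction (not opencost code): one goroutine updates a pod's Labels map in place while another runs a selector match against it, which is the same "concurrent map read and map write" pattern shown in the panic above.

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{
		Labels: map[string]string{"app": "web"},
	}}
	selector := labels.SelectorFromSet(labels.Set{"app": "web"})

	// Writer: simulates the cache updating the pod's labels in place.
	go func() {
		for {
			pod.Labels["revision"] = "r"
		}
	}()

	// Reader: a "read-only" selector match, like getPodDeployments above.
	// Sooner or later this crashes with "concurrent map read and map write"
	// (run with -race to see it immediately).
	for {
		_ = selector.Matches(labels.Set(pod.Labels))
	}
}
```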

If they fixed this issue, I will be totally happy to sign off on this change :)

@mbolt35 (Contributor) commented May 29, 2024

@r2k1 I've been very frustrated with the lack of flexibility in this API for a long time, and I totally support any and all motivation to improve it (or completely re-work it). Hopefully this bug is long gone in the indexer storage, and we can go back to using List() directly. Your last PR around the controllers also got me thinking about how we could update the API to support "adding" watch controllers. That way, opencost could default to the resources it uses, and any dependents could append new watchers for other resources.

I just felt like it was important to let you know that I am absolutely interested in a solution here, and I hope I'm not coming across as being gate-keepy. I really appreciate your input and help here :)

@r2k1 (Contributor, Author) commented May 29, 2024

It could also be caused by the fact that opencost currently reads data from the indexers directly rather than through informers (I'm not sure they were designed to be used that way).

https://github.com/opencost/opencost/pull/2736/files
Here I moved away from Informers and Indexers and used a plain old map with a lock as storage. No indexers == no issues with indexers. I don't understand why informers are so complex; I suspect the complexity comes from generating a diff between the new and old object versions (which isn't used by opencost anyway).
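
A rough sketch of the general idea (not the actual code in #2736): a Reflector keeps a simple thread-safe store in sync without the informer/indexer machinery. The wiring below uses client-go's built-in cache.NewStore, and the in-cluster config is only an assumption for the example.

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch pods and mirror them into a plain thread-safe key/value store.
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "pods", metav1.NamespaceAll, fields.Everything())
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	reflector := cache.NewReflector(lw, &v1.Pod{}, store, 0)

	stop := make(chan struct{})
	defer close(stop)
	go reflector.Run(stop)

	// store.List() returns the current pod set; items must be treated as
	// read-only, which is the same caveat discussed earlier in this thread.
	_ = store.List()
}
```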

But I think this line:

deepCopyable.DeepCopyObject()

and a few .Clone() calls in the allocation calculation are doubling memory consumption, especially on larger clusters (it may not be obvious on small clusters due to some static overhead).

Maybe a Go linter could help avoid accidental mutation instead.
