Use metadata only watches for configmaps and secrets resources on workload clusters #3887

Closed
akutz opened this issue Oct 29, 2020 · 7 comments

akutz (Contributor) commented on Oct 29, 2020

User Story

As an operator,
I would like CAPI controllers to use metadata-only watches for secrets resources on the management cluster,
Because otherwise CAPI has trouble scaling to large environments with hundreds/thousands of workload clusters.

As an operator,
I would like CAPI controllers to use metadata-only watches for configmaps and secrets resources on workload clusters,
Because otherwise CAPI has trouble scaling to large environments with hundreds/thousands of workload clusters.

As an operator,
I would like CAPI controllers to use metadata-only watches for configmaps and secrets resources on workload clusters,
Because otherwise anyone on a workload cluster can force an OOM error in the CABPK and KCP pods on the management cluster by creating enough configmaps or secrets resources to exceed the memory limit of those pods.

Detailed Description

Today, Cluster API watches [1] many different resources; here are some that are particularly relevant to this issue:

| Resource Type | Resource Origin | Controller | Controller Location | Reason |
| --- | --- | --- | --- | --- |
| configmaps | Management Cluster | CABPK | Management Cluster | Control plane init mutex |
| configmaps | Management Cluster | CRS | Management Cluster | React to changes for applying data into workload cluster |
| configmaps | Workload Cluster(s) | KCP | Management Cluster | CoreDNS |
| configmaps | Workload Cluster(s) | KCP | Management Cluster | Kubelet config |
| configmaps | Workload Cluster(s) | KCP | Management Cluster | Kubeadm config |
| nodes | Workload Cluster(s) | KCP | Management Cluster | Getting control plane nodes |
| nodes | Workload Cluster(s) | Machine | Management Cluster | Finding node ref |
| nodes | Workload Cluster(s) | MachinePool | Management Cluster | Finding node ref |
| secrets | Management Cluster | CABPK | Management Cluster | spec.files from secret ref |
| secrets | Management Cluster | CRS | Management Cluster | React to changes for applying data into workload cluster |
| secrets | Workload Cluster(s) | CABPK | Management Cluster | Bootstrap token refresh |

These resources are important, especially the configmaps and secrets. A secrets resource is created on the management cluster by CABPK for every machine in a workload cluster. In CAPI v1alpha2, before CABPK, this resource was around 16KiB in size; today it is 45KiB. That means that for 300 workload clusters with three nodes each, secrets resources alone will consume ~40MiB in the cache of every controller that watches, explicitly or implicitly, secrets resources.

The situation with configmaps is even more concerning. Because CAPI, from the management cluster, watches and caches resources from all workload clusters, at best it is simply not possible to accurately estimate the memory requirements of the CAPI pods. At worst, a rogue workload cluster administrator could willfully and maliciously get the CAPI pods running on the management cluster terminated with OOM errors by creating configmaps and secrets resources on a workload cluster.

While it will not solve any of these issues completely (metadata-only objects still carry labels and annotations, which can be arbitrarily large), CAPI should adopt metadata-only watches (kubernetes-sigs/controller-runtime#1174) for configmaps and secrets resources, opting to do live reads where the full object is needed. This is the minimum CAPI should consider, and in the future this strategy could be extended to additional resources as well.
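For illustration, here is a minimal sketch of what the combination of a metadata-only watch and live reads could look like with controller-runtime v0.7+, which includes the metadata-only support referenced above. The SecretReconciler type, its fields, and the package name are hypothetical, not the actual CAPI change:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// SecretReconciler keeps only Secret metadata in its cache and falls back to
// live, uncached reads when the full object is actually needed.
type SecretReconciler struct {
	Client    client.Client // cache-backed client (metadata only for Secrets)
	APIReader client.Reader // uncached reader for one-off live GETs
}

func (r *SecretReconciler) SetupWithManager(mgr ctrl.Manager) error {
	r.Client = mgr.GetClient()
	r.APIReader = mgr.GetAPIReader()
	return ctrl.NewControllerManagedBy(mgr).
		// builder.OnlyMetadata makes the controller watch and cache only
		// PartialObjectMetadata for Secrets, so the (potentially large)
		// payload never enters the informer cache.
		For(&corev1.Secret{}, builder.OnlyMetadata).
		Complete(r)
}

func (r *SecretReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Cached lookups see only labels, annotations, and other metadata.
	meta := &metav1.PartialObjectMetadata{}
	meta.SetGroupVersionKind(corev1.SchemeGroupVersion.WithKind("Secret"))
	if err := r.Client.Get(ctx, req.NamespacedName, meta); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// When the data itself is required, do a live read that bypasses the
	// cache instead of letting the caching client start a full informer.
	secret := &corev1.Secret{}
	if err := r.APIReader.Get(ctx, req.NamespacedName, secret); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	return ctrl.Result{}, nil
}
```

The trade-off is that every read of the full object becomes an extra API call, which is the cost this issue proposes accepting in exchange for a bounded cache.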

Anything else you would like to add:

/kind feature

  [1] These may not be explicit watches. Per controller-runtime#1222 (Support APIReader/NoCache option for ControllerUtil.CreateOr(Update|Patch) functions to avoid OOM errors), the controller-runtime client sets up an implicit watch for the type of resource being fetched as part of a Get or List call.
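To make the footnote concrete, here is a small, hedged sketch contrasting the two read paths; readConfigMap, its arguments, and the package name are hypothetical, while GetClient and GetAPIReader are real Manager methods:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// readConfigMap contrasts a cached read (which sets up an implicit watch)
// with an uncached read against the API server.
func readConfigMap(ctx context.Context, mgr ctrl.Manager, key types.NamespacedName) (*corev1.ConfigMap, error) {
	cm := &corev1.ConfigMap{}

	// mgr.GetClient() returns the delegating, cache-backed client. The first
	// Get or List for a type lazily starts an informer (the implicit watch
	// described above), which then caches every ConfigMap the controller can see:
	//
	//   err := mgr.GetClient().Get(ctx, key, cm)

	// mgr.GetAPIReader() returns an uncached client.Reader. A Get here is a
	// one-off live read and adds nothing to the cache.
	if err := mgr.GetAPIReader().Get(ctx, key, cm); err != nil {
		return nil, err
	}
	return cm, nil
}
```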
k8s-ci-robot added the kind/feature label on Oct 29, 2020
fabriziopandini (Member) commented:
/milestone v0.4.0

k8s-ci-robot added this to the v0.4.0 milestone on Oct 30, 2020
fabriziopandini (Member) commented:
Backporting would be nice, but it requires some work in older versions of controller-runtime.

ncdc (Contributor) commented on Oct 30, 2020

Other possible things to consider:

  • We could avoid using a caching client for remote clusters in ClusterCacheTracker, or make it configurable
  • We could scope a remote cluster cache to a specific namespace, so we only cache items in a single namespace (both options are sketched below)
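Hedged sketches of both alternatives, assuming a controller-runtime version from around this time (~v0.6 to v0.8); the package and function names are made up, neither is what ClusterCacheTracker actually does, and newer controller-runtime releases express per-namespace caching differently than cache.Options.Namespace:

```go
package remote

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Option 1: skip caching entirely for a remote cluster by building a plain,
// non-caching client from its rest.Config; every read becomes a live API call.
func newUncachedRemoteClient(cfg *rest.Config, scheme *runtime.Scheme) (client.Client, error) {
	return client.New(cfg, client.Options{Scheme: scheme})
}

// Option 2: cache only a single namespace of the remote cluster, so at most
// one namespace worth of objects is ever held in memory per cluster.
func newNamespacedRemoteCache(cfg *rest.Config, scheme *runtime.Scheme, namespace string) (cache.Cache, error) {
	return cache.New(cfg, cache.Options{Scheme: scheme, Namespace: namespace})
}
```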

akutz (Contributor, Author) commented on Oct 30, 2020

> We could scope a remote cluster cache to a specific namespace, so we only cache items in a single namespace

This is interesting. It would be really nice if we could create some sort of caching intermediary, like a squid proxy for the API server, so that the cache for remote clusters would live in the same namespace as the objects that represent those clusters.

fabriziopandini (Member) commented:

@vincepri is there any work remaining for this issue after #4023 is merged?
Should we backport it, and is it possible to backport, given that this change is deeply tied to the controller-runtime version?

vincepri (Member) commented:

This has been fixed by #4023 and #3986 for v0.3.x

/close

k8s-ci-robot (Contributor) commented:

@vincepri: Closing this issue.

In response to this:

> This has been fixed by #4023 and #3986 for v0.3.x
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
