identities: fix incorrectly overwriting the labels of kube-apiserver IPs #25241
Conversation
The label injection controller is responsible for iterating through the ipcache.IdentityMetadata map, potentially allocating new identities in case of label changes. The new identity can be enriched with additional labels besides the ones present in the metadata map. This happens, in particular, when the identity to be allocated corresponds to the IP of the kube-apiserver and that IP does not belong to any of the cluster nodes (e.g., in cloud-provider environments). In this case, the additional labels represent the CIDR itself and all broader CIDRs which include it.

In such a scenario, there is currently a race condition which can cause the additional CIDR labels to be disregarded, leading to incorrect enforcement of ToCIDR policies matching the IPs of the kube-apiserver (hence, the corresponding traffic is incorrectly dropped). When this issue occurs, the policy selector for a broader CIDR does not include the identities corresponding to the kube-apiserver IP addresses, and affected agents report a warning message along the lines of (all intermediate CIDR labels are omitted for brevity):

    level=warning msg="UpdateIdentities: Updating an existing identity" identity=16777218 labels="[cidr:0.0.0.0/0 ... cidr:10.2.0.6/32 reserved:kube-apiserver reserved:world]" labels(new)="[reserved:kube-apiserver]" subsys=policy

Conversely, the warning message is benign when the contents of "labels" and "labels(new)" are swapped (i.e., "labels(new)" includes all CIDR labels in addition to the reserved kube-apiserver one).

In more detail, the race condition is the following:

* The prefix corresponding to the IP address of the kube-apiserver is processed by ipc.InjectLabels; in terms of metadata, it is associated with the reserved:kube-apiserver label only [1].
* A new local identity gets allocated for the given prefix, after enriching the labels with the ones representing CIDR information (ipc.injectLabelsForCIDR) [2]. The identity creation causes an event to be written to the event channel [3] (with the new ID and the full list of labels), asynchronously triggering sc.UpdateIdentities [4].
* Once all identities have been allocated, ipc.InjectLabels calls ipc.UpdatePolicyMaps with the set of updated identities and the labels retrieved from the metadata [5,6]. Under the hood, this causes sc.UpdateIdentities to be executed. If this happens after the asynchronous execution triggered by the update above, the current labels (i.e., reserved:kube-apiserver only) overwrite the full list previously configured, causing the issue. Conversely, if the order is reversed, all labels are eventually present.

The issue is fixed by propagating the enriched labels associated with the newly allocated identity to ipc.UpdatePolicyMaps, rather than only the ones present in the metadata map (a simplified model of the race is sketched after the references below).
[1]: https://github.com/cilium/cilium/blob/097fba3182f0f67fe0c6acab9cf5f1a500af36d2/pkg/ipcache/metadata.go#L198
[2]: https://github.com/cilium/cilium/blob/097fba3182f0f67fe0c6acab9cf5f1a500af36d2/pkg/ipcache/metadata.go#L431-L432
[3]: https://github.com/cilium/cilium/blob/097fba3182f0f67fe0c6acab9cf5f1a500af36d2/pkg/identity/cache/local.go#L108-L112
[4]: https://github.com/cilium/cilium/blob/097fba3182f0f67fe0c6acab9cf5f1a500af36d2/pkg/identity/cache/cache.go#L187
[5]: https://github.com/cilium/cilium/blob/097fba3182f0f67fe0c6acab9cf5f1a500af36d2/pkg/ipcache/metadata.go#L232
[6]: https://github.com/cilium/cilium/blob/097fba3182f0f67fe0c6acab9cf5f1a500af36d2/pkg/ipcache/metadata.go#L266

Fixes: cilium#24401

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
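To make the ordering dependency concrete, below is a minimal, self-contained Go sketch of the race. It is not Cilium code: the selectorCache type, the event channel, and the hard-coded prefix, identity, and label values are hypothetical stand-ins for ipc.InjectLabels, the identity allocator, and sc.UpdateIdentities, chosen only to model the "last UpdateIdentities call wins" behavior described above.

```go
// A toy model of the label-overwrite race; NOT actual Cilium code.
package main

import (
	"fmt"
	"sync"
)

// selectorCache mimics the policy selector cache: the last UpdateIdentities
// call for a given numeric identity determines its label set.
type selectorCache struct {
	mu     sync.Mutex
	labels map[int][]string
}

func (sc *selectorCache) UpdateIdentities(id int, labels []string) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	sc.labels[id] = labels
}

func main() {
	sc := &selectorCache{labels: map[int][]string{}}

	// Metadata map: the kube-apiserver prefix is associated with the
	// reserved label only, as in ipcache.IdentityMetadata.
	metadata := map[string][]string{
		"10.2.0.6/32": {"reserved:kube-apiserver"},
	}

	// Identity allocation enriches the metadata labels with the CIDR labels
	// (intermediate CIDRs omitted) and emits an asynchronous event.
	enriched := append([]string{"cidr:0.0.0.0/0", "cidr:10.2.0.6/32"},
		metadata["10.2.0.6/32"]...)
	const id = 16777218

	events := make(chan []string, 1)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // asynchronous consumer triggered by the allocation event
		defer wg.Done()
		sc.UpdateIdentities(id, <-events)
	}()
	events <- enriched
	wg.Wait() // force the "bad" ordering: the enriched update lands first

	// Buggy path: the policy-map update re-reads the labels from the metadata
	// map, overwriting the enriched set with reserved:kube-apiserver only.
	sc.UpdateIdentities(id, metadata["10.2.0.6/32"])
	fmt.Println("buggy ordering:", sc.labels[id]) // CIDR labels are lost

	// Fixed path: propagate the enriched labels of the newly allocated
	// identity instead, so the result no longer depends on the ordering.
	sc.UpdateIdentities(id, enriched)
	fmt.Println("fixed:         ", sc.labels[id]) // CIDR labels preserved
}
```

Running the sketch prints the reserved-only label set for the buggy ordering and the full CIDR-enriched set for the fixed path, mirroring the "labels" and "labels(new)" contents of the warning quoted above.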
/test-backport-1.13

Job 'Cilium-PR-K8s-1.24-kernel-4.19' failed:
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.24-kernel-4.19/31/
If it is a flake and a GitHub issue doesn't already exist to track it, comment, then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-kernel-4.19/1247/
If it is a flake and a GitHub issue doesn't already exist to track it, comment, then please upload the Jenkins artifacts to that issue.
LGTM, given that this is a
Sure, I've updated the release note to better highlight the user impact.
LGTM thanks! As tophat I'll be happy to merge this after CI is triaged and proven not to be breaking due to the change (ideally achieved by re-running those tests to demonstrate they don't reliably fail).
/mlh new-flake Cilium-PR-K8s-1.24-kernel-4.19 👍 created #25255
/test-1.24-4.19
test-1.16-4.19: hit #22749
/test-1.16-4.19
I'm guessing
Thanks @michi-covalent for triaging the tests! They all passed now.
is the button green? no. but this is an exceptional case since
@giorio94 please use the usual backporting processes when preparing backport commits like this. In particular this ensures:
For your reference, the backporting guide is here: https://docs.cilium.io/en/latest/contributing/release/backports/#backporting-guide-for-the-backporter
Ah, I may have spoken a bit early if this fix was only targeted at the 1.13 branch... in that case this is expected. The vast majority of fixes land on main first, but in this case it looks like there is no corresponding commit on the main branch. Sorry about the noise.
Yeah, this change targeted only v1.13, since the same logic in the main branch had already been refactored, fixing this bug at the same time. This commit modifies only the very specific bits required to fix the issue, to avoid backporting the full refactoring to v1.13.
This PR fixes the propagation of the identity labels associated with the kube-apiserver IPs (when they are external to the cluster, as in cloud environments) to the selector cache, ensuring that the labels representing the CIDR itself and the enclosing CIDRs are not overwritten. Otherwise, traffic towards the API server could be incorrectly dropped when it should be allowed by ToCIDR network policy entries. This bug affects only v1.13, since the logic in the main branch has been refactored and, among other things, already includes the changes proposed here. It additionally fixes the occurrence of the warning log reported in #24401, since the kube-apiserver identity labels are now consistent across different invocations of selectorcache.UpdateIdentities.

Fixes: #24401
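As a side note on the warning itself, the following toy sketch (again not the actual selectorcache implementation; sameLabels and the hard-coded label sets are illustrative assumptions) shows why the log line from #24401 disappears once every UpdateIdentities invocation carries the same enriched label set for the kube-apiserver identity.

```go
// A toy illustration of the "Updating an existing identity" warning; NOT Cilium code.
package main

import (
	"fmt"
	"sort"
)

// sameLabels reports whether two label sets contain the same labels, ignoring order.
func sameLabels(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	a, b = append([]string(nil), a...), append([]string(nil), b...)
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	cached := []string{"cidr:0.0.0.0/0", "cidr:10.2.0.6/32", "reserved:kube-apiserver", "reserved:world"}

	// Before the fix: a later update may carry only the metadata labels, so the
	// sets differ and a warning like the one quoted earlier would be logged.
	stale := []string{"reserved:kube-apiserver"}
	if !sameLabels(cached, stale) {
		fmt.Println("warning: updating an existing identity with different labels")
	}

	// After the fix: every update carries the enriched labels, the sets match,
	// and no warning is emitted.
	enriched := append([]string(nil), cached...)
	if sameLabels(cached, enriched) {
		fmt.Println("labels consistent across updates; no warning")
	}
}
```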