v1.14 Backports 2023-09-12 #28095

gandro · 2023-09-12T11:43:28Z

PRs skipped due to conflicts:

Once this PR is merged, you can update the PR labels via:

for pr in 27911 27908 27913 27693 27889 27220 27931 27855 27918 27968 27963 27805 27792 27712 27985; do contrib/backporting/set-labels.py $pr done 1.14; done

or with

make add-labels BRANCH=v1.14 ISSUES=27911,27908,27913,27693,27889,27220,27931,27855,27918,27968,27963,27805,27792,27712,27985

aanm

LGTM for my commit. Thank you

[ upstream commit 0da3f7e ] Currently, the feature is not used, and its test is blocking the LVH upgrade [1][2]. Let's remove the test case. Once it is in use, we should rethink the testing approach (either implement as a BPF unit test, or an advanced CLI connectivity test). [1]: #27599 [2]: #27688 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

…erlay [ upstream commit 334f7f0 ] #22333 fixed a bug for configs with tunnel-routing and per-EP routes. Here ingress policy was applied twice: first via tail-call, and then a second time by the to-container program as the packet traverses the veth pair. The fix was to avoid the tail-call, and only apply policy with the to-container program. But the tail-call also contains a kube-proxy workaround (potential service replies need to pass through kube-proxy for RevDNAT, so the tail-call punts them to the stack instead of calling redirect_ep() to forward them straight to the endpoint). So we copied that workaround into the l3_local_delivery() path. The tail-call is compiled as part of bpf_lxc, and thus couldn't easily tell if a packet was received from the tunnel. But as l3_local_delivery() is inlined into bpf_overlay, we can now limit the work-around to IS_BPF_OVERLAY. This ensures that the workaround is not applied to eg. plain pod-to-pod traffic, where bpf_lxc also calls l3_local_delivery(). Fixes: 3d2ceaf ("bpf: Preserve overlay->lxc path with kube-proxy") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 7664f96 ] Signed-off-by: Su Fei <sofat1989@126.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 96ea098 ] The UpdateController method performs either a creation, or updating of a controller. If an update is performed the controller is immediately triggered. In some cases we want to create the controller once without triggering or modifying it if it already has been created. Add a CreateController method for those use cases. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

tklauser

Looks good for my change, thanks!

[ upstream commit 867e41a ] Performing a full dump of the policy map on every policy map synchronization is expensive and is only necessary to catch rare cases where the agent's view of the policy map has diverged from the kernel's view which should only happen either due to kernel or other bugs or some other application modifying the endpoint policy map. To reduce the cost of synchronizing large endpoint policy maps perform the full synchronization only periodically. The full synchronization is defaults to 15 minutes. Configurable with the hidden option "--policy-map-full-reconciliation-interval". Fixes: 9dc1350 ("endpoint: Enhance policy map sync") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

…anic [ upstream commit 2f6a2ed ] If the event type is unset, or the flow is nil, it's possible that FlowProtocol could panic. To avoid this, use the GetType() helper method which is nil safe. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 5da3348 ] https://github.com/cilium/cilium/issue/26944 has been resolved. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 5f88bbf ] This commit introduces documentation for "fast" development targets in Cilium. The "fast" targets allow for quicker development workflows by building Cilium locally and deploying it on an existing kind cluster. This approach eliminates the need to create and deploy Docker images each time a source code change is made. Additionally, the documentation has been updated to include these new Makefile targets and provides a quick start guide with essential information for working with Cilium. When a cluster is already running with a Cilium container, we can simply volume mount the binaries into the container and restart the pod to pick up the changes. A similar approach can be used for C code, with the caveat that the bpf/lib directory needs write permissions by Cilium to generate the C Header features file, making it impossible to use the volume as read-only. Since all nodes share the same kernel features, there won't be any conflicts when multiple nodes write to the same volume mount. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 762d8c9 ] - Updates the GetIPv6AllocCIDRs() to check the length of IPv6SecondaryAllocCIDRs instead of IPv4SecondaryAllocCIDRs. - Adds test cases for GetIPv4AllocCIDRs() and GetIPv6AllocCIDRs() methods. Fixes #27836 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 05bdc1e ] The current way of subscribing to node events is not compatible with Clustermesh (see #25794). Reflect this in the documentation. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit afd8e4e ] Before this change, only IPs from the default pool are reported in `cilium status --verbose` in the "Allocated addresses" section when using multi-pool IPAM mode. With this change, all IPs from non-default pools are reported as well by making use of the fact that the pool name is prepended to the IP in `(*multiPoolManager).dump`. Example: ``` % ks exec -it cilium-n8xw7 -- cilium status --verbose [...] IPAM: IPv4: 2 IPAM pool(s) available, Allocated addresses: 10.10.0.46 (router) 10.10.0.48 (kube-system/coredns-5d78c9869d-d4nts [restored]) 10.10.0.61 (kube-system/hubble-relay-597b7bdff8-pwl69 [restored]) 10.10.0.62 (health) jupiter/192.168.0.15 (default/nginx-jupiter) [...] ``` Note that IPs from non-default pools (in this case the `jupiter` pool) are reported with the pool name prefixed. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 4bb74a7 ] The IPAM allocator must not allocate new CIDRs before restoration has finished, i.e. before the K8s CiliumNode cache has synced and we have observed all nodes. Before this commit, we already started a controller attempting to allocate new CIDRs even if the allocator was not yet ready. This led to the controller being run unnecessarily, as it cannot succeed until `Resync` is called. Creating a controller before `Resync` is called is also not needed, because `Resync` itself (re-)creates a controller for each pending node. Therefore, this commit changes the logic to not start any controller before `restoreFinished` is true, as the controller will be created by `Resync` once everything is ready. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 525b1ea ] This commit fixes an issue where the multi-pool allocator was unable to update a CiliumNode resource because of concurrent writes. This manifested in the following error being emitted repeatedly: ``` level=debug msg="Controller run failed" consecutiveErrors=48 error="failed to update spec: Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-worker\": the object has been modified; please apply your changes to the latest version and try again" name=ipam-multi-pool-sync-kind-worker subsys=controller uuid=12ba9a52-d36f-48fe-a7b7-3cf97c2cdb26 ``` This would happen because the operator CiliumNode watcher does not call the `Upsert` function if only the metadata (e.g. resource version, labels, annotations, etc) of a node changes. This meant that the allocator was working with a stale `resourceVersion` of the CiliumNode object, causing any updates to fail until `Upsert` would be called again because some non-metadata field changed. This commit fixes that issue by having the controller fetch the most recent version of the `CiliumNode` if the Kubernetes API reports that there have been concurrent changes. This behavior matches the behavior of the cluster-pool and ENI/Azure/AlibabaCloud implementation, which already correctly fetched the resource again upon conflicts. In addition, this commit also adds a unit test to test this new behavior. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 86a6ab9 ] This commit address the case in which `selectorManager.GetSelections` is called before `selectorManager.updateSelections` is called, which can cause a panic due to a nil pointer dereference in case the internal `selectorManager.selections` field is nil. To address this case, if the `selectorManager` has a nil value for its internal field `selections`, then `emptySelection` is returned. The reason `GetSelections` cannot modify the internal `selections` field if it sees that it is nil is because `selectorManager.setSelections` requires the managing `SelectorCache.mutex` to be locked. This cannot be guaranteed inside a call to `selectorManager.GetSelections`, which does not require locking. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit e733b8a ] Fixes: #27717 Signed-off-by: Marek Chodor <mchodor@google.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 258e87c ] To reduce the duplication between the two paragraphs of hubble intro doc. Co-authored-by: Anna Kapuścińska <ania0102@gmail.com> Signed-off-by: Vipul Singh <vipul21sept@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2023-09-12T12:12:01Z

/test-backport-1.14

Edit:

ConformanceGKE failed with infra error (Insufficient regional quota to satisfy request)

brb

Thanks!

joamaki · 2023-09-12T13:38:14Z

@gandro hey, I had a small mistake in #27693., would it be possible to include #27985 as well in this? Alternatively could drop #27693 from this and mark it backport/author and I'll do it.

kaworu

Hubble metrics changes LGTM

gandro · 2023-09-12T14:51:35Z

@gandro hey, I had a small mistake in #27693., would it be possible to include #27985 as well in this? Alternatively could drop #27693 from this and mark it backport/author and I'll do it.

Seems simple enough, I'll add it to this PR, no worries. For 1.12 and 1.13 I'll assume you'll do it as part of the backport of #27693 though, so I'm adding the backport/author label for v1.12 and v1.13 (since the two are dependent it seems)

[ upstream commit a883ef3 ] The last minute change in 867e41a to add an option for configuring the full endpoint policy sync forgot to actually change the use-site. Fixes: 867e41a ("endpoint: Only perform full synchronization periodically") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2023-09-12T14:54:12Z

Added #27985

gandro · 2023-09-12T15:33:14Z

/test-backport-1.14

Edit:

ci-e2e failed with CI: Multicluster: client-egress-l7-named-port failed #27672 and #CI: Cilium Conformance E2E: allow-all-with-metrics-check #27194

learnitall

Thanks!

gandro · 2023-09-14T08:57:18Z

@joamaki Could you re-review? Thanks

chancez

LGTM for backport of #27889

gandro · 2023-09-25T08:09:56Z

Commits for remaining reviewers only had minor or no conflicts at all. Merging this, to unblock the v1.14 branch.

gandro added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Sep 12, 2023

gandro requested review from julianwiedmann, joamaki, aanm, danehans, tklauser, learnitall, brb and chancez September 12, 2023 11:43

gandro marked this pull request as ready for review September 12, 2023 11:45

gandro requested review from a team as code owners September 12, 2023 11:45

gandro requested a review from kaworu September 12, 2023 11:45

julianwiedmann approved these changes Sep 12, 2023

View reviewed changes

aanm approved these changes Sep 12, 2023

View reviewed changes

brb and others added 4 commits September 12, 2023 13:48

pull skb data in cil_from_netdev path for HIGH_SCALE_IPCACHE mode

1d773ea

[ upstream commit 7664f96 ] Signed-off-by: Su Fei <sofat1989@126.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

tklauser approved these changes Sep 12, 2023

View reviewed changes

joamaki and others added 7 commits September 12, 2023 13:49

ci-ipsec-upgrade: Enable IPv6

a4c9b49

[ upstream commit 5da3348 ] https://github.com/cilium/cilium/issue/26944 has been resolved. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro and others added 5 commits September 12, 2023 13:49

Fix metrics labeling with only directed source/destination context.

d5d34a8

[ upstream commit e733b8a ] Fixes: #27717 Signed-off-by: Marek Chodor <mchodor@google.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro force-pushed the pr/v1.14-backport-2023-09-12 branch from 4147dc2 to 9beeb86 Compare September 12, 2023 11:49

brb approved these changes Sep 12, 2023

View reviewed changes

kaworu approved these changes Sep 12, 2023

View reviewed changes

learnitall approved these changes Sep 12, 2023

View reviewed changes

gandro added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Sep 13, 2023

marqc approved these changes Sep 13, 2023

View reviewed changes

gandro removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Sep 13, 2023

chancez approved these changes Sep 18, 2023

View reviewed changes

gandro merged commit eaf26dc into v1.14 Sep 25, 2023
195 checks passed

gandro deleted the pr/v1.14-backport-2023-09-12 branch September 25, 2023 08:10

This was referenced Oct 17, 2023

policy: Move getNets to selector cache #27670

Merged

Prepare for release v1.14.3 #28680

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.14 Backports 2023-09-12 #28095

v1.14 Backports 2023-09-12 #28095

gandro commented Sep 12, 2023 •

edited by chancez

aanm left a comment

tklauser left a comment

gandro commented Sep 12, 2023 •

edited

brb left a comment

joamaki commented Sep 12, 2023 •

edited

kaworu left a comment

gandro commented Sep 12, 2023

gandro commented Sep 12, 2023

gandro commented Sep 12, 2023 •

edited

learnitall left a comment

gandro commented Sep 14, 2023

chancez left a comment

gandro commented Sep 25, 2023

v1.14 Backports 2023-09-12 #28095

v1.14 Backports 2023-09-12 #28095

Conversation

gandro commented Sep 12, 2023 • edited by chancez

aanm left a comment

Choose a reason for hiding this comment

tklauser left a comment

Choose a reason for hiding this comment

gandro commented Sep 12, 2023 • edited

brb left a comment

Choose a reason for hiding this comment

joamaki commented Sep 12, 2023 • edited

kaworu left a comment

Choose a reason for hiding this comment

gandro commented Sep 12, 2023

gandro commented Sep 12, 2023

gandro commented Sep 12, 2023 • edited

learnitall left a comment

Choose a reason for hiding this comment

gandro commented Sep 14, 2023

chancez left a comment

Choose a reason for hiding this comment

gandro commented Sep 25, 2023

gandro commented Sep 12, 2023 •

edited by chancez

gandro commented Sep 12, 2023 •

edited

joamaki commented Sep 12, 2023 •

edited

gandro commented Sep 12, 2023 •

edited