v1.14 Backports 2023-09-12 #28095
Conversation
LGTM for my commit. Thank you
[ upstream commit 0da3f7e ] Currently, the feature is not used, and its test is blocking the LVH upgrade [1][2]. Let's remove the test case. Once the feature is in use, we should rethink the testing approach (either implement it as a BPF unit test, or as an advanced CLI connectivity test). [1]: #27599 [2]: #27688 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
…erlay [ upstream commit 334f7f0 ] #22333 fixed a bug for configs with tunnel-routing and per-EP routes. Here ingress policy was applied twice: first via tail-call, and then a second time by the to-container program as the packet traverses the veth pair. The fix was to avoid the tail-call and only apply policy with the to-container program. But the tail-call also contains a kube-proxy workaround (potential service replies need to pass through kube-proxy for RevDNAT, so the tail-call punts them to the stack instead of calling redirect_ep() to forward them straight to the endpoint), so we copied that workaround into the l3_local_delivery() path.

The tail-call is compiled as part of bpf_lxc, and thus couldn't easily tell whether a packet was received from the tunnel. But as l3_local_delivery() is inlined into bpf_overlay, we can now limit the workaround to IS_BPF_OVERLAY. This ensures that the workaround is not applied to e.g. plain pod-to-pod traffic, where bpf_lxc also calls l3_local_delivery().

Fixes: 3d2ceaf ("bpf: Preserve overlay->lxc path with kube-proxy") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 7664f96 ] Signed-off-by: Su Fei <sofat1989@126.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 96ea098 ] The UpdateController method either creates or updates a controller. If an update is performed, the controller is immediately triggered. In some cases we want to create the controller once, without triggering or modifying it if it has already been created. Add a CreateController method for those use cases. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
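The create-vs-update distinction described above can be sketched as follows. This is a minimal illustrative model, not the actual Cilium `controller` package API; the types and method bodies are assumptions made for the example.

```go
package main

import "fmt"

// controller is a stand-in for a managed background task.
type controller struct {
	params    string
	triggered int
}

// Manager holds controllers by name.
type Manager struct {
	controllers map[string]*controller
}

func NewManager() *Manager {
	return &Manager{controllers: map[string]*controller{}}
}

// UpdateController creates or replaces the named controller's
// parameters and triggers it immediately.
func (m *Manager) UpdateController(name, params string) {
	c, ok := m.controllers[name]
	if !ok {
		c = &controller{}
		m.controllers[name] = c
	}
	c.params = params
	c.triggered++ // an update always runs the controller right away
}

// CreateController creates the controller only if it does not exist
// yet; it never modifies or triggers an existing one.
func (m *Manager) CreateController(name, params string) bool {
	if _, ok := m.controllers[name]; ok {
		return false // already present: leave it untouched
	}
	m.controllers[name] = &controller{params: params}
	return true
}

func main() {
	m := NewManager()
	m.CreateController("sync", "v1") // created, not triggered
	m.CreateController("sync", "v2") // no-op: already exists
	fmt.Println(m.controllers["sync"].params, m.controllers["sync"].triggered)
	m.UpdateController("sync", "v3") // replaces params and triggers
	fmt.Println(m.controllers["sync"].params, m.controllers["sync"].triggered)
}
```

The key design point is idempotence: callers that only want to guarantee a controller exists can call CreateController repeatedly without re-running it.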
Looks good for my change, thanks!
[ upstream commit 867e41a ] Performing a full dump of the policy map on every policy map synchronization is expensive, and is only necessary to catch rare cases where the agent's view of the policy map has diverged from the kernel's view. That should only happen due to kernel or other bugs, or due to some other application modifying the endpoint policy map. To reduce the cost of synchronizing large endpoint policy maps, perform the full synchronization only periodically. The full synchronization interval defaults to 15 minutes and is configurable with the hidden option "--policy-map-full-reconciliation-interval". Fixes: 9dc1350 ("endpoint: Enhance policy map sync") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
…anic [ upstream commit 2f6a2ed ] If the event type is unset, or the flow is nil, it's possible that FlowProtocol could panic. To avoid this, use the GetType() helper method which is nil safe. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
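The nil-safe getter pattern referenced above works like protobuf's generated Get* accessors: a method on a pointer receiver may be invoked on a nil pointer, so it can return a zero value instead of dereferencing and panicking. The types below are simplified stand-ins, not the real Hubble flow API.

```go
package main

import "fmt"

// Flow is a stand-in for the Hubble flow message.
type Flow struct {
	Protocol string
}

// Event is a stand-in for an event that may carry a flow.
type Event struct {
	Flow *Flow
}

// GetFlow is safe to call on a nil *Event.
func (e *Event) GetFlow() *Flow {
	if e == nil {
		return nil
	}
	return e.Flow
}

// GetProtocol is safe to call on a nil *Flow.
func (f *Flow) GetProtocol() string {
	if f == nil {
		return ""
	}
	return f.Protocol
}

func main() {
	var ev *Event // nil event, as when the event type is unset
	// ev.Flow.Protocol would panic; the chained getters do not.
	fmt.Printf("%q\n", ev.GetFlow().GetProtocol())
	ok := &Event{Flow: &Flow{Protocol: "TCP"}}
	fmt.Printf("%q\n", ok.GetFlow().GetProtocol())
}
```

This is why the fix prefers GetType() over direct field access: the whole accessor chain degrades to zero values on nil input.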
[ upstream commit 5da3348 ] https://github.com/cilium/cilium/issues/26944 has been resolved. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 5f88bbf ] This commit introduces documentation for "fast" development targets in Cilium. The "fast" targets allow for quicker development workflows by building Cilium locally and deploying it onto an existing kind cluster. This approach eliminates the need to create and deploy Docker images each time a source code change is made. Additionally, the documentation has been updated to include these new Makefile targets and provides a quick start guide with essential information for working with Cilium.

When a cluster is already running with a Cilium container, we can simply volume-mount the binaries into the container and restart the pod to pick up the changes. A similar approach can be used for C code, with the caveat that the bpf/lib directory needs to be writable by Cilium to generate the C header features file, making it impossible to mount the volume read-only. Since all nodes share the same kernel features, there won't be any conflicts when multiple nodes write to the same volume mount.

Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 762d8c9 ] - Updates the GetIPv6AllocCIDRs() to check the length of IPv6SecondaryAllocCIDRs instead of IPv4SecondaryAllocCIDRs. - Adds test cases for GetIPv4AllocCIDRs() and GetIPv6AllocCIDRs() methods. Fixes #27836 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
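The bug class fixed above is a classic copy-paste error: the IPv6 accessor gated its result on the length of the IPv4 slice. The field and method names below are simplified illustrations of that pattern, not the actual Cilium node types.

```go
package main

import "fmt"

// Node carries per-family secondary allocation CIDRs.
type Node struct {
	IPv4SecondaryAllocCIDRs []string
	IPv6SecondaryAllocCIDRs []string
}

// ipv6CIDRsBuggy models the pre-fix behavior: it checks the length of
// the IPv4 slice before appending the IPv6 ones.
func (n *Node) ipv6CIDRsBuggy() []string {
	cidrs := []string{}
	if len(n.IPv4SecondaryAllocCIDRs) > 0 { // wrong slice!
		cidrs = append(cidrs, n.IPv6SecondaryAllocCIDRs...)
	}
	return cidrs
}

// ipv6CIDRsFixed checks the slice it actually appends.
func (n *Node) ipv6CIDRsFixed() []string {
	cidrs := []string{}
	if len(n.IPv6SecondaryAllocCIDRs) > 0 {
		cidrs = append(cidrs, n.IPv6SecondaryAllocCIDRs...)
	}
	return cidrs
}

func main() {
	// IPv6 secondaries present, IPv4 empty: the buggy variant drops them.
	n := &Node{IPv6SecondaryAllocCIDRs: []string{"fd00::/104"}}
	fmt.Println(len(n.ipv6CIDRsBuggy()), len(n.ipv6CIDRsFixed()))
}
```

The added unit tests in the commit guard exactly this asymmetric case, where one family has secondary CIDRs and the other does not.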
[ upstream commit afd8e4e ] Before this change, only IPs from the default pool are reported in `cilium status --verbose` in the "Allocated addresses" section when using multi-pool IPAM mode. With this change, all IPs from non-default pools are reported as well by making use of the fact that the pool name is prepended to the IP in `(*multiPoolManager).dump`. Example:

```
% ks exec -it cilium-n8xw7 -- cilium status --verbose
[...]
IPAM:    IPv4: 2 IPAM pool(s) available, Allocated addresses:
           10.10.0.46 (router)
           10.10.0.48 (kube-system/coredns-5d78c9869d-d4nts [restored])
           10.10.0.61 (kube-system/hubble-relay-597b7bdff8-pwl69 [restored])
           10.10.0.62 (health)
           jupiter/192.168.0.15 (default/nginx-jupiter)
[...]
```

Note that IPs from non-default pools (in this case the `jupiter` pool) are reported with the pool name prefixed. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 4bb74a7 ] The IPAM allocator must not allocate new CIDRs before restoration has finished, i.e. before the K8s CiliumNode cache has synced and we have observed all nodes. Before this commit, we already started a controller attempting to allocate new CIDRs even if the allocator was not yet ready. This led to the controller being run unnecessarily, as it cannot succeed until `Resync` is called. Creating a controller before `Resync` is called is also not needed, because `Resync` itself (re-)creates a controller for each pending node. Therefore, this commit changes the logic to not start any controller before `restoreFinished` is true, as the controller will be created by `Resync` once everything is ready. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 525b1ea ] This commit fixes an issue where the multi-pool allocator was unable to update a CiliumNode resource because of concurrent writes. This manifested in the following error being emitted repeatedly:

```
level=debug msg="Controller run failed" consecutiveErrors=48 error="failed to update spec: Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-worker\": the object has been modified; please apply your changes to the latest version and try again" name=ipam-multi-pool-sync-kind-worker subsys=controller uuid=12ba9a52-d36f-48fe-a7b7-3cf97c2cdb26
```

This would happen because the operator CiliumNode watcher does not call the `Upsert` function if only the metadata (e.g. resource version, labels, annotations, etc.) of a node changes. This meant that the allocator was working with a stale `resourceVersion` of the CiliumNode object, causing any updates to fail until `Upsert` was called again because some non-metadata field changed.

This commit fixes that issue by having the controller fetch the most recent version of the `CiliumNode` if the Kubernetes API reports that there have been concurrent changes. This behavior matches the cluster-pool and ENI/Azure/AlibabaCloud implementations, which already correctly fetch the resource again upon conflicts. In addition, this commit also adds a unit test for this new behavior.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
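The refetch-on-conflict pattern described above can be sketched with a toy optimistic-concurrency store. The store and helpers below are simplified stand-ins for the Kubernetes client (the real code checks the API server's conflict error), invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a Kubernetes 409 Conflict error.
var errConflict = errors.New("object has been modified")

// store models a resource with an optimistic-concurrency version.
type store struct{ version int }

func (s *store) get() int { return s.version }

// update succeeds only if the caller's version matches the stored one.
func (s *store) update(basedOn int) error {
	if basedOn != s.version {
		return errConflict // stale resourceVersion
	}
	s.version++
	return nil
}

// updateWithRetry refetches the latest version on conflict and retries,
// instead of failing forever on a stale cached version.
func updateWithRetry(s *store, cached int, maxRetries int) error {
	version := cached
	for i := 0; i < maxRetries; i++ {
		err := s.update(version)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err // a real failure: don't retry
		}
		version = s.get() // refetch the latest version and try again
	}
	return fmt.Errorf("gave up after %d retries", maxRetries)
}

func main() {
	s := &store{version: 7}
	// The agent's cache is stale (version 3), as when Upsert was
	// skipped because only node metadata changed.
	fmt.Println(updateWithRetry(s, 3, 3)) // succeeds after one refetch
}
```

Without the refetch step, every retry would resubmit the same stale version and hit the same conflict, which is exactly the repeated "Controller run failed" loop shown in the log above.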
[ upstream commit 86a6ab9 ] This commit addresses the case in which `selectorManager.GetSelections` is called before `selectorManager.updateSelections` is called, which can cause a panic due to a nil pointer dereference in case the internal `selectorManager.selections` field is nil. To address this case, if the `selectorManager` has a nil value for its internal field `selections`, then `emptySelection` is returned. The reason `GetSelections` cannot modify the internal `selections` field if it sees that it is nil is because `selectorManager.setSelections` requires the managing `SelectorCache.mutex` to be locked. This cannot be guaranteed inside a call to `selectorManager.GetSelections`, which does not require locking. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
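The read-side fallback described above can be sketched as follows. The types are illustrative stand-ins for the selector cache, under the assumption stated in the commit: the getter may run without the write lock, so it must not initialize the field itself.

```go
package main

import "fmt"

// emptySelection is a shared zero value returned before the first
// updateSelections call; callers must treat it as read-only.
var emptySelection = []string{}

// selectorManager holds the current selections behind a pointer that
// is only assigned while the owning cache's write lock is held.
type selectorManager struct {
	selections *[]string
}

// GetSelections never writes to the manager, so it is safe to call
// without the cache mutex; it substitutes emptySelection for a nil
// field instead of dereferencing it.
func (s *selectorManager) GetSelections() []string {
	if s.selections == nil {
		return emptySelection
	}
	return *s.selections
}

func main() {
	m := &selectorManager{} // updateSelections has not run yet
	fmt.Println(len(m.GetSelections()))
	sel := []string{"10.0.0.1"} // simulate a locked updateSelections
	m.selections = &sel
	fmt.Println(len(m.GetSelections()))
}
```

Returning a shared empty value rather than lazily initializing the field is what keeps the getter lock-free: initialization stays confined to the code path that already holds the mutex.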
[ upstream commit 258e87c ] To reduce the duplication between the two paragraphs of hubble intro doc. Co-authored-by: Anna Kapuścińska <ania0102@gmail.com> Signed-off-by: Vipul Singh <vipul21sept@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Force-pushed from 4147dc2 to 9beeb86.
/test-backport-1.14
Thanks!
Hubble metrics changes LGTM
Seems simple enough, I'll add it to this PR, no worries. For 1.12 and 1.13 I'll assume you'll do it as part of the backport of #27693 though, so I'm adding the label.
[ upstream commit a883ef3 ] The last minute change in 867e41a to add an option for configuring the full endpoint policy sync forgot to actually change the use-site. Fixes: 867e41a ("endpoint: Only perform full synchronization periodically") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Added #27985
/test-backport-1.14
Thanks!
@joamaki Could you re-review? Thanks
LGTM for backport of #27889
Commits for remaining reviewers only had minor or no conflicts at all. Merging this, to unblock the v1.14 branch.
Minor conflicts in main-focus.yaml and daemon_main.go due to a differently named `vp` variable.

PRs skipped due to conflicts:

Once this PR is merged, you can update the PR labels via:

or with