-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare annotations before discarding CiliumNode updates. #25465
Compare annotations before discarding CiliumNode updates. #25465
Conversation
Commit 0fc487f01be34f75bf5ae6ad24e2f4b8d6753821 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
Hi @aojea, this is the fix for Wiregurad tunnel drop issue. The wireguard tunnel is broken when any node restart happens. After restart, the restarted node's public key got refreshed but not propagated to other nodes. The issue is caused by cilium dropping CiliumNode update events when the spec, status and labels between the old and new nodes are the same. Wireguard public key updates are transmitted through annotations. We want to confirm if this is a known issue since it can be reproduced by restarting any node easily. Thanks! |
@@ -69,7 +69,8 @@ func (k *K8sWatcher) ciliumNodeInit(ciliumNPClient client.Clientset, asyncContro | |||
valid = true | |||
isLocal := k8s.IsLocalCiliumNode(ciliumNode) | |||
if oldCN.DeepEqual(ciliumNode) && | |||
comparator.MapStringEquals(oldCN.ObjectMeta.Labels, ciliumNode.ObjectMeta.Labels) { | |||
comparator.MapStringEquals(oldCN.ObjectMeta.Labels, ciliumNode.ObjectMeta.Labels) && | |||
comparator.MapStringEquals(oldCN.ObjectMeta.Annotations, ciliumNode.ObjectMeta.Annotations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes issue in case the wireguard key changes but the rest of the node object not
cilium/pkg/nodediscovery/nodediscovery.go
Lines 460 to 465 in e52fe1d
if pk := n.localNode.WireguardPubKey; pk != "" { | |
if nodeResource.ObjectMeta.Annotations == nil { | |
nodeResource.ObjectMeta.Annotations = make(map[string]string) | |
} | |
nodeResource.ObjectMeta.Annotations[annotation.WireguardPubKey] = pk | |
} |
since, we don't compare Annotations the event is dropped and not processed
0fc487f
to
d7db202
Compare
d7db202
to
b495ea8
Compare
Thanks for the PR 😄 I wonder if instead of checking all annotation changes, it might make more sense to target the specific wireguard key annotation and just see if that's changed. It seems like the intention is to avoid unnecessary node updates. |
in the long term my experience is that new annotations got added but you never remember to update this comparison, it is also coherent with existing comparison with labels |
b495ea8
to
52942f6
Compare
Hi @tommyp1ckles, I agree with @aojea's point here. Is there a specific concern about checking all annotation changes? |
/test |
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2380/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
Thanks @christarazi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @aojea's point here. Is there a specific concern about checking all annotation changes?
Nothing specific - mostly just wondering if it would lead to unnecessary updates. This seems reasonable to me.
Hi @brb, I've updated the commit message, can you please take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next |
Thanks @christarazi @brb @tommyp1ckles @aojea, I've got enough approvals, and the "k8s-1.26-kernel-net-next" test seems not related to this PR. Can you please take another look and see how I can unblock the merging? |
1d614f3
to
d38b6b6
Compare
Currently we discard CiliumNode updates based on DeepEqual and labels. However DeepEqual is set to ignore Annotations, and the wg-pub-key annotation is used to exchange rotated Wireguard keys. The wireguard tunnel is broken when any node restart happens. After restart, the restarted node's public key got refreshed but not propagated to other nodes. The issue is caused by cilium dropping CiliumNode update events when the spec, status and labels between the old and new nodes are the same. Wireguard public key updates are transmitted through annotations. Signed-off-by: Lin Dong <lindongchn@gmail.com>
d38b6b6
to
975a0d4
Compare
/test |
I've rebased to re-run the tests. Please don't push unless the CI points out a failure and you need to push to address it. |
Got it. Thanks @christarazi! |
Thanks for the PR and your patience! |
Thanks so much @christarazi! |
Currently we discard CiliumNode updates based on DeepEqual and labels. However DeepEqual is set to ignore Annotations, and the wg-pub-key annotation is used to exchange rotated Wireguard keys.