v1.13 Backports 2023-05-22 #25588

Merged: @squeed merged 4 commits into v1.13 from pr/v1.13-backport-2023-05-22 on May 26, 2023

Member

@tklauser tklauser commented May 22, 2023

Once this PR is merged, you can update the PR labels via:

for pr in 25323 25461 25426 25465; do contrib/backporting/set-labels.py $pr done 1.13; done

or with

make add-labels BRANCH=v1.13 ISSUES=25323,25461,25426,25465

@tklauser tklauser requested a review from a team as a code owner May 22, 2023 14:56
@tklauser tklauser added the kind/backports and backport/1.13 labels May 22, 2023
@tklauser
Member Author

@pchaigno could you please review the backport of #25426 given you've reviewed that PR?

@tklauser tklauser requested a review from pchaigno May 22, 2023 14:57
@tklauser tklauser force-pushed the pr/v1.13-backport-2023-05-22 branch from a74a66d to d035452 Compare May 22, 2023 15:00
@tklauser
Member Author

/test-backport-1.13

Member

@pchaigno pchaigno left a comment

I've checked the first three backports and they look good. Thanks a lot Tobias!

@pchaigno pchaigno removed the request for review from tommyp1ckles May 22, 2023 15:14
tommyp1ckles and others added 4 commits May 24, 2023 15:21
[ upstream commit 439a0a0 ]

First see the code comments for the full explanation.

This issue with the faulty conntrack entries when enforcing host
policies is suspected to cause the flakes that have been polluting host
firewall tests. We've seen this faulty conntrack issue happen mostly to
health and kube-apiserver connections. And it turns out that the host
firewall flakes look like they are caused by connectivity blips on
kube-apiserver's side, with error messages such as:

    error: unable to upgrade connection: Authorization error (user=kube-apiserver-kubelet-client, verb=create, resource=nodes, subresource=proxy)

This commit therefore tries to work around the issue of faulty conntrack
entries in host firewall tests. If the flakes are indeed caused by those
faulty entries, we shouldn't see them happen anymore.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

[ upstream commit 0cfce97 ]

Change 439a0a0 introduced a workaround for a common flake we've been seeing related to issue #15455.

Any test enabling hostfw/host-policy may suffer from the same issue.

Addresses: #25411

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

[ upstream commit b2de07a ]

This is largely possible because a macro-defined func is reused in
several places, and send_drop_notify is assumed to be deferred up the
chain - but not all parents would actually invoke it, and by definition
it seems clearer/less error prone to always explicitly issue drop
notification at the source, where the drop is decided.

This is a smidge more verbose, but it avoids the problem of bad
assumptions or hard-to-catch mistakes causing missing drop notifications.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
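
The trade-off described in the commit above (relying on a caller to send the drop notification versus sending it where the drop is decided) can be sketched as follows. This is only a Go analogy under stated assumptions: the actual change is in Cilium's datapath code, which is not shown here, and every name below (`Verdict`, `Notifier`, `checkDeferred`, `checkExplicit`, `forwardingCaller`) is hypothetical. Only `send_drop_notify` is named in the commit message, and `Notifier.Notify` merely stands in for it.

```go
package main

import "fmt"

// Verdict is a hypothetical packet verdict used only in this sketch.
type Verdict int

const (
	Pass Verdict = iota
	Drop
)

// Notifier records drop notifications. It stands in for send_drop_notify.
type Notifier struct {
	Sent []string
}

// Notify records one drop notification with the given reason.
func (n *Notifier) Notify(reason string) {
	n.Sent = append(n.Sent, reason)
}

// checkDeferred decides on a drop but assumes that some caller further up
// the chain will issue the notification.
func checkDeferred(allowed bool) Verdict {
	if !allowed {
		return Drop
	}
	return Pass
}

// forwardingCaller has a notifier available but only forwards the verdict,
// so the drop goes unreported.
func forwardingCaller(n *Notifier, allowed bool) Verdict {
	return checkDeferred(allowed)
}

// checkExplicit issues the notification at the point where the drop is
// decided, so no caller can forget it.
func checkExplicit(n *Notifier, allowed bool) Verdict {
	if !allowed {
		n.Notify("policy denied")
		return Drop
	}
	return Pass
}

func main() {
	deferred := &Notifier{}
	forwardingCaller(deferred, false)
	fmt.Println(len(deferred.Sent)) // 0: the drop happened silently

	explicit := &Notifier{}
	checkExplicit(explicit, false)
	fmt.Println(len(explicit.Sent)) // 1: the drop was reported at the source
}
```

The explicit variant is slightly more verbose at each drop site, which matches the "smidge more verbose" cost the commit message accepts.
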
[ upstream commit 76eca78 ]

Currently we discard CiliumNode updates based on DeepEqual and labels. However, DeepEqual is set to ignore annotations, and the wg-pub-key annotation is used to exchange rotated WireGuard keys.

As a result, the WireGuard tunnel breaks whenever a node restarts: the restarted node's public key is refreshed but not propagated to the other nodes.

The issue is caused by Cilium dropping CiliumNode update events when the spec, status and labels of the old and new nodes are the same, even though WireGuard public key updates are transmitted through annotations.

Signed-off-by: Lin Dong <lindongchn@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
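
The failure mode described in the commit above can be illustrated with a minimal Go sketch. It is not the code from the upstream commit: the `Node` type, the helper names, and the `wg-pub-key` key string are simplified placeholders chosen for illustration, and comparing only that one annotation is just one possible way to stop the update from being discarded.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a CiliumNode object.
type Node struct {
	Spec        string
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

// wgPubKey is a placeholder annotation key used only in this sketch.
const wgPubKey = "wg-pub-key"

// sameMap reports whether two string maps hold the same entries.
// A nil map and an empty map compare as equal.
func sameMap(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// skipUpdateBuggy mirrors the described behaviour: an update is discarded
// when spec, status and labels are unchanged. Annotations are ignored.
func skipUpdateBuggy(oldNode, newNode Node) bool {
	return oldNode.Spec == newNode.Spec &&
		oldNode.Status == newNode.Status &&
		sameMap(oldNode.Labels, newNode.Labels)
}

// skipUpdateFixed additionally requires the public key annotation to be
// unchanged before an update may be discarded.
func skipUpdateFixed(oldNode, newNode Node) bool {
	return skipUpdateBuggy(oldNode, newNode) &&
		oldNode.Annotations[wgPubKey] == newNode.Annotations[wgPubKey]
}

func main() {
	before := Node{
		Spec:        "spec",
		Status:      "ok",
		Annotations: map[string]string{wgPubKey: "old-key"},
	}
	after := before
	// The node restarted and published a new public key.
	after.Annotations = map[string]string{wgPubKey: "new-key"}

	fmt.Println(skipUpdateBuggy(before, after)) // true: the key update is dropped
	fmt.Println(skipUpdateFixed(before, after)) // false: the key update is processed
}
```
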
@tklauser tklauser force-pushed the pr/v1.13-backport-2023-05-22 branch from d035452 to 3f20a57 Compare May 24, 2023 13:21
@tklauser
Member Author

tklauser commented May 24, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output

FAIL: Policy  cannot be deleted

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/141/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output

FAIL: Policy  cannot be deleted

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/170/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output

FAIL: Policy  cannot be deleted

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/213/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@tklauser
Member Author

/test-1.26-net-next

@tklauser
Member Author

The net-next failure is #25524, which is going to be quarantined on main by #25670. I've added the needs-backport/1.13 label there. Marking as ready-to-merge given that all other tests passed and reviews are in.

@tklauser tklauser added the ready-to-merge label May 26, 2023
@squeed squeed merged commit d3e750d into v1.13 May 26, 2023
61 of 62 checks passed
@squeed squeed deleted the pr/v1.13-backport-2023-05-22 branch May 26, 2023 11:23
Labels
backport/1.13: This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.
kind/backports: This PR provides functionality previously merged into master.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.

5 participants