Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.13 backports 2023-04-26 #25137

Merged
merged 7 commits into from
Apr 28, 2023

Conversation

sayboras
Copy link
Member

@sayboras sayboras commented Apr 26, 2023

Once this PR is merged, you can update the PR labels via:

$ for pr in 23953 25002 25043 24978 25024 25078 25046; do contrib/backporting/set-labels.py $pr done 1.13; done

TheAifam5 and others added 6 commits April 26, 2023 21:00
[ upstream commit 5163094 ]

Previously, kernel modules names were read from the `/proc/modules` only which prevented k8s linux distribution with kernel modules builtin into the kernel to pass the validation of required kernel modules by Cilium.

This patch adds support for systems with builtin modules by trying to read the kernel module names from following files additionally:
- `/lib/modules/<kernel-version>/modules.builtin`
- `/usr/lib/modules/<kernel-version>/modules.builtin`
- `/usr/lib/debug/lib/modules/<kernel-version>/modules.builtin`

Fixes: cilium#23863

Signed-off-by: Mateusz Paluszkiewicz (TheAifam5) <theaifam5@gmail.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit e84e99f ]

The ginkgo tests are currently executed in a different order for each
run in our CI. Unfortunately, the order in which we run the tests often
matters and can impact the test results. That's because there are a lot
of side effects (leftover Linux state, leftover Cilium conntrack
entries, leftover Kubernetes state, etc.) that are hard to control for.

This undeterminism facilities the introduction of flakes: a first pull
request introduces a flake, but it doesn't show up until a later pull
request runs the tests in a particular order.

The random order also makes flake a bit order to reproduce locally,
though that can easily be worked around by passing the proper seed to
ginkgo when reproducing.

One argument in favor of this randomness is that it surfaces those side
effects and allows us to identify and clean them. In practice, that
doesn't work great:
- Being side effects, they are often hard to identify and are a huge
  time cost for contributors to debug. Our ever increasing list of flake
  issues is a good testimony of that.
- There are not always easy to clean up [1].
- There are often not representative of bugs likely to affect users. Our
  users rarely restart Cilium 50 times in the span of an hour with
  different configurations.
In the end, our tests were not meant to uncover those side effects and
they are therefore inadequate at exposing them properly. If we want to
identify those side effects, then we should have tests specifically for
that.

Other, non-ginkgo CI jobs run their tests in a fixed order [2] or on
completely independent clusters [3].

This commit therefore fixes the order of the ginkgo tests. To that end,
a ginkgo seed was selected using $SRANDOM and will be the same for all
subsequent runs.

1 - cilium#17459
2 - cilium/cilium-cli#558
3 - https://github.com/cilium/cilium/blob/main/.github/workflows/conformance-datapath.yaml
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit 922aa1b ]

When using metallb as BGP speaker, if IPv6 advertisement is made - metallb will return error as unsupported.
This error is logged and error is returned to control loop, which continues retrying causing log flooding and high CPU.
This change filters out IPv6 prefixes before sending them to metallb library and logs one time error message.

Signed-off-by: harsimran pabla <hpabla@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit e7c748f ]

The leader election logic of the operator is currently executed in a
separate goroutine started by an OnStart hook, and relies on a context
(canceled by the respective OnStop hook) for shutting down. Once this
occurs, it will automatically release the lease, to allow the immediate
election of a new leader. Yet, we do not explicitly wait for this
operation to complete, and the operator might stop before the lease is
actually released, introducing a ~15 seconds delay for the next election.

This commit fixes this by introducing a WaitGroup to explicitly wait for
the leader election loop to terminate before continuing with the
shutdown process.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit d2301cb ]

When running the host firewall over IPv6 NodePort traffic, it would
enforce policies on the SYN+ACK packet with the SNATed IP address.
Obviously that didn't work great.

This happened because the IPv6 logic would simply lookup the source
security identity in the ipcache using the source IP address, without
considering that it might be SNATed. In case it is SNATed, the source IP
is always a host IP and the identity always HOST_ID. Thus, the host
firewall starts incorrectly enforcing policies and the packet.

The IPv4 path, on the other hand, does this source identity lookup
right: it only assumes a packet is coming from the local host if it has
the HOST magic mark (which we set via netfilter). SNATed packets don't
have this mark and therefore end up with an UNKNOWN identity.

This commit fixes the IPv6 logic to match the IPv4 logic.

This bug has always existed in the IPv6 code path. For IPv4, it was
fixed in commit cfd8fbf ("bpf: Fix srcID resolution on to-netdev in
case of SNAT"). I don't remember why I seemed to think the IPv6 code
path didn't need the same fix :(

We already have an end-to-end test to cover this code path, but it is
currently quarantined. Other changes are required before we can
unquarantine it, so I'll send those in a separate patchset.

Fixes: 489dbef ("bpf: Enforce host policies for IPv6")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit 16e4b68 ]

Update documentation related to configuration on AWS. Certain
IAM Policies must be in place if --clustername and --eni-gc-tags
are not set.

Incomplete docs got introduced after e66ed7f

Signed-off-by: Tore S. Loenoey <tore.lonoy@gmail.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras sayboras requested a review from a team as a code owner April 26, 2023 11:17
@sayboras sayboras added kind/backports This PR provides functionality previously merged into master. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. labels Apr 26, 2023
@sayboras
Copy link
Member Author

/test-backport-1.13

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PRs look good. Thanks!

[ upstream commit 46de5bc ]

These are remnants of a past before GHPRB. At best they create
uncessary noise in the logs, at worst they can interfere with the
default behaviour, so let's just remove them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras
Copy link
Member Author

sayboras commented Apr 26, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.24-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathVerifier Runs the kernel verifier against Cilium's BPF datapath

Failure Output

FAIL: terminating containers are not deleted after timeout

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.24-kernel-4.19/22/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.17-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With native routing

Failure Output

FAIL: Failed to reach 192.168.56.11:80 from testserver-host-mr2zv

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.19/22/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.17-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.19-kernel-4.19' failed:

Click to show.

Test Name

K8sAgentPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: cannot install connectivity-check

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-4.19/21/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PR looks good. Thanks!

operator/cmd/root.go Show resolved Hide resolved
@sayboras
Copy link
Member Author

/test-1.16-4.19

@sayboras
Copy link
Member Author

/test-1.17-4.19

@sayboras
Copy link
Member Author

/test-1.18-4.19

@sayboras
Copy link
Member Author

/test-1.19-4.19

@sayboras
Copy link
Member Author

/test-1.20-4.19

@sayboras
Copy link
Member Author

/test-1.21-4.19

@sayboras
Copy link
Member Author

/test-1.22-4.19

@sayboras
Copy link
Member Author

/test-1.23-4.19

@sayboras
Copy link
Member Author

/test-1.24-4.19

@sayboras
Copy link
Member Author

/test-runtime

@sayboras
Copy link
Member Author

/test-1.21-4.19

@sayboras
Copy link
Member Author

/test-1.17-4.19

@sayboras
Copy link
Member Author

/test-1.18-4.19

@sayboras
Copy link
Member Author

/test-1.19-4.19

@sayboras
Copy link
Member Author

/test-1.24-4.19

@sayboras
Copy link
Member Author

Re-trigger a couple of jobs due to a mixed of Github and provision issues.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 27, 2023
@michi-covalent
Copy link
Contributor

i trust mlh

@michi-covalent michi-covalent merged commit 3cfaa50 into cilium:v1.13 Apr 28, 2023
60 of 61 checks passed
@sayboras sayboras deleted the pr/v1.13-backport-2023-04-26 branch April 28, 2023 05:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants