Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.13 backports 2023-03-28 #24607

Merged
merged 16 commits into from
Mar 31, 2023

Conversation

tklauser
Copy link
Member

@tklauser tklauser commented Mar 28, 2023

Once this PR is merged, you can update the PR labels via:

$ for pr in 22980 23281 23351 24469 24339 24546 23631 24577 24569; do contrib/backporting/set-labels.py $pr done 1.13; done

Skipped:

@tklauser tklauser requested a review from a team as a code owner March 28, 2023 13:04
@tklauser tklauser added kind/backports This PR provides functionality previously merged into master. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. labels Mar 28, 2023
@tklauser tklauser force-pushed the pr/v1.13-backport-2023-03-28 branch from aad7751 to 8ee190f Compare March 28, 2023 13:31
@tklauser tklauser force-pushed the pr/v1.13-backport-2023-03-28 branch from 8ee190f to 7c8b6d9 Compare March 28, 2023 14:11
@tklauser
Copy link
Member Author

/test-backport-1.13

Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for my changes, thanks a lot 🎖️

@geakstr
Copy link
Contributor

geakstr commented Mar 29, 2023

Thanks Tobias!

Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tklauser
Copy link
Member Author

Blocked on #24611

@tklauser
Copy link
Member Author

tklauser commented Mar 29, 2023

Looks like the 4.9 Jenkins jobs failed with cilium pods crash looping and the following message in the logs:

2023-03-28T18:18:37.228913131Z level=error msg=\"Start hook failed\" error=\"daemon creation failed: error while initializing daemon: failed to ensure local IP rules: could not replace IPv4 local rule: file exists\" function=\"cmd.newDaemonPromise.func1 (daemon_main.go:1677)\" subsys=hive

See e.g. https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/2580/

Presumably #24288 is the culprit here. Not sure what to do about this. Is there anything different in the 4.9 VM images than in the other ones wrt. ip rules? @NikAleksandrov could you have a look?

@NikAleksandrov
Copy link

Looks like the 4.9 Jenkins jobs failed with cilium pods crash looping and the following message in the logs:

2023-03-28T18:18:37.228913131Z level=error msg=\"Start hook failed\" error=\"daemon creation failed: error while initializing daemon: failed to ensure local IP rules: could not replace IPv4 local rule: file exists\" function=\"cmd.newDaemonPromise.func1 (daemon_main.go:1677)\" subsys=hive

See e.g. https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/2580/

Presumably #24288 is the culprit here. Not sure what to do about this. Is there anything different in the 4.9 VM images than in the other ones wrt. ip rules? @NikAleksandrov could you have a look?

@tklauser will do and see how we can fix it

@tklauser tklauser force-pushed the pr/v1.13-backport-2023-03-28 branch from 7c8b6d9 to bb06427 Compare March 29, 2023 15:45
@tklauser
Copy link
Member Author

Looks like the 4.9 Jenkins jobs failed with cilium pods crash looping and the following message in the logs:

2023-03-28T18:18:37.228913131Z level=error msg=\"Start hook failed\" error=\"daemon creation failed: error while initializing daemon: failed to ensure local IP rules: could not replace IPv4 local rule: file exists\" function=\"cmd.newDaemonPromise.func1 (daemon_main.go:1677)\" subsys=hive

See e.g. https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/2580/
Presumably #24288 is the culprit here. Not sure what to do about this. Is there anything different in the 4.9 VM images than in the other ones wrt. ip rules? @NikAleksandrov could you have a look?

@tklauser will do and see how we can fix it

As discussed on Slack, I've added an additional commit bb06427 ignoring EEXIST on linuxdatapath.NodeEnsureLocalIPRule in order to fix CI on 4.9 (that kernel doesn't support fib rule protocol attribute).

Copy link
Contributor

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits look good, thanks. 👍

@jibi
Copy link
Member

jibi commented Mar 30, 2023

I don't think #23351 should be backported to v1.13, as the commit that it's fixing is not in v1.13 as well (why was it marked for backport?)

IIRC, I picked it up because I had to pick up #23281 for 0e3b0bb (the commit mentioned in #23351 PR description) to fix another issue in a unit test introduced by #24469. Picking #23281 lead to a failure in the egressgw tests which I figured were fixed by #23351.

I see, in that case all good 👍

@tklauser
Copy link
Member Author

tklauser commented Mar 30, 2023

It looks like there's another issue with #24288 on kernel 4.9 even with #24645 in place: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.24-kernel-4.9/100/testReport/junit/Suite-k8s-1/24/K8sDatapathConfig_Transparent_encryption_DirectRouting_Check_connectivity_with_transparent_encryption_and_direct_routing_with_bpf_host/

This seems to be failing in (*linuxNodeHandler).replaceHostRules(). I think I'm going to drop #24288, #24577 and #24645 from this backport PR in order to get it into a mergeable state by the end of the week.

@NikAleksandrov could you please look into what else is needed to successfully backport #24288?

dylandreimerink and others added 16 commits March 30, 2023 17:49
[ upstream commit b3a2081 ]

The BPF unit/integration tests had some trouble running on older
kernels (<=5.8). That is because the test runner always attempted to
load all programs in the ELF. This fix makes it so we only attempt to
load XDP and TC programs, both of which have `BPF_PROG_RUN` support.

This commit also removes the rlimit memory lock which is required on
pre-5.11 kernels.

Fixes: cilium#22779
Fixes: cilium#22780
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 96ab7dc ]

This allows for targeted execution during development and provides a nice
overview of all files and subtests executed.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 0e3b0bb ]

MAC addresses used by the test suite were declared as generic global variables,
causing them to be emitted to .data by default. This is confusing to Cilium's
inlineGlobalData(), which tries inlining map reads from .data into the bytecode
as immediate values. Cilium's datapath code anticipates this by using the
__fetch macro, but the test suite doesn't take the same approach.

This commit moves all MACs to .rodata by declaring them volatile const, stopping
inlineGlobalData from modifying the bytecode.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit b35560e ]

This patch de-duplicates the logic around detecting ProgramType and hooking
up tail calls into prog arrays. The Cilium agent itself has been on
bpf.LoadCollection() for the majority of its prog loads for a few months now.

Print full verifier logs when a VerifierError is encountered.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3764f26 ]

To make it work with 0e3b0bb ("bpf/tests: move mac_one through
mac_six to .rodata")

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 8a86677 ]

The unit test checks whether a policy drop increments the corresponding
BPF metrics counter.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit a303208 ]

The test was replaced by the BPF unit test from the previous commit.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3d0c506 ]

This commit extends the Helm chart, to respect the extra DNS names and
IP addresses specified for the clustermesh API server TLS certificate,
when it is generated through the Cilium's certgen tool. The same fields
are already considered when helm or cert-manager are used for generating
that certificate.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2b8de11 ]

No functional changes, but this should make it easier to review the next
commit.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 66fd935 ]

This commit extracts most of the exportPodCIDRReconciler logic into a
more generic function, exportAdvertisementsReconciler, so that it can be
reused by other subsystems which only need to advertise a set of IPNets.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit e29c8ea ]

Support the case when ingress is configured to serve hubble-ui
from non default `/` root url (ex. `/service-map`).

Related hubble-ui pull request:
cilium/hubble-ui#432

Signed-off-by: Dmitry Kharitonov<dmitry@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit e980ca0 ]

Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 8b20703 ]

When egress gateway is enabled, we define the ENABLE_EGRESS_GATEWAY
macro in the datapath, which in lib/common.h causes HAVE_ENCAP to be
defined as well:

    #if defined(ENCAP_IFINDEX) || defined(ENABLE_EGRESS_GATEWAY)
    #define HAVE_ENCAP
    //..
    #endif

This doesn't work with sockops enabled, as bpf_sockops.c ends up
including lib/overloadable_skb.h with HAVE_ENCAP defined and
ENCAP_IFINDEX not defined.

This happens since in the daemon we currently:

- create the node_config.h header file with createNodeConfigHeaderfile()
- invoke the logic to compile the sockops programs with the
  sockops.Sockmap*() methods
- initialize the datapath with Datapath().Loader().Reinitialize(), which
  is responsible among other things of invoking bpf/init.sh, which in
  turn is responsible for populating node_config.h with a few missing
  constants, one of which is ENCAP_IFINDEX

To fix this, simply invoke Datapath().Loader().Reinitialize() before
compiling the sockops programs, in order to properly populate
node_config.h

Fixes: cilium#24090
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 788bf37 ]

By right there should be a rule to let ipsec skb bypass conntrack:

-A CILIUM_PRE_raw -m mark --mark 0xd00/0xf00 -m comment --comment "cilium-xfrm-notrack:" -j CT --notrack

However ipv6 missed it and this commit adds the rule back.

Fixes: cilium#23481

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit dd0a8f5 ]

This reverts commit b1b9fbd and
re-enables test for IPv6 externalTrafficPolicy=Local E/W for IPsec.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit eeaee24 ]

It's better to run official release in our tests, either v1.26.0 or the
latest release of v1.26.x.

Relates: 70252f4
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
@tklauser tklauser force-pushed the pr/v1.13-backport-2023-03-28 branch from 1fbf19e to 7a2012f Compare March 30, 2023 15:50
@tklauser
Copy link
Member Author

tklauser commented Mar 30, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.17-kernel-4.9' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.56.12:80 from testclient-dnncq

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.17-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-1.21-kernel-4.9' hit: #22578 (95.32% similarity)

Job 'Cilium-PR-K8s-1.19-kernel-4.9' hit: #22578 (95.87% similarity)

@NikAleksandrov
Copy link

@tklauser thank you for giving it a try, I'll take care of the backport

@tklauser
Copy link
Member Author

/test-1.17-4.9

@tklauser
Copy link
Member Author

/test-1.19-4.9

@tklauser
Copy link
Member Author

/test-1.21-4.9

@tklauser tklauser added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 30, 2023
@julianwiedmann julianwiedmann merged commit dd36c50 into cilium:v1.13 Mar 31, 2023
37 checks passed
@tklauser tklauser deleted the pr/v1.13-backport-2023-03-28 branch March 31, 2023 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet