-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.13 backports 2023-06-09 #26079
Merged
michi-covalent
merged 7 commits into
cilium:v1.13
from
pchaigno:pr/v1.13-backport-2023-06-09
Jun 9, 2023
Merged
v1.13 backports 2023-06-09 #26079
michi-covalent
merged 7 commits into
cilium:v1.13
from
pchaigno:pr/v1.13-backport-2023-06-09
Jun 9, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
maintainer-s-little-helper
bot
added
backport/1.13
This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
labels
Jun 9, 2023
/test-backport-1.13 |
CreateOrUpdateRedirect called nil revertFunc when any local error was returned. This was done using the pattern `return 0, err, nil, nil` which sets the revertFunc return variable as nil, but this was called on a deferred function to revert any changes on a local error. Fix this by calling ReverStack.Revert() directly on the deferred function, and setting the return variable if there was no local error. This was hit any time a CiliumNetworkPolicy referred to a non-existing listener. Add a test case that reproduced the panic and works after the fix. [ Quentin: replace envoy.GetSocketDir() from original commit, only introduced in commit c578f20 ("envoy proxy: separate directory for envoy proxy sockets"). Import missing "os". ] Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> fixup! proxy: Do not panic on local error
Only update an existing redirect if it is configured. This prevents Cilium agent panic when trying to update redirect with released proxy port. This has only been observed to happen with explicit Envoy listener redirects in CiliumNetworkPolicy when the listener has been removed. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Increment non-DNS proxy ports on failure even if DNS has been configured with a static port. Fixes: cilium#20896 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
We recently changed our XFRM configuration to have one XFRM OUT policy per remote node, regardless of the IPAM mode being used. In doing so, we also moved the XFRM FWD policy to be installed once per remote node. With ENI and Azure IPAM modes, this wouldn't cause any issue because the XFRM FWD policy is the same regardless of the remote node. On other IPAM modes, however, the XFRM FWD policy is for some reason different depending on the remote node that triggered the installation. As a result, for those IPAM modes, one FWD policy is installed per remote node. And the deletion logic triggered on node deletions wasn't updated to take that into account. We thus have a leak of XFRM FWD policies. In the end, our FWD policy just needs to allow everything through without encrypting it. It doesn't need to be specific to any remote node. We can simply completely wildcard the match, to look like: src 0.0.0.0/0 dst 0.0.0.0/0 dir fwd priority 2975 ptype main tmpl src 0.0.0.0 dst 192.168.134.181 proto esp reqid 1 mode tunnel level use So we match all packets regardless of source and destination IPs. We don't match on the packet mark. There's a small implementation hurdle here. Because we used to install FWD policies of the form "src 0.0.0.0/0 dst 10.0.1.0/24", the kernel was able to deduce which IP family we are matching against and would adapt the 0.0.0.0/0 source CIDR to ::/0 as needed. Now that we are matching on 0/0 for both CIDRs, it cannot deduce this anymore. So instead, we must detect the IP family ourself and use the proper CIDRs. In addition to changing the XFRM FWD policy to the above, we can also stop installing it once per remote node. It's enough to install it when we receive the event for the local node, once. Fixes: 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
We use this wildcard IPv6 CIDR in the catch-all default-drop OUT policy as well as in the FWD policy. It was incorrectly set to ::/128 instead of ::/0 and would therefore not match anything instead of matching everything. This commit fixes it. Fixes: e802c29 ("ipsec: Refactor wildcard IP variables") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
After applying a backport of 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were getting occasional spikes of "no inbound state" xfrm errors (XfrmInNoStates). These lead to packet loss and brief outages for applications sending traffic to the node on which the spikes occur. I noticed that the "No node ID found for node." logline would appear at the time of these spikes and from the code this is logged when the node ID cannot be resolved. Looking a bit further the call to `DeleteIPsecEndpoint` will end up deleting the xfrm state for any state that matches the node id as derived from the mark in the state. The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP has a mark of `0xd00` which when shifted >> 16 in `getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets deleted and the kernel drops all the inbound traffic as it no longer matches a state. This commit updates that logic to skip the XFRM state and policy deletion when the node ID is zero. Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Steven Johnson <sjdot@protonmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
With commit 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") we rely on the node ID to find XFRM states and policies that belong to remote nodes, to clean them up when remote nodes are deleted. This commit makes sure that we only do this for XFRM states and policies that actually match on these node IDs. That should only be the same if the mark mask matches on node ID bits. Thus if should look like 0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00 (matches on node and encryption bit). Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
qmonnet
force-pushed
the
pr/v1.13-backport-2023-06-09
branch
from
June 9, 2023 15:27
e7b6663
to
6360945
Compare
Go linter would complain on missing diff --git a/pkg/proxy/proxy_test.go b/pkg/proxy/proxy_test.go
index f4fc75a05832..e2a2265dfd2a 100644
--- a/pkg/proxy/proxy_test.go
+++ b/pkg/proxy/proxy_test.go
@@ -5,10 +5,11 @@ package proxy
import (
"context"
+ "os"
+ "path/filepath"
"testing"
"github.com/cilium/cilium/pkg/completion"
- "github.com/cilium/cilium/pkg/envoy"
"github.com/cilium/cilium/pkg/identity"
"github.com/cilium/cilium/pkg/policy"
"github.com/cilium/cilium/pkg/proxy/logger"
@@ -199,7 +200,7 @@ func (s *ProxySuite) TestCreateOrUpdateRedirectMissingListener(c *C) {
mockDatapathUpdater := &MockDatapathUpdater{}
testRunDir := c.MkDir()
- socketDir := envoy.GetSocketDir(testRunDir)
+ socketDir := filepath.Join(testRunDir, "envoy", "sockets")
err := os.MkdirAll(socketDir, 0700)
c.Assert(err, IsNil)
|
/test-backport-1.13 |
michi-covalent
approved these changes
Jun 9, 2023
I tested this on a EKS cluster and performed the usual scale up and down to check that the leak is fixed. I also checked that the upgrade doesn't introduce any new drops and that connectivity tests pass after it. |
This was referenced Jun 9, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.13
This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Skipped to focus on the above release blocker:
Once this PR is merged, you can update the PR labels via: