-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipsec: Fix XfrmOutPolBlock
drops on upgrades
#25735
Conversation
ff21f54
to
1db1553
Compare
XfrmOutPolBlock
drops on ESK & AKS upgradesXfrmOutPolBlock
drops upgrades
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to make xfrm states coexist is marvelous!
Just a newbee question: we coexist xfrm states, but update xfrm policies directly, and upgrade test still passes, so where do we use xfrm policy?
1db1553
to
df366bb
Compare
Actually, we don't update the XFRM policies directly. In the second commit, we deprioritize the old XFRM policies such that they can coexist with the new ones added as part of the upgrade. I've updated the commit descriptions to include example of how the XFRM configs will change. If a packet comes into the XFRM subsystem, it will first be matched against the new XFRM policies with marks of the form 0xXXXXYe00/0xffffff00. If that doesn't match because the packet is still using the old mark format (because bpf_lxc hasn't been reloaded yet), then it will match against the deprioritized XFRM policies with marks of the form 0xYe00/0xff00. Once we've matched against a policy, it will search for the matching XFRM state using the IP addresses from the policy's template. For example, with the below policies, it will look for an XFRM state with
|
XfrmOutPolBlock
drops upgradesXfrmOutPolBlock
drops on upgrades
Question: are the prioritized policies just left like this forever - could we not clean them up after some time? I guess we'd have to know that all nodes have had their host endpoints reloaded before doing that... (I'm wondering if leaving old policies around might provide other places for issues during future version upgrades |
This pull request fixes IPsec drops of type
XfrmOutPolBlock
that would happen with all IPAM modes during upgrades. The root of the issue is that we would delete the old XFRM OUT policies and/or states instead of leaving them around for the duration of the upgrade.This pull request makes various changes to ensure that we smoothly transition from the old IPsec implementation to the new. After this pull request, we shouldn't have any packet drops on upgrades when using ENI or Azure IPAM modes. Otherwise (for e.g. cluster-pool IPAM mode), it is theoretically possible to have a few drops of type
XfrmOutNoStates
; those however haven't been noticed in any of the upgrade tests. There is more information on thoseXfrmOutNoStates
drops in the last commit of this pull request.How was this tested?
The usual upgrade path from v1.13.0 to this patchset was tested on two clusters of 20 nodes each, on EKS and GKE. Before the upgrade, our drop-sensitive migrate-svc application was deployed with 50 clients and 30 backends. That application sends a continuous stream of messages back and forth and fails whenever a message is dropped.
After the upgrade to this patchset, I verified that:
Fixes: #24773.
Fixes: #24030.