-
Notifications
You must be signed in to change notification settings - Fork 3.2k
ipsec, auth: Avoid out-of-order events that lead to inconsistent node IDs #27029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipsec, auth: Avoid out-of-order events that lead to inconsistent node IDs #27029
Conversation
f1e103c
to
faeef2b
Compare
This comment was marked as resolved.
This comment was marked as resolved.
actually now that it is internal to datapath/node can we get rid of all the exported GetNodeIP, GetNodeID. But DumpNodeIDs is used from cmd so might still need that. And one small question. On restart can we miss a NodeDelete? If that happens do we need some logic to correct or should the normal nodeDelete logic eventually get called and nodeID map will also be fixed up (I think this is true?). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing!
I agree with the overall assessment trying to save a map lookup in the datapath introduces too much complexity in user space to warrant the effort/complexity/risk and anyways this is going to an ipsec stack that is going to encrypt the entire thing anyways so a single map lookup is optimizing something with no/little gain. |
faeef2b
to
a388830
Compare
1516e78
to
654ad8c
Compare
The Mutual Auth feature is also still using
AFAIK, yes, the NodeDelete logic gets called on restart in that case. We should manually test it though. |
After removing the node IDS from the IPCache with cilium#27029, the auth module should directly depend on the NodeIDHandler instead of the IPCache. This commit refactors the dependency from the IPCache to the NodeIDHandler. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
After removing the node IDS from the IPCache with #27029, the auth module should directly depend on the NodeIDHandler instead of the IPCache. This commit refactors the dependency from the IPCache to the NodeIDHandler. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
[ upstream commit eb3c1d8 ] After removing the node IDS from the IPCache with cilium#27029, the auth module should directly depend on the NodeIDHandler instead of the IPCache. This commit refactors the dependency from the IPCache to the NodeIDHandler. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
We found out that the logic to allocate and deallocate node IDs had multiple bugs because it wasn't handling out-of-order events properly. In this context, out-of-order events are e.g. when pod objects are received before their corresponding node object. We discussed those issues at length in #26725 (see review) and concluded we should try to avoid having to deal with such out-of-order events altogether.
This pull request therefore removes all handling of node IDs on pod events. Node IDs are now entirely managed from node events. The node IDs are removed from the ipcache (
ep->node_id
). That simplifies the agent code a lot (all ipcache is uncoupled from node ID logic) and only slightly complicated the datapath logic (one more map lookup).Summary of commits:
ep->node_id
in the datapath and replace them with an additional map lookup.See commits for details.
This PR is an alternative to the solution tried at #26725.