Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipsec: Fix IPsec reinitialization #25936

Merged

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Jun 6, 2023

Fixes issues in previous PR (#25744) on IPsec ENI BPF program loading:

loader: In IPsec reload ignore veth devices & fix settle wait
    
reloadIPSecOnLinkChanges() did not ignore veth device updates causing
reload to be triggered when new endpoints were created. Ignore any
updates with "veth" as device type.

The draining of updates during settle wait was broken due to unintentional
breaking out of the loop. Removed the break.
loader: Do not fatal on IPsec reinitialization

Now that the code is reloading the bpf_network program at runtime we
should not fatal if we fail to reload the program since this may be caused
by ongoing interface changes (e.g. interface was being removed). Change
the log.Fatal into log.Error and keep loading to other interfaces.

Validated these fixes in AWS ENI environment with IPsec enabled by creating and
destroying dummy devices:

$ for i in $(seq 1 100); do sleep 0.1; ip link add dummy$i type dummy; ip link del dummy$i; done
$ kubectl get logs ...
2023-06-06T13:55:03.474507212Z level=info msg="Reinitializing IPsec due to device changes" subsys=datapath-loader
2023-06-06T13:55:03.480640541Z level=info msg="Encryption network program (re)loaded" interface=ens3 subsys=datapath-loader
2023-06-06T13:55:03.480648286Z level=error msg="Load encryption network failed" error="getting interface dummy1 by name: Link not found" interface=dummy1 subsys=datapath-loader
2023-06-06T13:55:03.480656250Z level=warning msg="Failed to reinitialize IPsec after device change" error="failed to load encryption program: getting interface dummy1 by name: Link not found" subsys=datapath-loader
2023-06-06T13:55:04.479609463Z level=info msg="Reinitializing IPsec due to device changes" subsys=datapath-loader
2023-06-06T13:55:04.482395149Z level=info msg="Encryption network program (re)loaded" interface=ens3 subsys=datapath-loader
2023-06-06T13:55:05.482543443Z level=info msg="Reinitializing IPsec due to device changes" subsys=datapath-loader

The time between reinitializations is now at least 1 second and we do not crash when loading fails unexpectedly.

Fix three issues in the bug fix to attach IPsec BPF programs to ENI interfaces: do not fatal if loading unexpectedly fails (which may happen if the device is suddenly deleted), ignore veth device changes in order not to reinitialize when new endpoints appear and wait 1 second for further device state changes between reinitializations.

Now that the code is reloading the bpf_network program at runtime we
should not fatal if we fail to reload the program since this may be caused
by ongoing interface changes (e.g. interface was being removed). Change
the log.Fatal into log.Error and keep loading to other interfaces.

Fixes: bf0940b ("loader: Reinitialize IPsec on device changes on ENI")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
reloadIPSecOnLinkChanges() did not ignore veth device updates causing
reload to be triggered when new endpoints were created. Ignore any
updates with "veth" as device type.

The draining of updates during settle wait was broken due to unintentional
breaking out of the loop. Removed the break.

Fixes: bf0940b ("loader: Reinitialize IPsec on device changes on ENI")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 6, 2023
@joamaki joamaki added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.11 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.11 Jun 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.11.18 Jun 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.4 Jun 6, 2023
@joamaki
Copy link
Contributor Author

joamaki commented Jun 6, 2023

/test

@joamaki joamaki requested a review from pchaigno June 6, 2023 14:00
@joamaki joamaki marked this pull request as ready for review June 6, 2023 14:00
@joamaki joamaki requested a review from a team as a code owner June 6, 2023 14:00
@joamaki joamaki requested a review from rgo3 June 6, 2023 14:00
joamaki added a commit that referenced this pull request Jun 6, 2023
[ upstream PR #25936 ]

Now that the code is reloading the bpf_network program at runtime we
should not fatal if we fail to reload the program since this may be caused
by ongoing interface changes (e.g. interface was being removed). Change
the log.Fatal into log.Error and keep loading to other interfaces.

Fixes: bf0940b ("loader: Reinitialize IPsec on device changes on ENI")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
joamaki added a commit that referenced this pull request Jun 6, 2023
[ upstream PR #25936 ]

reloadIPSecOnLinkChanges() did not ignore veth device updates causing
reload to be triggered when new endpoints were created. Ignore any
updates with "veth" as device type.

The draining of updates during settle wait was broken due to unintentional
breaking out of the loop. Removed the break.

Fixes: bf0940b ("loader: Reinitialize IPsec on device changes on ENI")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Copy link
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left non-blocking comments, as they don't address anything functional.

pkg/datapath/loader/loader.go Show resolved Hide resolved
pkg/datapath/loader/netlink.go Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 7, 2023
@dylandreimerink dylandreimerink merged commit 592777d into cilium:main Jun 7, 2023
62 checks passed
@pchaigno
Copy link
Member

pchaigno commented Jun 7, 2023

@dylandreimerink Please don't trust MLH blindly. I had yet to re-review here.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@YutaroHayakawa YutaroHayakawa mentioned this pull request Jun 7, 2023
4 tasks
@YutaroHayakawa YutaroHayakawa added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.4 Jun 7, 2023
@YutaroHayakawa YutaroHayakawa mentioned this pull request Jun 8, 2023
3 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.11 Jun 8, 2023
@YutaroHayakawa YutaroHayakawa mentioned this pull request Jun 8, 2023
2 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.18 Jun 8, 2023
@YutaroHayakawa YutaroHayakawa added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jun 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.4 Jun 8, 2023
@YutaroHayakawa YutaroHayakawa added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.11 Jun 12, 2023
@qmonnet qmonnet added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jun 14, 2023
@qmonnet qmonnet moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.18 Jun 14, 2023
@pchaigno pchaigno added the area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. label Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.11.18
Backport done to v1.11
1.12.11
Backport done to v1.12
1.13.4
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

None yet

6 participants