Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: Handle correctly state when CEP is present in multiple CESs #24838

Merged
merged 1 commit into from
Apr 25, 2023

Conversation

alan-kut
Copy link
Contributor

@alan-kut alan-kut commented Apr 12, 2023

There are condition possible in which CEP changes CES. This leads to CEP being present in multiple CESs for some time. In such cases the standard logic may not work as it always expect to have a single CEP representation.

This commit changes to logic to handle multiple CEPs properly.

Fix incorrect network policy ebpf setup that may lead to incorrect packets denies when CEP is present in multiple CES

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 12, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Apr 12, 2023
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agent: Fix a when CEP is present in multiple CESs, e.g. when it's transferred between CESs.

What's the user-visible impact of this bug?

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Apr 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 12, 2023
@alan-kut
Copy link
Contributor Author

agent: Fix a when CEP is present in multiple CESs, e.g. when it's transferred between CESs.

What's the user-visible impact of this bug?

CEP missing from the ipcache.

Current state:
There is a CEP being moved from CES1 to CES2 and agent received the following events in the following order:

  • Add CES2 with CEP -> endpointUpdated(nil, CEP from CES2)
  • Add CES1 with CEP -> endpointUpdated(nil, CEP from CES1)
  • Update CES1 remove CEP -> endpointDeleted(CEP from CES1)

The end state is that CEP is missing from the ipcache.
In the middle CEP is not updated properly as all the calls assume there is just one CEP and it's not updated.

pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
@alan-kut alan-kut marked this pull request as ready for review April 13, 2023 06:57
@alan-kut alan-kut requested a review from a team as a code owner April 13, 2023 06:57
Copy link
Contributor

@skmatti skmatti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

pkg/k8s/watchers/cilium_endpoint_slice_subscriber.go Outdated Show resolved Hide resolved
@tommyp1ckles
Copy link
Contributor

@alan-kut Thanks for the contribution! Is there any issues open for this bug?

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The release note in the PR description needs to describe the user-visible impact (e.g., pods can't start, packets are dropped due to policy denies, etc.) such that it is useful to users when displayed in our release notes.

@alan-kut
Copy link
Contributor Author

I updated the code with the tommyp1ckles suggestions.

I updated the release notes - the user visible impact is that some traffic may be incorrectly denied or allowed because eBPF maps will not be set properly because ipcache inside agent is misconfigured on some cases when CEP changes CES.

I'm not aware of any issue created for this problem.

@tommyp1ckles
Copy link
Contributor

@alan-kut The changes look good to me. I am curious, what are the circumstances that a CEP can exist in two CES?

@alan-kut
Copy link
Contributor Author

One example is when CEP is moved to a different CES (especially easy to trigger with identity batching when identity changes): add CEP to 2nd CES and then remove CEP from the 1st CES.

@alan-kut
Copy link
Contributor Author

So it's not theoretical bug - I can reproduce it - but in general - it's valid, temporarily state that CEP exists in multiple CESs. Even if we remove all current occurrences it may happen in the future, as intended or due to bug.

I think long term we need proper reconciliation logic but for now this should work.

Are we good to merge?

There are condition possible in which CEP changes CES.
This leads to CEP being present in multiple CESs for some time.
In such cases the standard logic may not work as it always expect
to have a single CEP representation.

This commit changes to logic to handle multiple CEPs properly.

Signed-off-by: Alan Kutniewski <kutniewski@google.com>
@pchaigno pchaigno dismissed their stale review April 20, 2023 17:42

My concern was addressed. I'll let others more familiar with that logic review the changes.

@pchaigno pchaigno removed their request for review April 20, 2023 17:42
@joestringer
Copy link
Member

We had some discussion during the community meeting today (notes), for now we will apply the regular policy for this PR. We'll follow up on clarifying the backporting policy for beta features going forward.

@joestringer
Copy link
Member

Marking for backport to v1.12, v1.13 unless anyone has concerns about the risk for backporting these changes to older branches.

@joestringer joestringer added needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 26, 2023
@sayboras sayboras mentioned this pull request Apr 28, 2023
3 tasks
@sayboras sayboras added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 28, 2023
@sayboras sayboras mentioned this pull request Apr 28, 2023
1 task
@ferozsalam
Copy link
Contributor

Hi @alan-kut – did you assess the applicability of this issue against v1.11 in addition to v1.12? We would want to backport it to that version too unless you have reason to believe it doesn't apply.

@alan-kut
Copy link
Contributor Author

alan-kut commented May 9, 2023

Hi @ferozsalam - it applies and the backport is not complex. I backported it manually in my internal repo.

@ferozsalam ferozsalam added needs-backport/1.11 affects/v1.11 This issue affects v1.11 branch labels May 9, 2023
@julianwiedmann
Copy link
Member

Backports for v1.12 and v1.13 are done.

@julianwiedmann julianwiedmann added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels May 9, 2023
@jrajahalme jrajahalme mentioned this pull request May 11, 2023
5 tasks
@jrajahalme jrajahalme added backport-pending/1.11 backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed needs-backport/1.11 labels May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet