Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: rework clustermesh config watcher for increased robustness #24163

Merged
merged 1 commit into from
Mar 13, 2023

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Mar 3, 2023

The agent leverages an fsnotify watcher to detect file changes in the directory containing the etcd configurations for remote clusters (as well as the associated keys/certs), triggering a reconfiguration as appropriate.

This PR reworks the above logic to address two main issues:

  • Due to how Kubernetes mounts ConfigMaps and Secrets within pods, and in particular the usage of symbolic links to handle atomic updates, changes in existing files were not detected (since the watched file itself was a symbolic link, which doesn't change during such operation). Now, an explicit watch operation is started for each configuration file (i.e., watching the actual target), causing a notification to be properly emitted when that is changed.
  • Prevent possible spurious notifications even if the configuration is not actually modified. This is achieved through hash comparison.

Integration tests are updated to cover the above two aspects.

Fixes: #23273

agent: rework clustermesh config watcher for increased robustness

@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. labels Mar 3, 2023
@giorio94 giorio94 requested a review from a team as a code owner March 3, 2023 17:33
@giorio94
Copy link
Member Author

giorio94 commented Mar 3, 2023

/test

@oblazek
Copy link
Contributor

oblazek commented Mar 6, 2023

verified from our side that this fixes the issue #23273 👍

@giorio94 giorio94 marked this pull request as draft March 9, 2023 08:06
@giorio94
Copy link
Member Author

giorio94 commented Mar 9, 2023

Last push fixed an issue which caused certain update notifications to be missed, and updated tests to check also for this case.

@giorio94 giorio94 marked this pull request as ready for review March 9, 2023 09:16
The agent leverages an fsnotify watcher to detect file changes in
the directory containing the etcd configurations for remote clusters
(as well as the associated keys/certs), triggering a reconfiguration
as appropriate.

This commit reworks the above logic to address two main issues:
* Due to how Kubernetes mounts ConfigMaps and Secrets within pods,
  and in particular the usage of symbolic links to handle atomic
  updates, changes in existing files were not detected (since the
  watched file itself was a symbolic link, which doesn't change
  during such operation). Now, an explicit watch operation is started
  for each configuration file (i.e., watching the actual target),
  causing a notification to be properly emitted when that is changed.
* Prevent possible spurious notifications even if the configuration
  is not actually modified. This is achieved through hash comparison.

Integration tests are updated to cover the above two aspects.

Fixes: cilium#23273
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Copy link
Member Author

giorio94 commented Mar 9, 2023

Rebased to retrigger the tests, as images signing previously failed.

@giorio94
Copy link
Member Author

giorio94 commented Mar 9, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.56.12:80 from testclient-cgbmk

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

@giorio94
Copy link
Member Author

giorio94 commented Mar 10, 2023

/test-1.26-net-next
Hit unrelated flake: #15455

@giorio94
Copy link
Member Author

giorio94 commented Mar 10, 2023

/test-runtime
Hit unrelated flake: #22373

@giorio94
Copy link
Member Author

giorio94 commented Mar 10, 2023

/test-runtime
Hit unrelated flake: #23997

@jrajahalme
Copy link
Member

@giorio94 Can you add the upstream commit reference to the commit message as documented in https://github.com/cilium/cilium/blob/master/Documentation/contributing/release/backports.rst?

@giorio94 giorio94 added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch and removed backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. labels Mar 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.13.1 Mar 10, 2023
@giorio94
Copy link
Member Author

@giorio94 Can you add the upstream commit reference to the commit message as documented in https://github.com/cilium/cilium/blob/master/Documentation/contributing/release/backports.rst?

@jrajahalme Sorry, I added the wrong label. It was intended to be needs-backport/1.13. Should be fixed now.

@giorio94
Copy link
Member Author

giorio94 commented Mar 13, 2023

/test-runtime
Hit unrelated flake: #23185

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 13, 2023
@borkmann borkmann merged commit 2cdd4ee into cilium:master Mar 13, 2023
@giorio94 giorio94 deleted the mio/clustermesh-config branch March 13, 2023 10:21
@nebril nebril added this to Needs backport from master in 1.13.2 Mar 15, 2023
@nebril nebril removed this from Needs backport from master in 1.13.1 Mar 15, 2023
@NikAleksandrov NikAleksandrov mentioned this pull request Mar 23, 2023
29 tasks
@NikAleksandrov NikAleksandrov added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 23, 2023
@jibi jibi added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Apr 7, 2023
@gentoo-root gentoo-root moved this from Needs backport from master to Backport done to v1.13 in 1.13.2 Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.13.2
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

Clustermesh reloaded every 60 seconds when deployed from k8s secret
7 participants