-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clustermesh: allow remote configuration to be changed quickly #24046
clustermesh: allow remote configuration to be changed quickly #24046
Conversation
e2b368c
to
5f07fd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @oblazek for your PR.
I have a few comments inline. I also wonder how you tested this, since there is apparently an issue with the fs watcher, which causes updates to already existing configuration files not to be correctly detected. I'm working to fix that.
can you be more specific? I always replaced the file that fsnotify is watching with |
The issue I mentioned occurs only if the config folder is mounted in the container from a secret. Are you mounting the configuration files differently? I assume so, since you mentioned using |
So mostly I have tested this when running agent in a docker container with the clustermesh config folder mounted as a volume, as it's easier to change the config (e.g. the mv command). But in k8s we are mounting as a secret and had some issues there as well. |
FYI, #24163 addresses the above issue |
5f07fd9
to
e52ee27
Compare
e52ee27
to
deb9769
Compare
hey 👋 @giorio94 ptal.. looks better now imo :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @oblazek!
The controller change LGTM. I left you a comment/question inline about the watcher termination handling.
Previously when etcd client context was not set and client created with client.New(*config) it would use context.TODO, without any way to cancel the ctx. In some cases (e.g. wrong endpoints configuration) it would be necessary to cancel the ctx, otherwise the client might be stucked in a retry loop and in case of remoteConnectionController there wouldn't be any way stop that. This commit passes a parent context to the client itself. This commit also fixes the closing of stopWatch channel and rest of the resources when ctx is done/cancelled. If ctx was closed and select statement choosed handling of that case (<-ctx.Done()) the code would quit the for loop and handling of <-w.stopWatch would never be possible which means calling w.Stop() from outside (e.g. releaseOldConnection()) would hang forever as the wg counter wouldn't be decremented. Now when the context gets closed, it correctly closes all channels and decrements the wg counter, just like when handled by releaseOldConnection func which releases everything. Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
This commit adds a new ControllerParam CancelDoFuncOnUpdate which allows cancelling the controller context passed to DoFunc. This way we can terminate the controller (if stuck in e.g. etcd client retry loop) and still preserve the existing logic. Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
deb9769
to
5f3c745
Compare
/test Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
/test-1.16-4.19 |
/test-1.16-4.19 |
/test-1.26-net-next |
thanks @giorio94 for retriggerring |
In case there was a typo in remote cluster configuration which caused etcd connection to fail it was not possible to change the remote etcd configuration without waiting for initialConnectionTimeout.
See commit messages for detailed desc.
Signed-off-by: Ondrej Blazek ondrej.blazek@firma.seznam.cz