sandbox: do retry for wait to remote sandbox controller #10201
base: main
The remote sandbox controller may restart, so the Wait call should be retried if it fails with a gRPC disconnection error.
Signed-off-by: Abel Feng <fshb1988@gmail.com>
Hi @abel-von. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
/retest
/cc @fuweid
I like this, and would argue that it wouldn't hurt to do this in all cases, not just for the remote sandbox controller.
For reference, see https://github.com/cpuguy83/containerd-shim-systemd-v1, which does not really make sense as a sandbox controller but would likely be worth implementing as one even just for this change.
retryInterval time.Duration = 128
)
for {
	resp, err = s.client.Wait(ctx, &api.ControllerWaitRequest{SandboxID: sandboxID})
Would you please add a failpoint test case to check whether this becomes an infinite loop for the ttrpc shim? Thanks.
I think we may need to add a mocked remote sandbox controller first, and then we can add a failpoint test case for this.
For remote sandbox controllers, the controller process may restart, so we have to retry if the error indicates a gRPC disconnection.