sandbox: do retry for wait to remote sandbox controller #10201

Open
wants to merge 1 commit into
base: main

abel-von
Contributor

A remote sandbox controller process may restart, so the Wait call has to be retried when the error indicates a gRPC disconnection.

The remote sandbox controller may restart, so the Wait call should be retried
if the error is a gRPC disconnection error.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
@k8s-ci-robot

Hi @abel-von. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Burning1020
Member

/ok-to-test

@kzys
Member

kzys commented May 15, 2024

/retest

@abel-von
Contributor Author

abel-von commented May 17, 2024

/cc @mxpv @mikebrow @dmcgowan @fuweid

@abel-von
Contributor Author

/cc @fuweid

@k8s-ci-robot k8s-ci-robot requested a review from fuweid May 17, 2024 01:38
@cpuguy83
Member

I like this, and may argue that it wouldn't hurt to do this in all cases, not just remote sandbox controller.

@cpuguy83
Member

For reference, see https://github.com/cpuguy83/containerd-shim-systemd-v1. It does not really make sense as a sandbox controller, but implementing it as one would likely be worth it even just for this change.

retryInterval time.Duration = 128
)
for {
resp, err = s.client.Wait(ctx, &api.ControllerWaitRequest{SandboxID: sandboxID})
Member

Would you please add a failpoint test case to check whether this becomes an infinite loop for the ttrpc shim? Thanks.

Contributor Author

I think we may need to add a mocked remote sandbox controller first; then we can add a failpoint test case for this.


6 participants