Consistently Seeing Reflector Watch Errors on Controller Shutdown #2723

Open
jonathan-innis opened this issue Mar 22, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question.

Comments

@jonathan-innis
Member

jonathan-innis commented Mar 22, 2024

During controller shutdown, we consistently see errors that look like this:

logger.go:146: 2024-03-22T21:35:31.707Z	INFO	cache/reflector.go:462	pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

This happens on nearly every shutdown, and I wouldn't expect something that looks like an error to come through an INFO/WARN path. From looking at the reflector code, this "error" appears to come from this line. Is there a way to ensure that the runnable shutdown doesn't emit this error every time we shut down?

As an example, these "errors" show up in our Karpenter E2E testing here: https://github.com/aws/karpenter-provider-aws/actions/runs/8396432349/job/22997824261

[Image: Screenshot 2024-03-22 at 10 52 50 PM]

@troy0820
Member

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Mar 22, 2024
@laihezhao

@jonathan-innis I'm hitting the same problem:
[image]
Will it cause memory leaks?

@jonathan-innis
Member Author

If the controller is shutting down, I don't think it's going to cause memory leaks. From looking through the code, it just looks like spurious error logging from the reflector as all the context cancels are happening, but I'm imagining there's a more graceful way to shut the reflector down so we don't see this.

@jonathan-innis
Member Author

@laihezhao Your error also looks quite different from mine. Yours appears to be caused by 500s occurring somewhere on the apiserver.

@jonathan-innis
Member Author

@troy0820 Got any thoughts here on how this can be improved? Ideally, we wouldn't be seeing errors for what appears to be a graceful shutdown for controller-runtime.

@troy0820
Member

troy0820 commented Apr 8, 2024

@jonathan-innis I am going to investigate this, but it looks like it could be a bug, so I will label the issue accordingly so we can triage it a little better.

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 8, 2024