Token refresh miss causes rpc - error getting ClusterInformation connection is unauthorized: Unauthorized #8777
Comments
@abasitt thanks for raising. I think the interesting thing to dig into here would be why the periodic update seems to just stop. On the nodes encountering this issue, do you see the following log immediately after the "Update of CNI kubeconfig" log?
This would indicate that it is at least writing the file, and not getting stuck. Are there any other logs being emitted from
@caseydavenport thank you for looking into the issue. It seems to be getting stuck; see below. Let me know if you need any more details.
Slight correction about the k8s version: it's v1.26.0.
@abasitt if possible, a complete log as opposed to a filtered one might be helpful here to see what else is going on on that node.
@caseydavenport sorry for the late response. The pod issue was resolved; I will share the full logs once I find another pod in this state.
We are facing the issue below. It happens very rarely, say one node in hundreds, and restarting the calico pod on that node fixes the problem.
Failed to create pod sandbox: rpc - error getting ClusterInformation connection is unauthorized: Unauthorized
#5712
Expected Behavior
token_watch.go shouldn't miss a token refresh within the specified interval, and if it does miss one, it should at least log some errors.
Current Behavior
Rare, but it keeps occurring: on some worker nodes token_watch.go simply stops.
Possible Solution
Steps to Reproduce (for bugs)
Happens randomly; we have not been able to reproduce the error so far.
Context
The difference we noticed between a working and a non-working worker node is below. The logs are a few days old, but what we observed is that the watcher stops without sharing any further details about the failure.
Your Environment