kubernetes thinks pod is running even after the node was deleted explicitly #27882
Actually, the node was already gone before I ran "delete node".
@sols1 I want to say that we need to wait before we declare the node/pods dead, because of possible network partitioning. Forwarding to the experts for an explanation of the expected behaviour. @dbsmith @kubernetes/goog-cluster @davidopp How long will the pod collectd-9pafi on the deleted node 192.168.78.14 be shown in the Running state before we consider the node really dead and not just partitioned?
Can you clarify what you mean by "gone"?

To answer your question -- NodeController used to sync nodes against the cloud provider (treating the cloud provider as the source of truth), so "delete node" didn't really do anything unless you also deleted the VM in your cloud provider. (And in fact "delete node" wasn't necessary, because once the VM was deleted from the cloud provider, the NodeController would detect it and do the "delete node" itself.) But for some reason I can't find the code that syncs from the cloud provider, only the code that deletes the node when NodeController sees it is missing from the cloud provider: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/node/nodecontroller.go#L647

In any event, I suspect that deleting the node from your cloud provider and then waiting a little bit should result in the pods being evicted and the node being deleted for real. If you're running on bare metal, kill the kubelet and then run "kubectl delete node".

@gmarek Do you know what happened to the code that I'm referring to? Am I hallucinating?
Is kubernetes supposed to work on a bare metal cluster (which is what I am doing)? In a bare metal cluster there is no cloud provider. By "node was already gone" I mean that the node was switched over to another cluster (and that was a while ago). Waiting does not help - the node and the pods that were running on it do not disappear no matter how long you wait.

What do you mean by "kill the kubelet"? Where? The node was already gone.

Why is this P3? If kubernetes does not support bare metal clusters, then you should state it in the docs very clearly.
Kubernetes supports bare metal clusters, hence my comment.

If the node is no longer in touch with the master (killing the kubelet process such that it doesn't restart is one way to do that; shutting off the machine is another) and you do "kubectl delete node", then the node should be deleted from the master's state and the pods evicted. If that's not happening, it's a bug.

As a starting point, can you run "kubectl describe pod" and "kubectl get pod -o yaml" on the pods that the master says are still running on the node that's gone (I guess collectd-9pafi is one), and post the output here? Thanks.
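Concretely, the diagnostic and cleanup commands described above might look like the following. This is a sketch: the pod name (collectd-9pafi) and node name (192.168.78.14) are taken from this thread, so substitute your own, and these commands need a reachable cluster to run against.

```shell
# Inspect the pod that the master still believes is running
kubectl describe pod collectd-9pafi
kubectl get pod collectd-9pafi -o yaml

# Inspect the node that the pod claims to be running on
kubectl describe node 192.168.78.14
kubectl get node 192.168.78.14 -o yaml

# If the node is permanently gone, remove it from the master's state
kubectl delete node 192.168.78.14
```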
You shouldn't need to delete the node, though. When kubelet stops checking in, the pods should be evicted on their own.
Why do I need to do "kubectl delete node"? Yes, if I do …, I cannot do …. Do you actually test kubernetes on bare metal clusters?
Yes, sorry, I mis-read the issue. As @thockin alluded to, the pods should be evicted once the node stops heartbeating for 5 minutes (actually it's 5m40s). You don't need to delete the node. The behavior is the same on bare metal and in the cloud.

The next time you see this, please send the information about the pod that I described in my previous comment, as it will help us debug. Doing the same (describe and get -o yaml) for the node that the pod claims to be running on would be very helpful too.
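The 5m40s figure quoted above is consistent with two controller-manager settings added together: the grace period before a silent node is marked NotReady (40s was the default at the time) plus the timeout before pods on a NotReady node are evicted (5m). A toy calculation, assuming those defaults (your cluster may override them):

```python
# Where the ~5m40s figure comes from, assuming default settings of that era.
NODE_MONITOR_GRACE_PERIOD = 40   # seconds of silence before node is NotReady
POD_EVICTION_TIMEOUT = 5 * 60    # seconds NotReady before pods are evicted

total = NODE_MONITOR_GRACE_PERIOD + POD_EVICTION_TIMEOUT
print(f"{total // 60}m{total % 60}s")  # -> 5m40s
```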
I once ran into a similar situation in my environment. At the time I thought it was a time-setting problem in the environment, so I ignored the issue and reset my VM. I will try to reproduce the issue.
3 nodes up:
14 shuts down:
1 hour later the pods are still "running" on 14:
Even after
Sorry for dropping in so late, but it completely fell off my radar. What both @davidopp and @thockin wrote is right: Kubernetes should delete pods from NotReady Nodes after 5 minutes, with the exception of daemons - isn't collectd a pod which is part of a daemon controller? If so, then sadly this is expected and was discussed a couple of times (@mikedanese who added this behavior). We probably should have the logic that deletes daemons from nonexistent Nodes somewhere, and the lack of it is a bug. Because it was decided that NodeController shouldn't touch daemons, I guess it's the responsibility of the DaemonController.

As for 'automatically deleting Nodes in on-prem setups' - that we consciously do not do. It's impossible to tell whether a non-responsive Node is really gone or just has some temporary problems - all we can say is that we can't hear from it. Theoretically we could have some heuristic which determines that if the cluster size is X and the Node has been unresponsive for more than Y it's probably not coming back, but it would cause more problems than it would solve.

Again - sorry for the late answer. @davidopp please triage the problem of daemon deletion from non-existing Nodes.
@sols1 - is my explanation enough for you?
@gmarek If what you are saying is true, then the k8s documentation should say clearly and directly that dead nodes must be removed manually in bare metal k8s clusters. For DaemonSets this is a bug, isn't it?
cc @devin-donnelly for the former.

It turned out that there were two different places in which we were removing orphaned Pods. One was omitting DaemonSets, the other one probably wasn't. There's a PR (#32495) that cleans up this mess - it might help with your case as well.
I believe this is still an issue. I just tested this: when I shut down the VM (I am running CoreOS on VMware), the pods on that node (which is in NotReady state) still show Running, and after 10 minutes they have not been re-created on a different node (the pods are created by a ReplicationController).

Could this be caused by network partitioning? If so, how can I test for that? I don't know VMware too well.
@gmarek Thanks for the reply and the heads-up.
Maybe this is a possible workaround: #65936 |
Is there a way to reduce the 5-minute heartbeat timeout?
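For reference, the knobs that controlled this window in Kubernetes of that era were flags on the kubelet and the kube-controller-manager. A sketch of lowering them to detect dead nodes faster (the specific values here are illustrative, and flag availability should be checked against your version):

```shell
# kubelet: how often the node posts its heartbeat/status (default 10s)
kubelet --node-status-update-frequency=4s ...

# kube-controller-manager: how long without a heartbeat before the node
# is marked NotReady (default 40s), and how long a node may stay NotReady
# before its pods are evicted (default 5m)
kube-controller-manager \
  --node-monitor-grace-period=20s \
  --pod-eviction-timeout=30s ...
```

Note that lowering these values makes the cluster quicker to evict pods, but also more likely to evict them during a brief network blip, which is exactly the trade-off discussed earlier in this thread.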