
kubernetes thinks pod is running even after the node was deleted explicitly #27882

Closed
sols1 opened this issue Jun 22, 2016 · 20 comments
Labels: kind/support, priority/awaiting-more-evidence, sig/scheduling

@sols1 commented Jun 22, 2016

kubectl delete node 192.168.78.14

kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE       NODE
default       collectd-9pafi                      1/1       Running   0          23h       192.168.78.14
default       collectd-3dslw                      1/1       Running   0          23h       192.168.78.15
default       collectd-ja6p7                      1/1       Running   0          23h       192.168.78.16
default       graphite-zruml                      1/1       Running   0          1d        192.168.78.15
default       ha-service-loadbalancer-a7ssn       1/1       Running   0          1d        192.168.78.15
default       ha-service-loadbalancer-k3hq2       1/1       Running   0          1d        192.168.78.16
kube-system   kube-dns-v11-4qoi8                  4/4       Running   0          1d        192.168.78.16
kube-system   kube-registry-v0-69k0f              1/1       Running   0          1d        192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-cwg7k   1/1       Running   0          1d        192.168.78.15
@sols1 (Author) commented Jun 22, 2016

Actually, the node was already gone before I ran delete node.

@girishkalele commented

@sols1 I want to say that we need to wait before declaring the node/pods dead because of possible network partitioning. Forwarding to the experts for an explanation of the expected behaviour.

@dbsmith @kubernetes/goog-cluster @davidopp

How long will the pod collectd-9pafi on the deleted node 192.168.78.14 be shown in the Running state before we consider the node really dead and not just partitioned?

@davidopp added the kind/support and priority/awaiting-more-evidence labels Jun 27, 2016
@davidopp (Member) commented

Actually, the node was already gone before I ran delete node.

Can you clarify what you mean by "gone"?

To answer your question -- NodeController used to sync nodes against the cloud provider (treating the cloud provider as the source of truth), so "delete node" didn't really do anything unless you also deleted the VM in your cloud provider. (In fact "delete node" wasn't necessary, because once the VM was deleted from the cloud provider, the NodeController would detect that and do the "delete node" itself.) But for some reason I can't find the code that syncs from the cloud provider, only the code that deletes the node when NodeController sees it is missing from the cloud provider: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/node/nodecontroller.go#L647

In any event, I suspect that deleting the node from your cloud provider and then waiting a little bit should result in the pods being evicted and the node being deleted for real.

If you're running on bare metal, kill the kubelet and then run "kubectl delete node"

@gmarek Do you know what happened to the code that I'm referring to? Am I hallucinating?
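
For reference, a minimal sketch of the bare-metal procedure described above, assuming the kubelet runs under systemd on the node (the unit name and service manager are assumptions; adapt to your setup):

# On the node being removed: stop the kubelet so it cannot re-register
sudo systemctl stop kubelet
sudo systemctl disable kubelet

# From a machine with access to the API server: remove the Node object
kubectl delete node 192.168.78.14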

@sols1 (Author) commented Jun 27, 2016

Is kubernetes supposed to work on bare metal cluster (which is what I am doing)?

In bare metal cluster there is no cloud provider.

By "node was already gone" I mean that the node was switched off to another cluster (and it was a while).

Waiting does not help - the node and pods that were running on the node do not disappear no matter how long you wait.

What do you mean by saying "kill the kubelet"? Where? The node was gone already.

Why is this P3? If kubernetes does not support bare metal clusters, then you should state it in the docs very clearly.

@davidopp (Member) commented

Kubernetes supports bare metal clusters, hence my comment

If you're running on bare metal, kill the kubelet and then run "kubectl delete node"

If the node is no longer in touch with the master (killing the kubelet process such that it doesn't restart is one way to do that; shutting off the machine is another) and you do "kubectl delete node" then the node should be deleted from the master's state and the pods evicted. If that's not happening, it's a bug. As a starting point, can you run "kubectl describe pod" and "kubectl get pod -o yaml" on the pods that the master says are still running on the node that's gone (I guess collectd-9pafi is one), and post the output here?

Thanks
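
For example, the requested information could be captured with something like the following (collectd-9pafi is the pod name from the original report; substitute whichever pod is stuck in your cluster):

kubectl describe pod collectd-9pafi
kubectl get pod collectd-9pafi -o yaml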

@thockin (Member) commented Jun 27, 2016

You shouldn't need to delete the node, though. When kubelet stops checking in, all the pods should be deleted... It sounds to me like THAT is the bug here...

@sols1 (Author) commented Jun 27, 2016

Why do I need to do delete node if the node is already down? Why doesn't kubernetes itself detect such a situation and handle it correctly (just like with pods)? This looks like a bug.

Yes, even if I do delete node manually, the pods that were running on the node do not disappear - it doesn't matter how long you wait. This is another bug.

I cannot run kubectl describe pod right now since it has been a while, but I have observed this behavior multiple times. I can capture it the next time I see it.

Do you actually test kubernetes on bare metal clusters?

@davidopp (Member) commented

Yes, sorry, I mis-read the issue. As @thockin alluded to, the pods should be evicted once the node stops heartbeating for 5 minutes (actually it's 5m40s). You don't need to delete the node. The behavior is the same on bare metal and cloud.

The next time you see this, please send the information about the pod that I described in my previous comment, as it will help us debug. Doing the same (describe and get -o yaml) for the node that the pod claims to be running on would be very helpful too.
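
Concretely, something along these lines would capture both pieces of state (the placeholders stand in for the actual pod and node names):

kubectl describe pod <stuck-pod>
kubectl get pod <stuck-pod> -o yaml
kubectl describe node <gone-node>
kubectl get node <gone-node> -o yaml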

@xiangpengzhao (Contributor) commented

I once ran into a similar situation in my environment. At the time I thought it was a time-setting problem in the environment, so I ignored the issue and reset my VM. I will try to reproduce the issue.

@sols1 (Author) commented Jun 29, 2016

3 nodes up:

date; ~/kubectl --server=192.168.78.16:8080 get nodes
Tue Jun 28 16:21:20 PDT 2016
NAME            STATUS    AGE
192.168.78.14   Ready     42m
192.168.78.15   Ready     11d
192.168.78.16   Ready     11d

date; ~/kubectl --server=192.168.78.16:8080 get svc,ds -o wide --all-namespaces | cut -c1-175
Tue Jun 28 16:22:22 PDT 2016
NAMESPACE     NAME                       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE       SELECTOR
default       graphite                   10.100.72.82     nodes         9093/TCP,2003/TCP,8125/UDP   11d       name=graphite
default       graphite-ext               10.100.83.96     10.10.10.20   9093/TCP                     8d        name=graphite
default       kubernetes                 10.100.0.1       <none>        443/TCP                      11d       <none>
development   dns-backend                10.100.117.111   <none>        8000/TCP                     5d        name=dns-backend
kube-system   kube-dns                   10.100.2.254     <none>        53/UDP,53/TCP                11d       k8s-app=kube-dns
kube-system   kube-registry              10.100.78.167    <none>        5000/TCP                     11d       k8s-app=kube-registry
kube-system   kubernetes-dashboard       10.100.97.204    <none>        80/TCP                       1d        k8s-app=kubernetes-dashboard
kube-system   kubernetes-dashboard-ext   10.100.190.239   10.10.10.20   9090/TCP                     1d        k8s-app=kubernetes-dashboard
production    dns-backend                10.100.70.161    <none>        8000/TCP                     4d        name=dns-backend
NAMESPACE     NAME                       DESIRED          CURRENT       NODE-SELECTOR                AGE       CONTAINER(S)            IMAGE(S)                                
default       collectd                   3                3             <none>                       4h        collectd                collectd-docker            
default       node-problem-detector      3                3             <none>                       3d        node-problem-detector   gcr.io/google_containers/node-problem-de

date; ~/kubectl --server=192.168.78.16:8080 get pods -o wide --all-namespaces
Tue Jun 28 16:22:52 PDT 2016
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE       NODE
default       collectd-0z9ih                      1/1       Running   0          43m       192.168.78.14
default       collectd-1za4f                      1/1       Running   0          4h        192.168.78.15
default       collectd-ty56y                      1/1       Running   0          4h        192.168.78.16
default       graphite-5rr7m                      1/1       Running   0          6d        192.168.78.16
default       ha-service-loadbalancer-83c8n       1/1       Running   0          23h       192.168.78.15
default       ha-service-loadbalancer-fyihg       1/1       Running   0          23h       192.168.78.16
default       node-problem-detector-33r6c         1/1       Running   0          22h       192.168.78.14
default       node-problem-detector-bcrpv         1/1       Running   0          3d        192.168.78.15
default       node-problem-detector-i46vz         1/1       Running   0          3d        192.168.78.16
development   dns-backend-rtx9c                   1/1       Running   0          5d        192.168.78.15
kube-system   kube-dns-v11-mbkar                  4/4       Running   0          1d        192.168.78.15
kube-system   kube-registry-v0-5fly7              1/1       Running   0          6d        192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-y5jel   1/1       Running   0          1d        192.168.78.16
production    dns-backend-wkt5j                   1/1       Running   0          5d        192.168.78.16

Node 192.168.78.14 shuts down:

date; ~/kubectl --server=192.168.78.16:8080 get nodes
Tue Jun 28 16:33:11 PDT 2016
NAME            STATUS     AGE
192.168.78.14   NotReady   54m
192.168.78.15   Ready      11d
192.168.78.16   Ready      11d

date; ~/kubectl --server=192.168.78.16:8080 get svc,ds -o wide --all-namespaces | cut -c1-175
Tue Jun 28 16:33:31 PDT 2016
NAMESPACE     NAME                       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE       SELECTOR
default       graphite                   10.100.72.82     nodes         9093/TCP,2003/TCP,8125/UDP   11d       name=graphite
default       graphite-ext               10.100.83.96     10.10.10.20   9093/TCP                     8d        name=graphite
default       kubernetes                 10.100.0.1       <none>        443/TCP                      11d       <none>
development   dns-backend                10.100.117.111   <none>        8000/TCP                     5d        name=dns-backend
kube-system   kube-dns                   10.100.2.254     <none>        53/UDP,53/TCP                11d       k8s-app=kube-dns
kube-system   kube-registry              10.100.78.167    <none>        5000/TCP                     11d       k8s-app=kube-registry
kube-system   kubernetes-dashboard       10.100.97.204    <none>        80/TCP                       1d        k8s-app=kubernetes-dashboard
kube-system   kubernetes-dashboard-ext   10.100.190.239   10.10.10.20   9090/TCP                     1d        k8s-app=kubernetes-dashboard
production    dns-backend                10.100.70.161    <none>        8000/TCP                     5d        name=dns-backend
NAMESPACE     NAME                       DESIRED          CURRENT       NODE-SELECTOR                AGE       CONTAINER(S)            IMAGE(S)                                
default       collectd                   3                3             <none>                       4h        collectd                collectd-docker            
default       node-problem-detector      3                3             <none>                       3d        node-problem-detector   gcr.io/google_containers/node-problem-de

date; ~/kubectl --server=192.168.78.16:8080 get pods -o wide --all-namespaces
Tue Jun 28 16:33:44 PDT 2016
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE       NODE
default       collectd-0z9ih                      1/1       Running   0          54m       192.168.78.14
default       collectd-1za4f                      1/1       Running   0          4h        192.168.78.15
default       collectd-ty56y                      1/1       Running   0          4h        192.168.78.16
default       graphite-5rr7m                      1/1       Running   0          6d        192.168.78.16
default       ha-service-loadbalancer-83c8n       1/1       Running   0          23h       192.168.78.15
default       ha-service-loadbalancer-fyihg       1/1       Running   0          23h       192.168.78.16
default       node-problem-detector-33r6c         1/1       Running   0          22h       192.168.78.14
default       node-problem-detector-bcrpv         1/1       Running   0          3d        192.168.78.15
default       node-problem-detector-i46vz         1/1       Running   0          3d        192.168.78.16
development   dns-backend-rtx9c                   1/1       Running   0          5d        192.168.78.15
kube-system   kube-dns-v11-mbkar                  4/4       Running   0          1d        192.168.78.15
kube-system   kube-registry-v0-5fly7              1/1       Running   0          6d        192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-y5jel   1/1       Running   0          1d        192.168.78.16
production    dns-backend-wkt5j                   1/1       Running   0          5d        192.168.78.16

One hour later the pods are still "running" on 192.168.78.14:

date; ~/kubectl --server=192.168.78.16:8080 get nodes
Tue Jun 28 17:39:31 PDT 2016
NAME            STATUS     AGE
192.168.78.14   NotReady   2h
192.168.78.15   Ready      11d
192.168.78.16   Ready      11d

date; ~/kubectl --server=192.168.78.16:8080 get pods -o wide --all-namespaces
Tue Jun 28 17:39:37 PDT 2016
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE       NODE
default       collectd-0z9ih                      1/1       Running   0          2h        192.168.78.14
default       collectd-1za4f                      1/1       Running   0          5h        192.168.78.15
default       collectd-ty56y                      1/1       Running   0          5h        192.168.78.16
default       graphite-5rr7m                      1/1       Running   0          6d        192.168.78.16
default       ha-service-loadbalancer-83c8n       1/1       Running   0          1d        192.168.78.15
default       ha-service-loadbalancer-fyihg       1/1       Running   0          1d        192.168.78.16
default       node-problem-detector-33r6c         1/1       Running   0          23h       192.168.78.14
default       node-problem-detector-bcrpv         1/1       Running   0          4d        192.168.78.15
default       node-problem-detector-i46vz         1/1       Running   0          4d        192.168.78.16
development   dns-backend-rtx9c                   1/1       Running   0          5d        192.168.78.15
kube-system   kube-dns-v11-mbkar                  4/4       Running   0          1d        192.168.78.15
kube-system   kube-registry-v0-5fly7              1/1       Running   0          6d        192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-y5jel   1/1       Running   0          1d        192.168.78.16
production    dns-backend-wkt5j                   1/1       Running   0          5d        192.168.78.16

Even after kubectl delete node the pods are still "running":

date; ~/kubectl --server=192.168.78.16:8080 delete node 192.168.78.14
Tue Jun 28 17:40:44 PDT 2016
node "192.168.78.14" deleted

date; ~/kubectl --server=192.168.78.16:8080 get nodes
Tue Jun 28 17:40:58 PDT 2016
NAME            STATUS    AGE
192.168.78.15   Ready     11d
192.168.78.16   Ready     11d

date; ~/kubectl --server=192.168.78.16:8080 get pods -o wide --all-namespaces
Tue Jun 28 17:41:07 PDT 2016
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE       NODE
default       collectd-0z9ih                      1/1       Running   0          2h        192.168.78.14
default       collectd-1za4f                      1/1       Running   0          5h        192.168.78.15
default       collectd-ty56y                      1/1       Running   0          5h        192.168.78.16
default       graphite-5rr7m                      1/1       Running   0          6d        192.168.78.16
default       ha-service-loadbalancer-83c8n       1/1       Running   0          1d        192.168.78.15
default       ha-service-loadbalancer-fyihg       1/1       Running   0          1d        192.168.78.16
default       node-problem-detector-33r6c         1/1       Running   0          23h       192.168.78.14
default       node-problem-detector-bcrpv         1/1       Running   0          4d        192.168.78.15
default       node-problem-detector-i46vz         1/1       Running   0          4d        192.168.78.16
development   dns-backend-rtx9c                   1/1       Running   0          5d        192.168.78.15
kube-system   kube-dns-v11-mbkar                  4/4       Running   0          1d        192.168.78.15
kube-system   kube-registry-v0-5fly7              1/1       Running   0          6d        192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-y5jel   1/1       Running   0          1d        192.168.78.16
production    dns-backend-wkt5j                   1/1       Running   0          5d        192.168.78.16

@gmarek (Contributor) commented Sep 1, 2016

Sorry for dropping this so late, but it completely fell off my radar. What both @davidopp and @thockin wrote is right. Kubernetes should delete pods from NotReady Nodes after 5 minutes, with the exception of daemons - isn't collectd a pod that is part of a DaemonSet? If so, then sadly this is expected and has been discussed a couple of times (@mikedanese added this behavior). We probably should have logic somewhere that deletes daemon pods from nonexistent Nodes, and the lack of it is a bug. Because it was decided that the NodeController shouldn't touch daemons, I guess it's the responsibility of the DaemonSet controller.

As for 'automatically deleting Nodes in on-prem setups' - that is something we consciously do not do. It's impossible to tell whether a non-responsive Node is really gone or just has some temporary problem - all we can say is that we can't hear from it. Theoretically we could have some heuristic that decides that if the cluster size is X and the Node has been unresponsive for more than Y it's probably not coming back, but that would cause more problems than it would solve.

Again - sorry for the late answer. @davidopp please triage the problem of daemon deletion from non-existent Nodes.
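
To check whether a stuck pod falls under the DaemonSet exception described above, one can confirm what created it. In the repro output above, collectd and node-problem-detector already appear as DaemonSets in the "get svc,ds" listing; more generally, on a cluster of roughly this vintage (assuming the creator is still recorded in the kubernetes.io/created-by annotation), something like this works:

# List DaemonSets and see whether the stuck pod belongs to one of them
kubectl get ds --all-namespaces

# Inspect the pod's creator reference (collectd-0z9ih is the stuck pod from the repro above)
kubectl get pod collectd-0z9ih -o yaml | grep -A2 created-by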

@gmarek (Contributor) commented Sep 8, 2016

@sols1 - is my explanation enough for you?

@sols1 (Author) commented Sep 27, 2016

@gmarek If what you are saying is true, then the k8s documentation should say clearly and directly that dead nodes must be removed manually in bare metal k8s clusters.

For DaemonSets this is a bug, isn't it?

@gmarek (Contributor) commented Sep 28, 2016

cc @devin-donnelly for the former

It turned out that there were two different places in which we were removing orphaned Pods. One was omitting DaemonSets; the other one probably wasn't. There's a PR (#32495) that cleans up this mess - it might help with your case as well.

@mtbbiker commented Feb 3, 2017

I believe this is still an issue. I just tested this by shutting down the VM (I am running CoreOS on VMware).

My Pods on that Node (in the NotReady state) still show Running, and after 10 minutes they have not yet been re-created on a different Node (the Pods are created from a ReplicationController).

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"clean", BuildDate:"2016-11-12T05:22:15Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2+coreos.0", GitCommit:"3ed7d0f453a5517245d32a9c57c39b946e578821", GitTreeState:"clean", BuildDate:"2017-01-13T00:23:19Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Could this be caused by network partitioning? If so, how can I test? I don't know VMware too well.

@davidopp added the sig/scheduling label Feb 5, 2017
@gmarek (Contributor) commented Feb 6, 2017

@mtbbiker - I replied on the second thread (#8335). Please add xrefs instead of sending the same message on multiple threads.

@mtbbiker commented Feb 8, 2017

@gmarek Thanks for the reply and heads-up.

@gmarek closed this as completed May 10, 2017
@jomeier commented Jul 7, 2018

Maybe this is a possible workaround: #65936

@rasberrypie commented

Is there a way to reduce the 5-minute heartbeat timeout?
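
(A hedged note, since this went unanswered in the thread: the roughly 5-minute window is governed by the controller manager's eviction settings rather than by the kubelet heartbeat itself. On self-managed clusters of roughly this era the relevant flags are believed to be the following, with defaults shown; exact names should be checked against your version's documentation.)

# kube-controller-manager
--node-monitor-grace-period=40s   # how long missed status updates are tolerated before the node is marked NotReady
--pod-eviction-timeout=5m0s       # how long a node stays NotReady before its pods are evicted

# kubelet
--node-status-update-frequency=10s   # how often the kubelet posts node status to the API server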
