Move orphaned Pod deletion logic to PodGC #32495
Conversation
```go
if s.TerminatedPodGCThreshold > 0 {
	go podgc.New(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), s.resyncPeriod, int(s.TerminatedPodGCThreshold)).
	go func() {
		podgc.NewFromClient(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), int(s.TerminatedPodGCThreshold)).
```
Why not just:

```go
go podgc.NewFromClient(...).Run(wait.NeverStop)
```

? This function isn't needed here.
No reason :)
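For the record, `go expr.Method()` needs no wrapping closure because Go evaluates the receiver expression and all arguments immediately in the calling goroutine; only the method call itself runs concurrently. A quick self-contained illustration (the `collector` type here is a hypothetical stand-in for the real `podgc` constructor, not kubernetes code):

```go
package main

import (
	"fmt"
	"sync"
)

type collector struct{}

// newCollector stands in for a constructor like podgc.NewFromClient.
func newCollector() collector {
	fmt.Println("constructed")
	return collector{}
}

func (c collector) Run(wg *sync.WaitGroup) {
	defer wg.Done()
	fmt.Println("running")
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	// newCollector() executes right here, synchronously, before the new
	// goroutine starts; only Run is deferred to the goroutine. So no
	// func(){...}() wrapper is needed.
	go newCollector().Run(&wg)
	wg.Wait()
	// prints "constructed" then "running"
}
```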
```go
func (gcc *PodGCController) Run(stop <-chan struct{}) {
	go gcc.podStoreSyncer.Run(stop)
	go gcc.podController.Run(stop)
```
This won't work. If you are using SharedInformer, you shouldn't call Run for it.
Please change it to the pattern of "internalPodInformer" - an example of it you can find in ReplicationController:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller.go#L88
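The "internalPodInformer" pattern referred to above boils down to two constructors: one that accepts a shared informer which the caller owns and runs, and one that builds a private informer the controller must start itself in Run. A minimal, self-contained Go sketch of that shape (the `Informer` type and constructor names are simplified stand-ins, not the real kubernetes types):

```go
package main

import "fmt"

// Informer is a simplified stand-in for a cache informer/reflector.
type Informer struct{ started bool }

func (i *Informer) Run(stop <-chan struct{}) { i.started = true }

type PodGCController struct {
	podInformer *Informer
	// internalPodInformer is set only when the controller created the
	// informer itself and therefore owns starting it; a shared informer
	// must never be Run by an individual controller.
	internalPodInformer *Informer
}

// NewPodGC wires in an externally managed (shared) informer.
func NewPodGC(shared *Informer) *PodGCController {
	return &PodGCController{podInformer: shared}
}

// NewFromClient builds a private informer that Run starts itself.
func NewFromClient() *PodGCController {
	internal := &Informer{}
	gcc := NewPodGC(internal)
	gcc.internalPodInformer = internal
	return gcc
}

func (gcc *PodGCController) Run(stop <-chan struct{}) {
	if gcc.internalPodInformer != nil {
		gcc.internalPodInformer.Run(stop)
	}
	// ... controller work loop would go here ...
}

func main() {
	stop := make(chan struct{})
	shared := &Informer{}
	NewPodGC(shared).Run(stop)
	fmt.Println(shared.started) // false: the shared informer is left alone

	gcc := NewFromClient()
	gcc.Run(stop)
	fmt.Println(gcc.internalPodInformer.started) // true: Run started it
}
```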
Done.
```go
}
terminatedPods := []*api.Pod{}
for _, pod := range pods {
	if phase := pod.Status.Phase; phase != api.PodPending && phase != api.PodRunning && phase != api.PodUnknown {
```
Can you please extract this into a function isPodTerminated(*api.Pod)?
Done.
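The requested helper would look roughly like the sketch below; the `Pod` and `PodPhase` types here are minimal stand-ins for the real `api` package types, so this compiles on its own:

```go
package main

import "fmt"

// PodPhase and Pod are minimal stand-ins for the api package types.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)

type Pod struct {
	Status struct{ Phase PodPhase }
}

// isPodTerminated reports whether the pod has reached a terminal phase,
// i.e. it is neither pending, running, nor in an unknown state.
func isPodTerminated(pod *Pod) bool {
	phase := pod.Status.Phase
	return phase != PodPending && phase != PodRunning && phase != PodUnknown
}

func main() {
	p := &Pod{}
	p.Status.Phase = PodSucceeded
	fmt.Println(isPodTerminated(p)) // true
	p.Status.Phase = PodRunning
	fmt.Println(isPodTerminated(p)) // false
}
```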
```go
@@ -118,6 +145,32 @@ func (gcc *PodGCController) gc() {
	wait.Wait()
}

// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
	glog.Infof("GC'ing orphaned")
```
V(4)
```go
	glog.Errorf("Error while listing all Pods: %v", err)
	return
}
glog.Infof("Pods: %#v", pods)
```
Please remove
Done.
```go
// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
	glog.Infof("GC'ing orphaned")
	pods, err := gcc.podStore.List(labels.Everything())
```
Instead of listing the same pods twice, please change it to:
- list pods in the gc() function
- pass this []*api.Pod to both functions
Done.
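The suggested restructuring, listing pods once in gc() and handing the same slice to both passes, could look like the following sketch. The trimmed-down `Pod` type and the `knownNodes` lookup are hypothetical stand-ins for the real store/lister machinery:

```go
package main

import "fmt"

// Pod is a trimmed-down stand-in for api.Pod.
type Pod struct {
	Name       string
	NodeName   string
	Terminated bool
}

// gc receives the pod list exactly once and gives the same slice to both
// the terminated-pod pass and the orphaned-pod pass, instead of each pass
// listing the store separately.
func gc(pods []*Pod, knownNodes map[string]bool) (terminated, orphaned []*Pod) {
	for _, pod := range pods {
		if pod.Terminated {
			terminated = append(terminated, pod)
		}
		// Orphaned: bound to a node that no longer exists.
		if pod.NodeName != "" && !knownNodes[pod.NodeName] {
			orphaned = append(orphaned, pod)
		}
	}
	return terminated, orphaned
}

func main() {
	pods := []*Pod{
		{Name: "a", NodeName: "node-1", Terminated: true},
		{Name: "b", NodeName: "gone-node"},
		{Name: "c", NodeName: "node-1"},
	}
	terminated, orphaned := gc(pods, map[string]bool{"node-1": true})
	fmt.Println(len(terminated), len(orphaned)) // 1 1
}
```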
GCE e2e build/test passed for commit 3ae5d37.
@wojtek-t PTAL
Should we only delete Running or Pending pods that are assigned to nodes that no longer exist? It seems like it might be a good idea to keep Failed and Succeeded pods around. I'm also wondering how Jobs should interact with this.

@mikedanese I don't know - this is a no-op PR that just moves the logic around. We can discuss it in a separate issue.
Just two minor things.
```go
@@ -221,8 +221,8 @@ func StartControllers(s *options.CMServer, kubeconfig *restclient.Config, stop <
	time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))

	if s.TerminatedPodGCThreshold > 0 {
		go podgc.New(client("pod-garbage-collector"), ResyncPeriod(s), int(s.TerminatedPodGCThreshold)).
			Run(wait.NeverStop)
		go podgc.NewFromInformer(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), sharedInformers.Pods().Informer(),
```
For consistency with all other controllers, instead of calling it NewFromInformer, can we call it NewPodGC?
Done
```go
go podgc.New(client("pod-garbage-collector"), ResyncPeriod(s), int(s.TerminatedPodGCThreshold)).
	Run(wait.NeverStop)
go podgc.NewFromInformer(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), sharedInformers.Pods().Informer(),
	int(s.TerminatedPodGCThreshold)).Run(wait.NeverStop)
```
s/restclient.../client("garbage-collector")/
Done.
lgtm

Jenkins GCI GKE smoke e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is
Jenkins Kubemark GCE e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is
Jenkins GCE e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is
Jenkins verification failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is

Fixed a typo.

Jenkins unit/integration failed for commit ce96174b8f91f50dfd2293bc8d01c125a67ef699. Full PR test history. The magic incantation to run this job again is

Fixed the PR after merge of #29048 (strongly typed NodeName)

Jenkins GCI GCE e2e failed for commit cb0a13c. Full PR test history. The magic incantation to run this job again is
Jenkins GKE smoke e2e failed for commit cb0a13c. Full PR test history. The magic incantation to run this job again is

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue |
cc @mwielgus @mikedanese @davidopp