Move orphaned Pod deletion logic to PodGC #32495
Conversation
```go
if s.TerminatedPodGCThreshold > 0 {
	go podgc.New(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), s.resyncPeriod, int(s.TerminatedPodGCThreshold)).
	go func() {
		podgc.NewFromClient(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), int(s.TerminatedPodGCThreshold)).
```
Why not just:

```go
go podgc.NewFromClient(...).Run(wait.NeverStop)
```

? This function isn't needed here.
No reason :)
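For the record, `go expr.Method()` needs no wrapping closure because Go evaluates the receiver expression and all arguments immediately in the calling goroutine; only the method call itself runs concurrently. A quick self-contained illustration (the `collector` type here is a hypothetical stand-in for the real `podgc` constructor, not kubernetes code):

```go
package main

import (
	"fmt"
	"sync"
)

type collector struct{}

// newCollector stands in for a constructor like podgc.NewFromClient.
func newCollector() collector {
	fmt.Println("constructed")
	return collector{}
}

func (c collector) Run(wg *sync.WaitGroup) {
	defer wg.Done()
	fmt.Println("running")
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	// newCollector() executes right here, synchronously, before the new
	// goroutine starts; only Run is deferred to the goroutine. So no
	// func(){...}() wrapper is needed.
	go newCollector().Run(&wg)
	wg.Wait()
	// prints "constructed" then "running"
}
```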
```go
func (gcc *PodGCController) Run(stop <-chan struct{}) {
	go gcc.podStoreSyncer.Run(stop)
	go gcc.podController.Run(stop)
```
This won't work. If you are using SharedInformer, you shouldn't call Run for it.
Please change it to the pattern of "internalPodInformer" - an example of it you can find in ReplicationController:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller.go#L88
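The "internalPodInformer" pattern referred to above boils down to two constructors: one that accepts a shared informer which the caller owns and runs, and one that builds a private informer the controller must start itself in Run. A minimal, self-contained Go sketch of that shape (the `Informer` type and constructor names are simplified stand-ins, not the real kubernetes types):

```go
package main

import "fmt"

// Informer is a simplified stand-in for a cache informer/reflector.
type Informer struct{ started bool }

func (i *Informer) Run(stop <-chan struct{}) { i.started = true }

type PodGCController struct {
	podInformer *Informer
	// internalPodInformer is set only when the controller created the
	// informer itself and therefore owns starting it; a shared informer
	// must never be Run by an individual controller.
	internalPodInformer *Informer
}

// NewPodGC wires in an externally managed (shared) informer.
func NewPodGC(shared *Informer) *PodGCController {
	return &PodGCController{podInformer: shared}
}

// NewFromClient builds a private informer that Run starts itself.
func NewFromClient() *PodGCController {
	internal := &Informer{}
	gcc := NewPodGC(internal)
	gcc.internalPodInformer = internal
	return gcc
}

func (gcc *PodGCController) Run(stop <-chan struct{}) {
	if gcc.internalPodInformer != nil {
		gcc.internalPodInformer.Run(stop)
	}
	// ... controller work loop would go here ...
}

func main() {
	stop := make(chan struct{})
	shared := &Informer{}
	NewPodGC(shared).Run(stop)
	fmt.Println(shared.started) // false: the shared informer is left alone

	gcc := NewFromClient()
	gcc.Run(stop)
	fmt.Println(gcc.internalPodInformer.started) // true: Run started it
}
```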
Done.
```go
}
terminatedPods := []*api.Pod{}
for _, pod := range pods {
	if phase := pod.Status.Phase; phase != api.PodPending && phase != api.PodRunning && phase != api.PodUnknown {
```
Can you please extract this into a function isPodTerminated(*api.Pod)?
Done.
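The requested helper would look roughly like the sketch below; the `Pod` and `PodPhase` types here are minimal stand-ins for the real `api` package types, so this compiles on its own:

```go
package main

import "fmt"

// PodPhase and Pod are minimal stand-ins for the api package types.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)

type Pod struct {
	Status struct{ Phase PodPhase }
}

// isPodTerminated reports whether the pod has reached a terminal phase,
// i.e. it is neither pending, running, nor in an unknown state.
func isPodTerminated(pod *Pod) bool {
	phase := pod.Status.Phase
	return phase != PodPending && phase != PodRunning && phase != PodUnknown
}

func main() {
	p := &Pod{}
	p.Status.Phase = PodSucceeded
	fmt.Println(isPodTerminated(p)) // true
	p.Status.Phase = PodRunning
	fmt.Println(isPodTerminated(p)) // false
}
```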
```go
@@ -118,6 +145,32 @@ func (gcc *PodGCController) gc() {
	wait.Wait()
}

// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
	glog.Infof("GC'ing orphaned")
```
V(4)
```go
	glog.Errorf("Error while listing all Pods: %v", err)
	return
}
glog.Infof("Pods: %#v", pods)
```
Please remove
Done.
```go
// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
	glog.Infof("GC'ing orphaned")
	pods, err := gcc.podStore.List(labels.Everything())
```
Instead of listing the same pods twice, please change it to:
- list pods in the gc() function
- pass this []*api.Pod to both functions
Done.
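The suggested restructuring, listing pods once in gc() and handing the same slice to both passes, could look like the following sketch. The trimmed-down `Pod` type and the `knownNodes` lookup are hypothetical stand-ins for the real store/lister machinery:

```go
package main

import "fmt"

// Pod is a trimmed-down stand-in for api.Pod.
type Pod struct {
	Name       string
	NodeName   string
	Terminated bool
}

// gc receives the pod list exactly once and gives the same slice to both
// the terminated-pod pass and the orphaned-pod pass, instead of each pass
// listing the store separately.
func gc(pods []*Pod, knownNodes map[string]bool) (terminated, orphaned []*Pod) {
	for _, pod := range pods {
		if pod.Terminated {
			terminated = append(terminated, pod)
		}
		// Orphaned: bound to a node that no longer exists.
		if pod.NodeName != "" && !knownNodes[pod.NodeName] {
			orphaned = append(orphaned, pod)
		}
	}
	return terminated, orphaned
}

func main() {
	pods := []*Pod{
		{Name: "a", NodeName: "node-1", Terminated: true},
		{Name: "b", NodeName: "gone-node"},
		{Name: "c", NodeName: "node-1"},
	}
	terminated, orphaned := gc(pods, map[string]bool{"node-1": true})
	fmt.Println(len(terminated), len(orphaned)) // 1 1
}
```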
GCE e2e build/test passed for commit 3ae5d37.
@wojtek-t PTAL
Should we only delete Running or Pending pods that are assigned to nodes that no longer exist? It seems like it might be a good idea to keep Failed and Succeeded pods around. I'm also wondering how Jobs should interact with this.

@mikedanese I don't know - this is a no-op PR that just moves the logic around. We can discuss it in a separate issue.
Just two minor things.
```go
@@ -221,8 +221,8 @@ func StartControllers(s *options.CMServer, kubeconfig *restclient.Config, stop <
	time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))

	if s.TerminatedPodGCThreshold > 0 {
		go podgc.New(client("pod-garbage-collector"), ResyncPeriod(s), int(s.TerminatedPodGCThreshold)).
			Run(wait.NeverStop)
		go podgc.NewFromInformer(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), sharedInformers.Pods().Informer(),
```
For consistency with all other controllers, instead of calling it NewFromInformer, can we call it NewPodGC?
Done
```go
go podgc.New(client("pod-garbage-collector"), ResyncPeriod(s), int(s.TerminatedPodGCThreshold)).
	Run(wait.NeverStop)
go podgc.NewFromInformer(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), sharedInformers.Pods().Informer(),
	int(s.TerminatedPodGCThreshold)).Run(wait.NeverStop)
```
s/restclient.../client("garbage-collector")/
Done.
lgtm

Jenkins GCI GKE smoke e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is
Jenkins Kubemark GCE e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is
Jenkins GCE e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is
Jenkins verification failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history. The magic incantation to run this job again is

Fixed a typo.

Jenkins unit/integration failed for commit ce96174b8f91f50dfd2293bc8d01c125a67ef699. Full PR test history. The magic incantation to run this job again is

Fixed the PR after merge of #29048 (strongly typed NodeName)

Jenkins GCI GCE e2e failed for commit cb0a13c. Full PR test history. The magic incantation to run this job again is
Jenkins GKE smoke e2e failed for commit cb0a13c. Full PR test history. The magic incantation to run this job again is

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue |
cc @mwielgus @mikedanese @davidopp