Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move orphaned Pod deletion logic to PodGC #32495

Merged
merged 1 commit into from Sep 28, 2016

Conversation

gmarek
Copy link
Contributor

@gmarek gmarek commented Sep 12, 2016

cc @mwielgus @mikedanese @davidopp


This change is Reviewable

@gmarek gmarek added the release-note-none Denotes a PR that doesn't merit a release note. label Sep 12, 2016
@gmarek gmarek changed the title Move orphaned Pod deletion logic to PodGC WIP Move orphaned Pod deletion logic to PodGC Sep 12, 2016
@gmarek gmarek added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Sep 12, 2016
@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 12, 2016
@gmarek gmarek removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Sep 13, 2016
@gmarek gmarek changed the title WIP Move orphaned Pod deletion logic to PodGC Move orphaned Pod deletion logic to PodGC Sep 13, 2016
if s.TerminatedPodGCThreshold > 0 {
go podgc.New(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), s.resyncPeriod, int(s.TerminatedPodGCThreshold)).
go func() {
podgc.NewFromClient(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), int(s.TerminatedPodGCThreshold)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just:
go podgc.NewFromClient(...).Run(wait.NeverStop)
?

This function isn't needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason:)

@gmarek
Copy link
Contributor Author

gmarek commented Sep 14, 2016

@k8s-bot gke test this issue: #32574

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2016
func (gcc *PodGCController) Run(stop <-chan struct{}) {
go gcc.podStoreSyncer.Run(stop)
go gcc.podController.Run(stop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. If you are using SharedInformer, you shouldn't call Run for it.
Please change it to the pattern of "internalPodInformer" - an example of it you can find in ReplicationController:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller.go#L88

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
terminatedPods := []*api.Pod{}
for _, pod := range pods {
if phase := pod.Status.Phase; phase != api.PodPending && phase != api.PodRunning && phase != api.PodUnknown {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please it a function isPodTerminated(*api.Pod) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -118,6 +145,32 @@ func (gcc *PodGCController) gc() {
wait.Wait()
}

// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
glog.Infof("GC'ing orphaned")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V(4)

glog.Errorf("Error while listing all Pods: %v", err)
return
}
glog.Infof("Pods: %#v", pods)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
glog.Infof("GC'ing orphaned")
pods, err := gcc.podStore.List(labels.Everything())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of listing the same pods twice, please change it to:

  • list pods in gc() function
  • pass this []*api.Pod to both functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@k8s-bot
Copy link

k8s-bot commented Sep 16, 2016

GCE e2e build/test passed for commit 3ae5d37.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 22, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2016
@gmarek
Copy link
Contributor Author

gmarek commented Sep 26, 2016

@k8s-bot gke test this issue: #33477

Copy link
Contributor Author

@gmarek gmarek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t PTAL

}
terminatedPods := []*api.Pod{}
for _, pod := range pods {
if phase := pod.Status.Phase; phase != api.PodPending && phase != api.PodRunning && phase != api.PodUnknown {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

glog.Errorf("Error while listing all Pods: %v", err)
return
}
glog.Infof("Pods: %#v", pods)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// cleanupOrphanedPods deletes pods that are bound to nodes that don't exist.
func (gcc *PodGCController) gcOrphaned() {
glog.Infof("GC'ing orphaned")
pods, err := gcc.podStore.List(labels.Everything())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

func (gcc *PodGCController) Run(stop <-chan struct{}) {
go gcc.podStoreSyncer.Run(stop)
go gcc.podController.Run(stop)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gmarek
Copy link
Contributor Author

gmarek commented Sep 26, 2016

@k8s-bot test this issue: #33434, #33477

@gmarek
Copy link
Contributor Author

gmarek commented Sep 26, 2016

@k8s-bot gci test this issue: #33423

I hope this is right command...

@mikedanese
Copy link
Member

Should we only Running or Pending pods that are assigned to nodes that no longer exist? Seems like it might be a good idea to keep around Failed and Succeeded pods. I'm wondering to how Jobs should interact with this.

@gmarek
Copy link
Contributor Author

gmarek commented Sep 27, 2016

@mikedanese I don't know - this is a no-op PR that just moves the logic around. We can discuss it in a separate issue.

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor things.

@@ -221,8 +221,8 @@ func StartControllers(s *options.CMServer, kubeconfig *restclient.Config, stop <
time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))

if s.TerminatedPodGCThreshold > 0 {
go podgc.New(client("pod-garbage-collector"), ResyncPeriod(s), int(s.TerminatedPodGCThreshold)).
Run(wait.NeverStop)
go podgc.NewFromInformer(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), sharedInformers.Pods().Informer(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with all other controllers, instead of calling it NewFromInformer, can we call it NewPodGC ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

go podgc.New(client("pod-garbage-collector"), ResyncPeriod(s), int(s.TerminatedPodGCThreshold)).
Run(wait.NeverStop)
go podgc.NewFromInformer(clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "pod-garbage-collector")), sharedInformers.Pods().Informer(),
int(s.TerminatedPodGCThreshold)).Run(wait.NeverStop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/restclient.../client("garbage-collector")/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@wojtek-t
Copy link
Member

lgtm

@wojtek-t wojtek-t added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GKE smoke e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins Kubemark GCE e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 414cab519210f6f864ca0b726e25794df4f349ff. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2016
@gmarek
Copy link
Contributor Author

gmarek commented Sep 27, 2016

Fixed a typo.

@gmarek
Copy link
Contributor Author

gmarek commented Sep 28, 2016

@k8s-bot unit test this issue: #32455

@gmarek
Copy link
Contributor Author

gmarek commented Sep 28, 2016

@k8s-bot gke test this issue: #33617

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit ce96174b8f91f50dfd2293bc8d01c125a67ef699. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@gmarek
Copy link
Contributor Author

gmarek commented Sep 28, 2016

Fixed the PR after merge of #29048 (strongly typed NodeName)

@gmarek gmarek added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 28, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GCE e2e failed for commit cb0a13c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GKE smoke e2e failed for commit cb0a13c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@gmarek
Copy link
Contributor Author

gmarek commented Sep 28, 2016

@k8s-bot gke test this issue: #33617

@gmarek
Copy link
Contributor Author

gmarek commented Sep 28, 2016

@k8s-bot gci test this issue: #33627

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 96a7b09 into kubernetes:master Sep 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants