Terminate running nodes after exiting podTemplate block #1095

Closed

Vlatombe
Member

@Vlatombe Vlatombe commented Dec 17, 2021

This prevents orphan agents from being left behind after a dynamic pod template is removed. In some cases where Jenkins is restarted just after exiting a podTemplate block, agents would lose their reference to the pod template and cause various exceptions such as #1045.

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@jglick
Member

jglick commented Dec 17, 2021

Seems reasonable. Doubt there would be any poor interactions with #1083 since this happens after node(POD_LABEL) {…} has exited.

@jglick
Member

jglick commented Dec 17, 2021

KubernetesProvisioningLimits.unregister: Pod template count for runInPod-jrsz5 went below zero.

seems like a genuine regression.
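
The warning suggests a per-template counter being decremented more often than it was incremented, for instance if the same agent is released once by its normal termination path and once more by the new cleanup. A minimal sketch of that failure mode follows. The class and method names are hypothetical, and this is not the plugin's KubernetesProvisioningLimits implementation, only an illustration of a counter that refuses to go below zero.

```java
import java.util.HashMap;
import java.util.Map;

public class TemplateCountSketch {
    // Hypothetical per-template agent counter (not the plugin's real data structure).
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records that one more agent exists for the given template. */
    synchronized void register(String templateId) {
        counts.merge(templateId, 1, Integer::sum);
    }

    /** Releases one agent; returns false instead of letting the count drop below zero. */
    synchronized boolean unregister(String templateId) {
        int current = counts.getOrDefault(templateId, 0);
        if (current <= 0) {
            System.err.println("Pod template count for " + templateId + " would go below zero");
            return false;
        }
        counts.put(templateId, current - 1);
        return true;
    }

    synchronized int count(String templateId) {
        return counts.getOrDefault(templateId, 0);
    }

    public static void main(String[] args) {
        TemplateCountSketch limits = new TemplateCountSketch();
        limits.register("template-a");
        limits.unregister("template-a"); // normal termination path
        limits.unregister("template-a"); // second release of the same agent triggers the warning
        System.out.println(limits.count("template-a"));
    }
}
```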

@Dohbedoh
Contributor

Dohbedoh commented Dec 20, 2021

Also wondering whether this requires some Queue.lock, as retention strategies do; see, for example, OnceRetentionStrategy.java.
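
To make the locking concern concrete, here is a minimal sketch of node removal guarded by a single shared lock. A plain ReentrantLock stands in for the Jenkins queue lock, and every name in the block is hypothetical; it only shows the pattern of doing the check and the removal while holding the lock, not what OnceRetentionStrategy actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockedNodeRemoval {
    // Stand-in for the Jenkins queue lock (an assumption, not the real API).
    private static final ReentrantLock QUEUE_LOCK = new ReentrantLock();

    // Hypothetical registry of node names.
    private final List<String> nodes = new ArrayList<>();

    void addNode(String name) {
        QUEUE_LOCK.lock();
        try {
            nodes.add(name);
        } finally {
            QUEUE_LOCK.unlock();
        }
    }

    /** Removes the node while holding the lock; returns false if it was already gone. */
    boolean removeNode(String name) {
        QUEUE_LOCK.lock();
        try {
            return nodes.remove(name);
        } finally {
            QUEUE_LOCK.unlock();
        }
    }

    int nodeCount() {
        QUEUE_LOCK.lock();
        try {
            return nodes.size();
        } finally {
            QUEUE_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        LockedNodeRemoval registry = new LockedNodeRemoval();
        registry.addNode("agent-1");
        System.out.println(registry.removeNode("agent-1"));
        System.out.println(registry.removeNode("agent-1"));
    }
}
```

Because the existence check and the removal happen under the same lock, a second caller racing to terminate the same node sees it as already removed rather than acting on stale state.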

@jglick
Member

jglick commented Dec 20, 2021

jenkinsci/durable-task-plugin#2 (comment) 🤷

@Vlatombe
Member Author

Right, this is probably due to the required queue lock...

    } catch (InterruptedException | IOException e) {
        LOGGER.log(Level.WARNING, "Failed to terminate " + node.getNodeName(), e);
    }
});

This is removing ALL KubernetesSlave nodes, no? In my Jenkins, I have many jobs running at once, and each job declares its own podTemplate step. How is this not terminating all such nodes?

Member

Indeed, this would need to filter based on pod template, and if (after resolving merge conflicts) this passes all existing tests then a new test would need to be written which runs two podTemplate steps in parallel and lets one finish and asserts that the other continues running.
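
The filtering described above could look roughly like the following sketch. The Agent class, its templateId field, and the method name are hypothetical stand-ins (the real change would work on KubernetesSlave nodes); the point is only that agents belonging to other pod templates are left alone when one podTemplate block exits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class TemplateScopedTermination {
    /** Hypothetical stand-in for an agent node; not the plugin's KubernetesSlave. */
    static final class Agent {
        final String nodeName;
        final String templateId;

        Agent(String nodeName, String templateId) {
            this.nodeName = nodeName;
            this.templateId = templateId;
        }
    }

    /** Returns only the agents created from the pod template whose block just exited. */
    static List<Agent> agentsToTerminate(List<Agent> allAgents, String exitedTemplateId) {
        List<Agent> matches = new ArrayList<>();
        for (Agent agent : allAgents) {
            if (Objects.equals(agent.templateId, exitedTemplateId)) {
                matches.add(agent);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Agent> agents = List.of(
                new Agent("job-a-1", "template-a"),
                new Agent("job-b-1", "template-b"),
                new Agent("job-a-2", "template-a"));
        // Only agents from template-a are selected; job-b-1 keeps running.
        for (Agent agent : agentsToTerminate(agents, "template-a")) {
            System.out.println("would terminate " + agent.nodeName);
        }
    }
}
```

The parallel test mentioned above would assert the same property end to end: after one of two concurrent podTemplate steps finishes, the agent of the other is still present.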

@Vlatombe
Member Author

Stale, superseded by #1543 and #1553.

@Vlatombe Vlatombe closed this May 21, 2024
@Vlatombe Vlatombe deleted the podtemplatestep-terminatenodes branch May 21, 2024 08:08