Add ElasticJob support #465

Closed
wants to merge 1 commit into from

Conversation

kannon92
Contributor

Fixes #463

Elastic JobSet provides a way to scale replicated jobs up or down. The goal of this PR is to support Elastic JobSets.

In order to support this, we need to relax our validation of ReplicatedJobs.
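To make the scaling idea concrete, here is a minimal sketch of how a change in a replicated job's replica count could translate into child Jobs to create or delete. The function `planScale`, the `<jobset>-<replicatedJob>-<index>` naming, and the choice to remove the highest indices first on scale-down are all assumptions made for illustration; they are not taken from this PR.

```go
package main

import "fmt"

// planScale is a hypothetical helper: given the current and desired replica
// counts of one replicated job, it returns the child Job names that would be
// created (scale up) or deleted (scale down).
// Assumption: child Jobs are named "<jobset>-<replicatedJob>-<index>" and
// scale-down removes the highest indices first.
func planScale(jobSet, rjob string, current, desired int) (toCreate, toDelete []string) {
	// Scaling up: indices current..desired-1 do not exist yet.
	for i := current; i < desired; i++ {
		toCreate = append(toCreate, fmt.Sprintf("%s-%s-%d", jobSet, rjob, i))
	}
	// Scaling down: indices desired..current-1 are no longer wanted.
	for i := desired; i < current; i++ {
		toDelete = append(toDelete, fmt.Sprintf("%s-%s-%d", jobSet, rjob, i))
	}
	return toCreate, toDelete
}

func main() {
	create, del := planScale("demo", "workers", 2, 4)
	fmt.Println(create, del) // prints [demo-workers-2 demo-workers-3] []
}
```

A real controller would also have to decide what happens to Jobs that have already completed or failed, which this sketch ignores.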

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kannon92
Once this PR has been reviewed and has the lgtm label, please assign danielvegamyhre for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the labels cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) on Mar 21, 2024

netlify bot commented Mar 21, 2024

Deploy Preview for kubernetes-sigs-jobset canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 9409520 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-jobset/deploys/65fc9bae804edc0008340184 |

@kannon92
Contributor Author

/hold

Tests are failing, and we should discuss validation for ReplicatedJobs in more detail.

@k8s-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Mar 21, 2024
@k8s-ci-robot
Contributor

@kannon92: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-jobset-test-integration-main | 9409520 | link | true | /test pull-jobset-test-integration-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -178,8 +178,7 @@ func (js *JobSet) ValidateUpdate(old runtime.Object) (admission.Warnings, error)
}
}
// Note that SucccessPolicy and failurePolicy are made immutable via CEL.
errs := apivalidation.ValidateImmutableField(mungedSpec.ReplicatedJobs, oldJS.Spec.ReplicatedJobs, field.NewPath("spec").Child("replicatedJobs"))
Contributor Author

@ahg-g this is probably not what we want.

If we drop all validation of a replicated job, updates could add/remove entire replicated jobs.

I think we wouldn't want that.
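One middle ground between full immutability and no validation at all is to let only the replica count change. The sketch below shows that idea with a simplified, hypothetical `ReplicatedJob` type and a hypothetical `validateElasticUpdate` function; it blanks the one mutable field on copies of both sides and then requires everything else to match. It is an illustration of the option under discussion, not the validation this PR or the project adopted.

```go
package main

import (
	"fmt"
	"reflect"
)

// ReplicatedJob is a simplified, hypothetical stand-in for the JobSet API
// type; Template is a placeholder for the real Job template.
type ReplicatedJob struct {
	Name     string
	Replicas int32
	Template string
}

// validateElasticUpdate (hypothetical) accepts an update only if the list of
// replicated jobs is unchanged apart from each entry's Replicas field.
func validateElasticUpdate(oldJobs, newJobs []ReplicatedJob) error {
	// Adding or removing an entry changes the length, so reject it up front.
	if len(oldJobs) != len(newJobs) {
		return fmt.Errorf("spec.replicatedJobs: cannot add or remove replicated jobs (had %d, got %d)",
			len(oldJobs), len(newJobs))
	}
	for i := range oldJobs {
		// o and n are copies, so zeroing Replicas does not touch the inputs.
		o, n := oldJobs[i], newJobs[i]
		o.Replicas, n.Replicas = 0, 0
		if !reflect.DeepEqual(o, n) {
			return fmt.Errorf("spec.replicatedJobs[%d]: only replicas may be updated", i)
		}
	}
	return nil
}

func main() {
	before := []ReplicatedJob{{Name: "workers", Replicas: 2, Template: "t"}}
	scaled := []ReplicatedJob{{Name: "workers", Replicas: 4, Template: "t"}}
	fmt.Println(validateElasticUpdate(before, scaled)) // scaling is accepted
	fmt.Println(validateElasticUpdate(before, nil))    // dropping a job is rejected
}
```

With this shape, an update that adds, removes, renames, or re-templates a replicated job is still rejected, while a pure scale up or scale down passes.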

},
},
{
name: "dropping a replicated job is valid",
Contributor Author

@ahg-g @danielvegamyhre I added this test case so we can discuss what kind of validation we want on updates to replicated jobs.

I think this is probably not what we want. On the other hand, if a JobSet is suspended, maybe someone should be able to drop or add replicated jobs.
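If the suspended case were allowed, the rule could look roughly like the sketch below: the set of replicated jobs may change only while the JobSet is suspended. The `JobSetSpec` type, `validateStructure`, and the requirement that the JobSet be suspended both before and after the update are hypothetical choices made for illustration; the thread does not settle on any of them.

```go
package main

import (
	"errors"
	"fmt"
)

// JobSetSpec is a simplified, hypothetical stand-in for the real API type.
// Only the fields needed for this sketch are modeled.
type JobSetSpec struct {
	Suspend            bool
	ReplicatedJobNames []string
}

var errStructuralChange = errors.New(
	"replicated jobs may only be added or dropped while the JobSet is suspended")

// sameNames reports whether both slices hold the same names in the same order.
func sameNames(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// validateStructure (hypothetical) rejects adding or dropping replicated jobs
// unless the JobSet is suspended before and after the update. Requiring both
// prevents a single update from unsuspending and restructuring at once.
func validateStructure(oldSpec, newSpec JobSetSpec) error {
	if sameNames(oldSpec.ReplicatedJobNames, newSpec.ReplicatedJobNames) {
		return nil
	}
	if oldSpec.Suspend && newSpec.Suspend {
		return nil
	}
	return errStructuralChange
}

func main() {
	running := JobSetSpec{ReplicatedJobNames: []string{"leader", "workers"}}
	dropped := JobSetSpec{ReplicatedJobNames: []string{"leader"}}
	fmt.Println(validateStructure(running, dropped)) // rejected: not suspended
}
```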

@@ -1125,6 +1125,25 @@ var _ = ginkgo.Describe("JobSet controller", func() {
},
},
}),
ginkgo.Entry("elastic jobset; scale up replicated jobs", &testCase{
Contributor Author

note: this isn't working yet. Once I figure out validation, I'll come back to this.

@k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Mar 30, 2024
@k8s-ci-robot
Contributor

PR needs rebase.

@kannon92
Contributor Author

kannon92 commented Apr 3, 2024

/close

@k8s-ci-robot
Contributor

@kannon92: Closed this PR.

In response to this:

/close

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support Elastic JobSets
2 participants