Investigate using Elastic Index Job in exclusive placement implementation #482

Open · Tracked by #523
danielvegamyhre opened this issue Mar 27, 2024 · 9 comments
Labels: kind/feature (Categorizes issue or PR as related to a new feature.)

@danielvegamyhre (Contributor)

Right now, the implementation of exclusive job placement per topology domain (node pool, zone, etc.) relies on a pod webhook that allows leader pods (index 0) to be admitted, created, and scheduled, but blocks follower pods (all other indexes) until the leader pod for that child Job is scheduled.

The leader pods have pod affinity/anti-affinity constraints ensuring each leader pod lands in a different topology domain. Follower pods have nodeSelectors injected by the pod mutating webhook ensuring they land in the same topology domain as their leader.
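
For concreteness, here is a rough sketch of the two kinds of constraints described above, expressed with `k8s.io/api/core/v1` types. This is illustrative only, not the actual webhook code; the `example.com/job-key` label and the topology key are placeholders for this example (leader pods would also carry the job-key label themselves so the affinity term is satisfiable for the first pod in a domain):

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Placeholder topology key; in practice this is whatever domain exclusive
// placement is requested over (e.g. a zone or node pool label).
const topologyKey = "topology.kubernetes.io/zone"

// leaderAffinity sketches the constraints injected into leader pods (index 0):
// attract pods carrying the same (hypothetical) job-key label and repel pods
// carrying a different job-key, within each topology domain.
func leaderAffinity(jobKey string) *corev1.Affinity {
	return &corev1.Affinity{
		PodAffinity: &corev1.PodAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"example.com/job-key": jobKey},
				},
				TopologyKey: topologyKey,
			}},
		},
		PodAntiAffinity: &corev1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchExpressions: []metav1.LabelSelectorRequirement{{
						Key:      "example.com/job-key",
						Operator: metav1.LabelSelectorOpNotIn,
						Values:   []string{jobKey},
					}},
				},
				TopologyKey: topologyKey,
			}},
		},
	}
}

// followerNodeSelector sketches what gets injected into follower pods once the
// leader's domain is known: a plain nodeSelector, which is far cheaper for the
// scheduler to evaluate than per-pod affinity rules.
func followerNodeSelector(leaderDomain string) map[string]string {
	return map[string]string{topologyKey: leaderDomain}
}
```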

This is an improvement over the original implementation of using pod affinity/anti-affinity constraints on all pods (which did not scale well due to the pod affinity rule computation time scaling linearly with the number of pods and nodes). However, the repeated wasted follower pod creation attempts are putting unnecessary pressure on the apiserver.

One possible option is to use Elastic Indexed Jobs to first create every Job with completions == parallelism == 1 (so this will only create index 0 / leader pods). The pod webhook will still inject pod affinity/anti-affinity constraints into the leader pods as it currently does.

Once the leader pod for a given Job is scheduled, resize the Job so that completions == parallelism == the values defined in the JobSet spec. The follower pods will then be created and have the nodeSelector injected to follow the leader, as happens today. This will minimize pressure on the apiserver by avoiding unnecessary pod creation attempts.
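
A minimal sketch of that resize step, assuming a controller-runtime client and the `k8s.io/utils/ptr` helper (illustrative only, not the JobSet controller's actual code; `desiredParallelism` / `desiredCompletions` stand for whatever the JobSet spec declares):

```go
package sketch

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/utils/ptr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// resizeChildJob scales an Elastic Indexed Job up from the initial
// completions == parallelism == 1 (leader only) to the values declared in the
// JobSet spec. The caller is expected to have confirmed that the leader pod
// is already scheduled before invoking this.
func resizeChildJob(ctx context.Context, c client.Client, job *batchv1.Job, desiredParallelism, desiredCompletions int32) error {
	if job.Spec.Parallelism != nil && *job.Spec.Parallelism == desiredParallelism {
		return nil // already resized
	}
	// Elastic Indexed Jobs allow completions and parallelism to be raised in
	// lockstep after creation; the follower pods (indexes 1..N-1) are only
	// created by the Job controller at this point.
	job.Spec.Parallelism = ptr.To(desiredParallelism)
	job.Spec.Completions = ptr.To(desiredCompletions)
	return c.Update(ctx, job)
}
```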

@kannon92 (Contributor)

#463

I did look into supporting elastic jobs with jobset. I opened up a PR but I found out that we enforce immutability on all replicated jobs.

I'm still waiting for some comments on what kind of validation we would want for replicated jobs.

@danielvegamyhre (Contributor, Author)

> #463
>
> I did look into supporting elastic jobs with jobset. I opened up a PR but I found out that we enforce immutability on all replicated jobs.
>
> I'm still waiting for some comments on what kind of validation we would want for replicated jobs.

We don't need to mutate replicatedJobs for this issue; the controller would be performing this logic on individual Jobs.

The thing you're exploring is different (elastic replicated jobs, enabling the number of replicas to scale up or down). This issue is about scaling the number of Job pods up after creation, which doesn't require mutating the replicatedJob.

@kannon92 (Contributor)

Yikes... That's a good distinction, but I'm not sure we want to support it in that way.

I guess the main thing you'd want to see is how that breaks JobSet status counting. JobSet is going to converge to whatever values are in its spec. If users patch the underlying Jobs, then I wonder what happens with the JobSet statuses.

I think this is why Deployment/StatefulSet encourage users to use scale on those workloads rather than editing the pods directly.

@danielvegamyhre (Contributor, Author) commented Mar 28, 2024

> Yikes... That's a good distinction, but I'm not sure we want to support it in that way.
>
> I guess the main thing you'd want to see is how that breaks JobSet status counting. JobSet is going to converge to whatever values are in its spec. If users patch the underlying Jobs, then I wonder what happens with the JobSet statuses.
>
> I think this is why Deployment/StatefulSet encourage users to use scale on those workloads rather than editing the pods directly.

It's not the user patching the Jobs; it's the JobSet controller.

It creates each child Job with 1 pod (the leader pod) and, once it's scheduled, updates the child Job's completions and parallelism to match what's in the JobSet spec. This way we avoid spamming the apiserver with pod creation requests that we know will be rejected; otherwise, the Job controller undergoes exponential backoff if the leader pod takes too long to schedule, delaying the time until all the workers are ready.
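
For illustration, a minimal sketch of the "leader pod is scheduled" check that would gate the resize (an assumed helper, not existing JobSet code; the completion-index pod label below is only set by the Job controller on reasonably recent Kubernetes versions):

```go
package sketch

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// leaderPodScheduled reports whether the leader pod (completion index 0) of
// the given child Job has a PodScheduled=True condition.
func leaderPodScheduled(ctx context.Context, c client.Client, job *batchv1.Job) (bool, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace(job.Namespace),
		client.MatchingLabels{
			"batch.kubernetes.io/job-name":             job.Name,
			"batch.kubernetes.io/job-completion-index": "0",
		},
	); err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodScheduled && cond.Status == corev1.ConditionTrue {
				return true, nil
			}
		}
	}
	return false, nil
}
```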

Regardless, this is just an issue to prototype it and see how it performs versus the current webhook-based implementation.

@kannon92 (Contributor)

Okay. I was just thinking that this would make reproducibility a bit challenging, as the spec is not the desired state.

But for a prototype, I think it's worth exploring.

@dejanzele (Contributor)

What is the current state of this work? I see it is unassigned; if nobody is actively working on it, I am interested in picking it up.

@danielvegamyhre (Contributor, Author)

@dejanzele That would be great!

Let me know if you have any questions on the idea or the implementation.

@dejanzele (Contributor)

Thanks, I'll get familiar with it and ping you soon.

/assign

@danielvegamyhre (Contributor, Author)

/kind feature

@k8s-ci-robot added the kind/feature label on Apr 29, 2024.