RFC: KEP-4381: DRA: network-attached resources #4612

Open
pohly wants to merge 1 commit into base: master

@pohly pohly commented May 2, 2024

Adding support for network-attached resources by extending the ResourceSlice with a node selector is fairly easy. The (one!) scheduler in the cluster can use that field during Filter instead of the node name.

Supporting multiple schedulers is harder and needs further work.
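
The Filter check described above can be sketched as follows. This is only an illustration, not the scheduler plugin's code: `ResourceSlice`, `Node`, and `sliceAvailableOnNode` are hypothetical, simplified stand-ins, and the selector is modeled as plain label equality rather than a full Kubernetes node selector.

```go
package main

// ResourceSlice is a simplified, hypothetical stand-in for the API type.
// A slice is expected to set either NodeName or NodeSelector, not both.
type ResourceSlice struct {
	Name string
	// NodeName is set for node-local resources.
	NodeName string
	// NodeSelector is set for network-attached resources.
	// Assumption: every key/value pair must match a node label exactly.
	NodeSelector map[string]string
}

// Node carries only the fields that the check below needs.
type Node struct {
	Name   string
	Labels map[string]string
}

// sliceAvailableOnNode reports whether the devices published in the slice
// could be used by a pod running on the given node.
func sliceAvailableOnNode(slice ResourceSlice, node Node) bool {
	// Node-local resources: only the named node qualifies.
	if slice.NodeName != "" {
		return slice.NodeName == node.Name
	}
	// Neither field set: treat the slice as unusable in this sketch.
	if slice.NodeSelector == nil {
		return false
	}
	// Network-attached resources: every selector entry must match a label.
	for key, want := range slice.NodeSelector {
		if got, ok := node.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}
```

With this shape, a single slice whose selector matches several nodes is usable from all of them, which is what distinguishes a network-attached resource from a node-local one.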

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 2, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 2, 2024
@k8s-ci-robot k8s-ci-robot requested a review from mrunalp May 2, 2024 13:12
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 2, 2024
@@ -546,6 +548,12 @@ the DRA drivers providing content for those objects. It might be possible to
support version skew (= keeping kubelet at an older version than the control
plane and the DRA drivers) in the future, but currently this is out of scope.

For network-attached resources, the DRA driver is responsible for discovering

Currently, a ResourceSlice is generated with the name <node_name>-<driver_name>-<random_string>. If the NodeSelector of the ResourceSlice is set instead, would the name be <driver_name>-<random_string>?
https://github.com/kubernetes/kubernetes/blob/v1.30.0/pkg/kubelet/cm/dra/plugin/noderesources.go#L470

Contributor Author


How to name those ResourceSlices would be entirely up to the driver. What matters isn't the name, only the content.

I am moving the ResourceSlice controller out of kubelet into the k8s.io/dynamic-resource-allocation package as part of kubernetes/kubernetes#124274, so drivers could reuse that (eventually - right now in that PR it doesn't support network-attached resources yet).

@MikeZappa87

What exactly is the scope of "network-attached resources" here? Could it cover a simple veth case, or is this leaning more towards virtual functions?

@pohly
Contributor Author

pohly commented May 4, 2024

This PR is not about network hardware in a node. That kind of resource is local to a node and already covered.

What this PR adds is support for things like an IP camera (accessible through the IP network) or special devices that can be accessed through some kind of fabric (GPU via PCI switch). Those resources are not local to a node and therefore need to be handled differently.

@bart0sh bart0sh added this to Needs Reviewer in SIG Node PR Triage May 4, 2024
pohly added a commit to pohly/wg-device-management that referenced this pull request May 22, 2024
This corresponds to kubernetes/enhancements#4612. The
ResourcePool change is small. The big caveat as mentioned in the KEP update is
that multiple schedulers will not coordinate allocation of these shared
devices, so a cluster with such devices will be limited to running a single
scheduler.

4 participants