Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete in-tree support for NVIDIA GPUs. #61498

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 0 additions & 1 deletion hack/.golint_failures
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,6 @@ pkg/kubelet/dockershim/cm
pkg/kubelet/dockershim/libdocker
pkg/kubelet/dockershim/testing
pkg/kubelet/events
pkg/kubelet/gpu
pkg/kubelet/images
pkg/kubelet/kuberuntime
pkg/kubelet/leaky
Expand Down
7 changes: 2 additions & 5 deletions pkg/apis/core/helper/helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -172,14 +172,11 @@ func IsNativeResource(name core.ResourceName) bool {
strings.Contains(string(name), core.ResourceDefaultNamespacePrefix)
}

var overcommitBlacklist = sets.NewString(string(core.ResourceNvidiaGPU))

// IsOvercommitAllowed returns true if the resource is in the default
// namespace and not blacklisted.
// namespace and is not hugepages.
func IsOvercommitAllowed(name core.ResourceName) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used in validation. this is used in the kubelet, which is allowed to skew two versions older than the apiserver. can we verify that a 1.10-level kubelet fails in a reasonable way if you specify the alpha resource on a pod with overcommit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is used while validating the pod spec. So, from 1.11 (assuming this PR is merged), the API server will treat alpha.kubernetes.io/nvidia-gpu like any other *kubernetes.io prefixed resource (IsDefaultNamespaceResource()) and will allow pod specs with unequal requests and limits for this resource.

I would hope that when people upgrade their API server to 1.11, they won't submit any new pods using this resource.

But let's say we have a situation with API server running 1.11, kubelet running <1.11 with this alpha feature gate on, and the user submits a pod requesting this resource. In that case, the API server won't check whether requests=limits for this resource. And while assigning resources, kubelet will only look at the limits for this resource (like it does now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing to keep in mind would be that currently the scheduler would schedule a pod requesting *kubernetes.io/ prefixed resources to any node whether that node is exposing that resource or not (I don't know why this is the case). #50658 Scenario B

Once we remove the special case for alpha.kubernetes.io/nvidia-gpu, this behavior will apply to alpha.kubernetes.io/nvidia-gpu as well. So, a pod requesting alpha.kubernetes.io/nvidia-gpu could be scheduled to any node.

Note that this is not because of updating IsOvercommitAllowed() but because of removing the special case predicate from the scheduler below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @mindprince mentioned, the current scheduler behavior is to ignore resource request on non-existing resources in the kubernetes.io/ domain. This would cause the pod to be scheduled on a node without that requested resource. On the node, because Kubelet also runs GeneralPredicate, it would fail the pod during admission if it is running 1.10, which is actually the desired behavior for alpha.kubernetes.io/nvidia-gpu. However, if it is running 1.11, the pod would be started without proper gpu device setup.

@mindprince has initiated the discussion on whether we want to change this scheduler behavior on kubernetes.io/ domain resources in #50658 discussion. For now, I wonder whether we want to fail loudly during validation for resource request on alpha.kubernetes.io/nvidia-gpu to make sure that any users who haven't been aware of the deprecation of Accelerators feature can get the clear signal and move to the device plugin based solution. Then maybe after one or two releases, when #50658 is fully resolved, we can remove this special validation logic. Of course, Accelerators is an alpha feature, so it is debatable whether we want to add this special logic in resource validation code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current scheduler behavior is to ignore resource request on non-existing resources in the kubernetes.io/ domain.

that doesn't seem forward compatible, does it? if a new resource comes along and is requested by a pod, only kubelets that know about and declare they satisfy that resource should be running that pod, right?

For now, I wonder whether we want to fail loudly during validation for resource request on alpha.kubernetes.io/nvidia-gpu to make sure that any users who haven't been aware of the deprecation of Accelerators feature can get the clear signal and move to the device plugin based solution.

tightening validation brings a host of issues we want to avoid. it is better to let a pod in and it sit unscheduled than to prevent API writes because of stricter validation that can disrupt cleaning up the very resources that are newly considered invalid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree the desired behavior should be leaving the pod pending till the requested resource showing up, which is #50658 is about. I think @mindprince is working on a change to resolve #50658 Scenario B. Agree it should be fine to leave the validation part out if both changes are merged in 1.11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt This is addressed.

return IsNativeResource(name) &&
!IsHugePageResourceName(name) &&
!overcommitBlacklist.Has(string(name))
!IsHugePageResourceName(name)
}

var standardLimitRangeTypes = sets.NewString(
Expand Down
4 changes: 0 additions & 4 deletions pkg/apis/core/helper/helpers_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -387,10 +387,6 @@ func TestIsOvercommitAllowed(t *testing.T) {
name: core.ResourceMemory,
allowed: true,
},
{
name: core.ResourceNvidiaGPU,
allowed: false,
},
{
name: HugePageResourceName(resource.MustParse("2Mi")),
allowed: false,
Expand Down
7 changes: 0 additions & 7 deletions pkg/apis/core/resource.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,13 +47,6 @@ func (self *ResourceList) Pods() *resource.Quantity {
return &resource.Quantity{}
}

func (self *ResourceList) NvidiaGPU() *resource.Quantity {
if val, ok := (*self)[ResourceNvidiaGPU]; ok {
return &val
}
return &resource.Quantity{}
}

func (self *ResourceList) StorageEphemeral() *resource.Quantity {
if val, ok := (*self)[ResourceEphemeralStorage]; ok {
return &val
Expand Down
2 changes: 0 additions & 2 deletions pkg/apis/core/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -3641,8 +3641,6 @@ const (
// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
ResourceEphemeralStorage ResourceName = "ephemeral-storage"
// NVIDIA GPU, in devices. Alpha, might change: although fractional and allowing values >1, only one whole device per node is assigned.
ResourceNvidiaGPU ResourceName = "alpha.kubernetes.io/nvidia-gpu"
)

const (
Expand Down
1 change: 0 additions & 1 deletion pkg/apis/core/v1/helper/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,6 @@ go_library(
"//vendor/k8s.io/apimachinery/pkg/api/resource:go_default_library",
"//vendor/k8s.io/apimachinery/pkg/labels:go_default_library",
"//vendor/k8s.io/apimachinery/pkg/selection:go_default_library",
"//vendor/k8s.io/apimachinery/pkg/util/sets:go_default_library",
"//vendor/k8s.io/apimachinery/pkg/util/validation:go_default_library",
],
)
Expand Down
8 changes: 2 additions & 6 deletions pkg/apis/core/v1/helper/helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ import (
"k8s.io/apimachinery/pkg/api/resource"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/selection"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/apimachinery/pkg/util/validation"
"k8s.io/kubernetes/pkg/apis/core/helper"
)
Expand Down Expand Up @@ -85,14 +84,11 @@ func HugePageSizeFromResourceName(name v1.ResourceName) (resource.Quantity, erro
return resource.ParseQuantity(pageSize)
}

var overcommitBlacklist = sets.NewString(string(v1.ResourceNvidiaGPU))

// IsOvercommitAllowed returns true if the resource is in the default
// namespace and not blacklisted and is not hugepages.
// namespace and is not hugepages.
func IsOvercommitAllowed(name v1.ResourceName) bool {
return IsNativeResource(name) &&
!IsHugePageResourceName(name) &&
!overcommitBlacklist.Has(string(name))
!IsHugePageResourceName(name)
}

// Extended and Hugepages resources
Expand Down
4 changes: 0 additions & 4 deletions pkg/apis/core/v1/helper/helpers_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -125,10 +125,6 @@ func TestIsOvercommitAllowed(t *testing.T) {
resourceName: "kubernetes.io/resource-foo",
expectVal: true,
},
{
resourceName: "alpha.kubernetes.io/nvidia-gpu",
expectVal: false,
},
{
resourceName: "hugepages-100m",
expectVal: false,
Expand Down
32 changes: 3 additions & 29 deletions pkg/apis/core/v1/helper/qos/qos_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,26 +38,13 @@ func TestGetPodQOS(t *testing.T) {
}),
expected: v1.PodQOSGuaranteed,
},
{
pod: newPod("guaranteed-with-gpu", []v1.Container{
newContainer("guaranteed", getResourceList("100m", "100Mi"), addResource("nvidia-gpu", "2", getResourceList("100m", "100Mi"))),
}),
expected: v1.PodQOSGuaranteed,
},
{
pod: newPod("guaranteed-guaranteed", []v1.Container{
newContainer("guaranteed", getResourceList("100m", "100Mi"), getResourceList("100m", "100Mi")),
newContainer("guaranteed", getResourceList("100m", "100Mi"), getResourceList("100m", "100Mi")),
}),
expected: v1.PodQOSGuaranteed,
},
{
pod: newPod("guaranteed-guaranteed-with-gpu", []v1.Container{
newContainer("guaranteed", getResourceList("100m", "100Mi"), addResource("nvidia-gpu", "2", getResourceList("100m", "100Mi"))),
newContainer("guaranteed", getResourceList("100m", "100Mi"), getResourceList("100m", "100Mi")),
}),
expected: v1.PodQOSGuaranteed,
},
{
pod: newPod("best-effort-best-effort", []v1.Container{
newContainer("best-effort", getResourceList("", ""), getResourceList("", "")),
Expand All @@ -71,29 +58,16 @@ func TestGetPodQOS(t *testing.T) {
}),
expected: v1.PodQOSBestEffort,
},
{
pod: newPod("best-effort-best-effort-with-gpu", []v1.Container{
newContainer("best-effort", getResourceList("", ""), addResource("nvidia-gpu", "2", getResourceList("", ""))),
newContainer("best-effort", getResourceList("", ""), getResourceList("", "")),
}),
expected: v1.PodQOSBestEffort,
},
{
pod: newPod("best-effort-with-gpu", []v1.Container{
newContainer("best-effort", getResourceList("", ""), addResource("nvidia-gpu", "2", getResourceList("", ""))),
}),
expected: v1.PodQOSBestEffort,
},
{
pod: newPod("best-effort-burstable", []v1.Container{
newContainer("best-effort", getResourceList("", ""), addResource("nvidia-gpu", "2", getResourceList("", ""))),
newContainer("best-effort", getResourceList("", ""), getResourceList("", "")),
newContainer("burstable", getResourceList("1", ""), getResourceList("2", "")),
}),
expected: v1.PodQOSBurstable,
},
{
pod: newPod("best-effort-guaranteed", []v1.Container{
newContainer("best-effort", getResourceList("", ""), addResource("nvidia-gpu", "2", getResourceList("", ""))),
newContainer("best-effort", getResourceList("", ""), getResourceList("", "")),
newContainer("guaranteed", getResourceList("10m", "100Mi"), getResourceList("10m", "100Mi")),
}),
expected: v1.PodQOSBurstable,
Expand Down Expand Up @@ -132,7 +106,7 @@ func TestGetPodQOS(t *testing.T) {
},
{
pod: newPod("burstable-2", []v1.Container{
newContainer("burstable", getResourceList("0", "0"), addResource("nvidia-gpu", "2", getResourceList("100m", "200Mi"))),
newContainer("burstable", getResourceList("0", "0"), getResourceList("100m", "200Mi")),
}),
expected: v1.PodQOSBurstable,
},
Expand Down
2 changes: 0 additions & 2 deletions pkg/apis/core/v1/validation/validation.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,6 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath
} else if quantity.Cmp(limitQuantity) > 0 {
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be less than or equal to %s limit", resourceName)))
}
} else if resourceName == v1.ResourceNvidiaGPU {
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be equal to %s request", v1.ResourceNvidiaGPU)))
}
}

Expand Down
61 changes: 5 additions & 56 deletions pkg/apis/core/v1/validation/validation_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,36 +32,15 @@ func TestValidateResourceRequirements(t *testing.T) {
requirements v1.ResourceRequirements
}{
{
Name: "GPU only setting Limits",
requirements: v1.ResourceRequirements{
Limits: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("10"),
},
},
},
{
Name: "GPU setting Limits equals Requests",
requirements: v1.ResourceRequirements{
Limits: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("10"),
},
Requests: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("10"),
},
},
},
{
Name: "Resources with GPU with Requests",
Name: "Resources with Requests equal to Limits",
requirements: v1.ResourceRequirements{
Requests: v1.ResourceList{
v1.ResourceName(v1.ResourceCPU): resource.MustParse("10"),
v1.ResourceName(v1.ResourceMemory): resource.MustParse("10G"),
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("1"),
v1.ResourceName(v1.ResourceCPU): resource.MustParse("10"),
v1.ResourceName(v1.ResourceMemory): resource.MustParse("10G"),
},
Limits: v1.ResourceList{
v1.ResourceName(v1.ResourceCPU): resource.MustParse("10"),
v1.ResourceName(v1.ResourceMemory): resource.MustParse("10G"),
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("1"),
v1.ResourceName(v1.ResourceCPU): resource.MustParse("10"),
v1.ResourceName(v1.ResourceMemory): resource.MustParse("10G"),
},
},
},
Expand Down Expand Up @@ -111,36 +90,6 @@ func TestValidateResourceRequirements(t *testing.T) {
Name string
requirements v1.ResourceRequirements
}{
{
Name: "GPU only setting Requests",
requirements: v1.ResourceRequirements{
Requests: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("10"),
},
},
},
{
Name: "GPU setting Limits less than Requests",
requirements: v1.ResourceRequirements{
Limits: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("10"),
},
Requests: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("11"),
},
},
},
{
Name: "GPU setting Limits larger than Requests",
requirements: v1.ResourceRequirements{
Limits: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("10"),
},
Requests: v1.ResourceList{
v1.ResourceName(v1.ResourceNvidiaGPU): resource.MustParse("9"),
},
},
},
{
Name: "Resources with Requests Larger Than Limits",
requirements: v1.ResourceRequirements{
Expand Down
60 changes: 3 additions & 57 deletions pkg/apis/core/validation/validation_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5042,35 +5042,16 @@ func TestValidateContainers(t *testing.T) {
TerminationMessagePolicy: "File",
},
{
Name: "resources-test-with-gpu-with-request",
Image: "image",
Resources: core.ResourceRequirements{
Requests: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
core.ResourceName(core.ResourceNvidiaGPU): resource.MustParse("1"),
},
Limits: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
core.ResourceName(core.ResourceNvidiaGPU): resource.MustParse("1"),
},
},
ImagePullPolicy: "IfNotPresent",
TerminationMessagePolicy: "File",
},
{
Name: "resources-test-with-gpu-without-request",
Name: "resources-test-with-request-and-limit",
Image: "image",
Resources: core.ResourceRequirements{
Requests: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
},
Limits: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
core.ResourceName(core.ResourceNvidiaGPU): resource.MustParse("1"),
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
},
},
ImagePullPolicy: "IfNotPresent",
Expand Down Expand Up @@ -5359,41 +5340,6 @@ func TestValidateContainers(t *testing.T) {
TerminationMessagePolicy: "File",
},
},
"Resource GPU limit must match request": {
{
Name: "gpu-resource-request-limit",
Image: "image",
Resources: core.ResourceRequirements{
Requests: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
core.ResourceName(core.ResourceNvidiaGPU): resource.MustParse("0"),
},
Limits: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
core.ResourceName(core.ResourceNvidiaGPU): resource.MustParse("1"),
},
},
TerminationMessagePolicy: "File",
ImagePullPolicy: "IfNotPresent",
},
},
"Resource GPU invalid setting only request": {
{
Name: "gpu-resource-request-limit",
Image: "image",
Resources: core.ResourceRequirements{
Requests: core.ResourceList{
core.ResourceName(core.ResourceCPU): resource.MustParse("10"),
core.ResourceName(core.ResourceMemory): resource.MustParse("10G"),
core.ResourceName(core.ResourceNvidiaGPU): resource.MustParse("1"),
},
},
TerminationMessagePolicy: "File",
ImagePullPolicy: "IfNotPresent",
},
},
"Request limit simple invalid": {
{
Name: "abc-123",
Expand Down
11 changes: 0 additions & 11 deletions pkg/features/kube_features.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,16 +53,6 @@ const (
// Note: This feature is not supported for `BestEffort` pods.
ExperimentalCriticalPodAnnotation utilfeature.Feature = "ExperimentalCriticalPodAnnotation"

// owner: @vishh
// alpha: v1.6
//
// This is deprecated and will be removed in v1.11. Use DevicePlugins instead.
//
// Enables support for GPUs as a schedulable resource.
// Only Nvidia GPUs are supported as of v1.6.
// Works only with Docker Container Runtime.
Accelerators utilfeature.Feature = "Accelerators"

// owner: @jiayingz
// beta: v1.10
//
Expand Down Expand Up @@ -296,7 +286,6 @@ var defaultKubernetesFeatureGates = map[utilfeature.Feature]utilfeature.FeatureS
DynamicKubeletConfig: {Default: false, PreRelease: utilfeature.Alpha},
ExperimentalHostUserNamespaceDefaultingGate: {Default: false, PreRelease: utilfeature.Beta},
ExperimentalCriticalPodAnnotation: {Default: false, PreRelease: utilfeature.Alpha},
Accelerators: {Default: false, PreRelease: utilfeature.Alpha},
DevicePlugins: {Default: true, PreRelease: utilfeature.Beta},
TaintBasedEvictions: {Default: false, PreRelease: utilfeature.Alpha},
RotateKubeletServerCertificate: {Default: false, PreRelease: utilfeature.Alpha},
Expand Down
4 changes: 0 additions & 4 deletions pkg/kubelet/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,6 @@ go_library(
"//pkg/kubelet/envvars:go_default_library",
"//pkg/kubelet/events:go_default_library",
"//pkg/kubelet/eviction:go_default_library",
"//pkg/kubelet/gpu:go_default_library",
"//pkg/kubelet/gpu/nvidia:go_default_library",
"//pkg/kubelet/images:go_default_library",
"//pkg/kubelet/kubeletconfig:go_default_library",
"//pkg/kubelet/kuberuntime:go_default_library",
Expand Down Expand Up @@ -179,7 +177,6 @@ go_test(
"//pkg/kubelet/container:go_default_library",
"//pkg/kubelet/container/testing:go_default_library",
"//pkg/kubelet/eviction:go_default_library",
"//pkg/kubelet/gpu:go_default_library",
"//pkg/kubelet/images:go_default_library",
"//pkg/kubelet/lifecycle:go_default_library",
"//pkg/kubelet/logs:go_default_library",
Expand Down Expand Up @@ -264,7 +261,6 @@ filegroup(
"//pkg/kubelet/envvars:all-srcs",
"//pkg/kubelet/events:all-srcs",
"//pkg/kubelet/eviction:all-srcs",
"//pkg/kubelet/gpu:all-srcs",
"//pkg/kubelet/images:all-srcs",
"//pkg/kubelet/kubeletconfig:all-srcs",
"//pkg/kubelet/kuberuntime:all-srcs",
Expand Down