
Implement dynamic volume limits #64154

Merged
merged 4 commits into kubernetes:master from impelemnt-volume-count on Jun 2, 2018

Conversation

gnufied
Member

@gnufied gnufied commented May 22, 2018

Implement dynamic volume limits depending on node type.

xref kubernetes/community#2051

Add Alpha support for dynamic volume limits based on node type

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 22, 2018
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels May 22, 2018
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label May 22, 2018
@gnufied gnufied force-pushed the impelemnt-volume-count branch 4 times, most recently from 3676b0c to 2ec50f1 Compare May 23, 2018 20:24
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 23, 2018
@gnufied gnufied changed the title Impelemnt dynamic volume limits Implement dynamic volume limits May 23, 2018
@gnufied
Member Author

gnufied commented May 24, 2018

/retest

//
// Add support for volume plugins to report node specific
// volume limits
DynamicVolumeThreshold utilfeature.Feature = "DynamicVolumeThreshold"
Member

I am not sure about the feature name. DynamicVolumeLimits? AttachableVolumeLimits (= the predicate name)? There is no "Threshold" used in implementation of the feature.

Member Author

Limit is basically another word for threshold, right? I just copied what was in the specs. It should be pretty easy to rename the feature flag, but I don't feel very strongly about this either way.

Contributor

Since we use "Limit" in other places, I will vote for "Limit". Also agree on the "AttachableVolume" part.
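The naming the thread converges on can be illustrated with a minimal feature-gate sketch. This is a standalone stand-in, not the real `utilfeature` package, and the final gate name used below (`AttachVolumeLimit`) is an assumption about where the discussion landed, not text from this thread.

```go
package main

import "fmt"

// Feature is a stand-in for utilfeature.Feature; the real type lives in
// k8s.io/apiserver/pkg/util/feature.
type Feature string

// AttachVolumeLimit reflects the limit-based name the reviewers prefer
// ("AttachableVolume" + "Limit" rather than "Threshold"); the exact final
// name is an assumption here.
const AttachVolumeLimit Feature = "AttachVolumeLimit"

// defaultFeatureGates mimics how an Alpha gate defaults to off.
var defaultFeatureGates = map[Feature]bool{
	AttachVolumeLimit: false, // Alpha in this PR, so disabled by default
}

func main() {
	fmt.Println(defaultFeatureGates[AttachVolumeLimit])
}
```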


// a map of volume unique name and volume limit key
newVolumes := make(map[string]string)
if err := c.filterAttachableVolumes(pod.Spec.Volumes, pod.Namespace, newVolumes); err != nil {
Member

newVolumes, err = c.filterAttachableVolumes(pod.Spec.Volumes, pod.Namespace) ?

Member Author

Ah, now I know why it is written like that. The second invocation of this function (line #452) adds volumes in a loop; if it returned a new map, we would have to merge the maps and iterate over the result again. Passing the map in saves some execution time.
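The trade-off described here can be sketched as follows: a caller-supplied destination map lets repeated calls accumulate entries without a merge pass. This is a simplified stand-in for `filterAttachableVolumes`, not its real signature; the `storage-limits-` key prefix follows the plugin comments quoted elsewhere in this thread.

```go
package main

import "fmt"

// filterAttachableVolumes is a simplified stand-in for the function under
// review: the caller passes in the destination map, so a second invocation
// (the loop over pods already on the node) appends into the same map instead
// of forcing a merge of two returned maps.
func filterAttachableVolumes(volumes []string, result map[string]string) error {
	for _, v := range volumes {
		result[v] = "storage-limits-" + v // hypothetical limit key per volume
	}
	return nil
}

func main() {
	newVolumes := make(map[string]string)
	_ = filterAttachableVolumes([]string{"vol-a"}, newVolumes) // pod being scheduled
	_ = filterAttachableVolumes([]string{"vol-b"}, newVolumes) // pods already on the node
	fmt.Println(len(newVolumes)) // both calls accumulated into one map
}
```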


instanceType, err := instances.InstanceType(context.TODO(), plugin.host.GetNodeName())
if err != nil {
glog.V(3).Infof("Failed to get instance type from AWS cloud provider")
Member

log the error

// volumes that can be attached to a node.
type VolumePluginWithAttachLimits interface {
VolumePlugin
// Return number of volumes attachable on a node from the plugin
Member

  • Please add some description what is the return map - what's the key and what's value.
  • I'd emphasize that it's called by kubelet and it returns limit for the node where this function is called.

// - storage-limits-aws-ebs
// - storage-limits-csi-cinder
// The key should respect character limit of ResourceName type
VolumeLimitKey(spec *Spec) string
Member

Add a note that this function is called by scheduler (or can be called by any Kubernetes component, not just kubelet).
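Both documentation requests can be combined into one sketch of the interface. The method shapes are simplified (the real `VolumeLimitKey` takes a `*Spec`), the comment wording is illustrative rather than the merged text, and `fakeEBSPlugin` with its limit of 39 is a placeholder implementation:

```go
package main

import "fmt"

// VolumePluginWithAttachLimits sketches the documentation the reviewers ask
// for; method shapes are simplified relative to the real interface.
type VolumePluginWithAttachLimits interface {
	// GetVolumeLimits is called by the kubelet and returns limits for the
	// node it runs on. Key: a limit resource name such as
	// "storage-limits-aws-ebs"; value: maximum attachable volume count.
	GetVolumeLimits() (map[string]int64, error)
	// VolumeLimitKey returns the map key for this plugin. It can be called
	// by any Kubernetes component, not just the kubelet; the scheduler uses
	// it to match pod volumes against node limits.
	VolumeLimitKey() string
}

// fakeEBSPlugin is a placeholder conforming implementation.
type fakeEBSPlugin struct{}

func (fakeEBSPlugin) GetVolumeLimits() (map[string]int64, error) {
	return map[string]int64{"storage-limits-aws-ebs": 39}, nil
}

func (fakeEBSPlugin) VolumeLimitKey() string { return "storage-limits-aws-ebs" }

func main() {
	var p VolumePluginWithAttachLimits = fakeEBSPlugin{}
	limits, _ := p.GetVolumeLimits()
	fmt.Println(limits[p.VolumeLimitKey()])
}
```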

@gnufied gnufied force-pushed the impelemnt-volume-count branch 2 times, most recently from 6da2ccb to 9a790ce Compare May 24, 2018 14:54
@derekwaynecarr
Member

I will help review kubelet changes.

/assign @derekwaynecarr

@@ -415,6 +415,136 @@ func (c *MaxPDVolumeCountChecker) filterVolumes(volumes []v1.Volume, namespace s
return nil
}

func (c *MaxPDVolumeCountChecker) attachableLimitPredicate(
Contributor

Add comment for this function?

Member Author

fixed

@jingxu97
Contributor

cc @bsalamat Could you please help take a look at the scheduler change? Thanks!

@gnufied gnufied force-pushed the impelemnt-volume-count branch 2 times, most recently from 9fb7a39 to a5330e3 Compare May 24, 2018 18:52
@gnufied
Member Author

gnufied commented May 24, 2018

For scheduler

/assign @bsalamat @aveshagarwal

for api and other misc. changes

/assign @liggitt

@gnufied
Member Author

gnufied commented May 24, 2018

/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-e2e-kops-aws

@msau42
Member

msau42 commented Jun 1, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2018
@k8s-github-robot

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@aveshagarwal @bsalamat @derekwaynecarr @gnufied @liggitt @msau42

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/feature: New functionality.
Help

@gnufied
Member Author

gnufied commented Jun 1, 2018

/test pull-kubernetes-e2e-gce

Member

@bsalamat bsalamat left a comment

Scheduler changes look good to me. Just a minor comment.

@@ -92,10 +92,14 @@ func IsOvercommitAllowed(name v1.ResourceName) bool {
!IsHugePageResourceName(name)
}

func IsStorageLimitResourceName(name v1.ResourceName) bool {
Member

I would rename this to IsVolumeLimitResourceName to be consistent with the rest of the code.

Member Author

fixed
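The renamed helper amounts to a prefix check on the resource name. A minimal sketch, where the `storage-limits-` prefix matches the keys quoted in this thread; the real implementation and constant names may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// isVolumeLimitResourceName sketches the renamed helper: a resource name is a
// volume limit resource if it carries the plugin limit-key prefix.
func isVolumeLimitResourceName(name string) bool {
	return strings.HasPrefix(name, "storage-limits-")
}

func main() {
	fmt.Println(isVolumeLimitResourceName("storage-limits-aws-ebs"))
	fmt.Println(isVolumeLimitResourceName("cpu"))
}
```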

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2018
@msau42
Member

msau42 commented Jun 1, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2018
Member

@bsalamat bsalamat left a comment

/lgtm

for scheduler changes

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, derekwaynecarr, gnufied, liggitt, msau42, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2018
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 64613, 64596, 64573, 64154, 64639). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit e5686a3 into kubernetes:master Jun 2, 2018
return nil, fmt.Errorf("Expected Azure cloudprovider, got %s", cloud.ProviderName())
}

volumeLimits := map[string]int64{
Member

Question: @gnufied it looks like I need to write code here to query volume limits according to the Azure node type, and assign the new value to volumeLimits in azure_dd.go, right?

Member Author

Yes, that is correct. I did not know how we could get those limits for Azure Disk, so I went with the existing defaults.
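The follow-up work being discussed amounts to a lookup from the Azure VM size (instance type) to its maximum data disk count, with a fallback to the existing default. A minimal sketch; the VM sizes, disk counts, and the default of 16 below are illustrative placeholders, not Azure's published limits.

```go
package main

import "fmt"

// maxDataDisksPerVMSize maps an Azure VM size to its maximum data disk
// count. Placeholder entries only; real values come from Azure's VM size
// documentation.
var maxDataDisksPerVMSize = map[string]int64{
	"Standard_D2_v2": 8,
	"Standard_D4_v2": 16,
}

// defaultAzureDiskLimit is the fallback when the VM size is unknown,
// mirroring the "existing defaults" mentioned above (value assumed).
const defaultAzureDiskLimit int64 = 16

func getAzureDiskLimit(vmSize string) int64 {
	if limit, ok := maxDataDisksPerVMSize[vmSize]; ok {
		return limit
	}
	return defaultAzureDiskLimit
}

func main() {
	fmt.Println(getAzureDiskLimit("Standard_D2_v2"))
	fmt.Println(getAzureDiskLimit("unknown-size"))
}
```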

@yujuhong
Contributor

yujuhong commented Jun 4, 2018

This broke the node e2e alpha suite: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-alpha

Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: E0602 14:36:19.366284    1727 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /usr/local/go/src/runtime/asm_amd64.s:573
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /usr/local/go/src/runtime/panic.go:502
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /usr/local/go/src/runtime/panic.go:63
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /usr/local/go/src/runtime/signal_unix.go:388
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/volume/azure_dd/azure_dd.go:116
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:341
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:779
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:1117
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:1111
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:1099
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:324
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:73
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet_node_status.go:362
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:1367
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
Jun 02 14:36:19 tmp-node-e2e-53507b65-cos-stable-63-10032-71-0 kubelet[1727]: /usr/local/go/src/runtime/asm_amd64.s:2361

@gnufied
Member Author

gnufied commented Jun 4, 2018

@yuanying looking. ty
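The stack trace points into azure_dd.go from the kubelet's node status path. A plausible failure mode, sketched below under stated assumptions: the node e2e suite runs the kubelet without a cloud provider, so a `GetVolumeLimits` implementation that dereferences the provider unconditionally hits a nil pointer. Guarding the nil case is the assumed shape of the fix; the actual patch is not part of this thread, and all names here are stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// cloudProvider is a stand-in for the Azure cloud provider handle that is
// nil when the kubelet runs without --cloud-provider (as in node e2e).
type cloudProvider struct{ name string }

func getVolumeLimits(cloud *cloudProvider) (map[string]int64, error) {
	if cloud == nil {
		// Return an error instead of dereferencing a nil provider,
		// which is what the panic above suggests happened.
		return nil, errors.New("no cloud provider configured")
	}
	// Placeholder limit; real code would consult the node's VM size.
	return map[string]int64{"storage-limits-azure-disk": 16}, nil
}

func main() {
	_, err := getVolumeLimits(nil)
	fmt.Println(err != nil) // the nil case degrades gracefully
}
```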
