Disable default paging in list watches #51876

Merged

smarterclayton
Contributor

@smarterclayton smarterclayton commented Sep 3, 2017

For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the chunking terminology.

Note that the pager may be used for other things besides chunking.

Follow-on to #48921. We left the field on to get some exercise in the normal code paths, but it needs to be disabled for 1.8.

@liggitt let's merge on Wednesday.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 3, 2017
@k8s-github-robot k8s-github-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Sep 3, 2017
@smarterclayton smarterclayton added this to the v1.8 milestone Sep 3, 2017
@smarterclayton smarterclayton added kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Sep 3, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 3, 2017
@smarterclayton smarterclayton added release-note-none Denotes a PR that doesn't merit a release note. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 3, 2017
@smarterclayton
Contributor Author

/approve no-issue

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 3, 2017
@wojtek-t
Member

wojtek-t commented Sep 4, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 4, 2017
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@dims
Member

dims commented Sep 4, 2017

/test all

@dims
Member

dims commented Sep 4, 2017

Looks like a legit verify failure - FAILED hack/make-rules/../../hack/verify-bazel.sh 20s

@smarterclayton smarterclayton added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Sep 5, 2017
@smarterclayton
Contributor Author

Still set to DNM because we're gathering data

@smarterclayton
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2017
@smarterclayton
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

@smarterclayton
Contributor Author

Metrics from the two runs:

Before

I0905 22:34:40.939]   /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scalability/load.go:103
I0905 22:34:41.038] Sep  5 22:34:41.037: INFO: Top latency metric: {Resource:pods Subresource: Verb:LIST Latency:{Perc50:9.261ms Perc90:11.478ms Perc99:48.501ms Perc100:0s} Count:9840}
I0905 22:34:41.038] Sep  5 22:34:41.037: INFO: Top latency metric: {Resource:services Subresource: Verb:DELETE Latency:{Perc50:12.146ms Perc90:15.141ms Perc99:35.373ms Perc100:0s} Count:821}
I0905 22:34:41.038] Sep  5 22:34:41.037: INFO: Top latency metric: {Resource:nodes Subresource: Verb:LIST Latency:{Perc50:23.318ms Perc90:25.504ms Perc99:27.916ms Perc100:0s} Count:46}
I0905 22:34:41.038] Sep  5 22:34:41.037: INFO: Top latency metric: {Resource:services Subresource: Verb:POST Latency:{Perc50:3.721ms Perc90:4.891ms Perc99:18.715ms Perc100:0s} Count:821}
I0905 22:34:41.039] Sep  5 22:34:41.037: INFO: Top latency metric: {Resource:services Subresource: Verb:LIST Latency:{Perc50:15.635ms Perc90:16.778ms Perc99:16.778ms Perc100:0s} Count:10}
I0905 22:34:41.039] Sep  5 22:34:41.037: INFO: Printing summary: APIResponsiveness

After

I0906 04:26:59.107]   /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scalability/load.go:103
I0906 04:26:59.211] Sep  6 04:26:59.211: INFO: Top latency metric: {Resource:nodes Subresource: Verb:LIST Latency:{Perc50:24.351ms Perc90:28.475ms Perc99:56.18ms Perc100:0s} Count:48}
I0906 04:26:59.212] Sep  6 04:26:59.211: INFO: Top latency metric: {Resource:pods Subresource: Verb:LIST Latency:{Perc50:10.018ms Perc90:12.068ms Perc99:49.82ms Perc100:0s} Count:9840}
I0906 04:26:59.212] Sep  6 04:26:59.211: INFO: Top latency metric: {Resource:services Subresource: Verb:DELETE Latency:{Perc50:12.294ms Perc90:15.457ms Perc99:37.681ms Perc100:0s} Count:821}
I0906 04:26:59.212] Sep  6 04:26:59.211: INFO: Top latency metric: {Resource:replicationcontrollers Subresource: Verb:LIST Latency:{Perc50:22.8ms Perc90:22.8ms Perc99:22.8ms Perc100:0s} Count:2}
I0906 04:26:59.212] Sep  6 04:26:59.211: INFO: Top latency metric: {Resource:services Subresource: Verb:LIST Latency:{Perc50:15.143ms Perc90:20.35ms Perc99:20.35ms Perc100:0s} Count:10}
I0906 04:26:59.213] Sep  6 04:26:59.211: INFO: Printing summary: APIResponsiveness
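
To put numbers on the comparison, the sketch below computes the relative change in Perc99 for the four resource/verb pairs that appear in both runs. The values are copied from the log lines above; the `sample` type and `pctChange` helper are illustrative names, not part of the e2e framework. Note that with only 46 and 48 node LIST requests, the node Perc99 is close to the single slowest request in each run, so it is a noisy figure.

```go
package main

import "fmt"

// sample holds the Perc99 latency (in milliseconds) of one resource/verb
// pair, copied from the "Before" and "After" log output.
type sample struct {
	name          string
	before, after float64
}

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	samples := []sample{
		{"pods LIST", 48.501, 49.82},
		{"services DELETE", 35.373, 37.681},
		{"nodes LIST", 27.916, 56.18},
		{"services LIST", 16.778, 20.35},
	}
	for _, s := range samples {
		fmt.Printf("%-16s %7.3f ms -> %7.3f ms (%+.1f%%)\n",
			s.name, s.before, s.after, pctChange(s.before, s.after))
	}
}
```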

@wojtek-t
Member

wojtek-t commented Sep 6, 2017

Slightly higher, but the difference isn't large. I think at that level of latencies (small tens of ms), that may be expected. WDYT?

@smarterclayton
Contributor Author

I was expecting to see a drop (since at the apiserver a paged list should be proportionally faster) on at least one of the high-N resource types. Pods would be most likely. However, pods are likely dominated by node list watches, not by master list watches, so pod tail latency should go down.

@smarterclayton
Contributor Author

What is the max resource collection size on the cluster? I.e. how big do pods get at any one time? 1k? 9k?

@smarterclayton
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

@smarterclayton
Contributor Author

OK, so in the second run a number of mutation operations had lower tail latencies. This would be expected when there are conflicting reads and writes (in etcd3 at the moment, large range reads can take a few range locks that block writes). However, paging wasn't happening frequently enough in the run to tell one way or another, because in practice this test never re-LISTs: all caches start empty and are fed by watches. So I'm going to say we need a better test scenario for this before we can draw a conclusion.
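
For readers unfamiliar with what "paging" means here, the following is a minimal sketch of a chunked LIST loop: the client asks for at most `limit` items, and keeps re-issuing the request with the returned continue token until the token comes back empty. It assumes an in-memory fake server; `page`, `listFunc`, `listAll`, and `fakeServer` are hypothetical names for illustration, not the client-go pager, and the decimal-index token is specific to the fake (real continue tokens are opaque).

```go
package main

import (
	"fmt"
	"strconv"
)

// page is one chunk of a paged LIST response (illustrative type only).
type page struct {
	Items    []string
	Continue string // empty when there are no more chunks
}

// listFunc is a hypothetical stand-in for a LIST call that accepts a
// limit and a continue token.
type listFunc func(limit int, continueToken string) (page, error)

// listAll fetches the whole collection chunk by chunk. It returns the
// items and the number of LIST calls that were needed.
func listAll(list listFunc, limit int) ([]string, int, error) {
	var items []string
	token := ""
	calls := 0
	for {
		p, err := list(limit, token)
		if err != nil {
			return nil, calls, err
		}
		calls++
		items = append(items, p.Items...)
		if p.Continue == "" {
			return items, calls, nil
		}
		token = p.Continue
	}
}

// fakeServer serves data from memory. Its token is simply the next start
// index; this is an assumption of the fake, not how a real server works.
func fakeServer(data []string) listFunc {
	return func(limit int, token string) (page, error) {
		start := 0
		if token != "" {
			n, err := strconv.Atoi(token)
			if err != nil || n < 0 || n > len(data) {
				return page{}, fmt.Errorf("bad continue token %q", token)
			}
			start = n
		}
		end := start + limit
		if limit <= 0 || end >= len(data) {
			return page{Items: data[start:]}, nil // final (or unpaged) chunk
		}
		return page{Items: data[start:end], Continue: strconv.Itoa(end)}, nil
	}
}

func main() {
	pods := []string{"pod-a", "pod-b", "pod-c", "pod-d", "pod-e"}
	items, calls, err := listAll(fakeServer(pods), 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(items), "items in", calls, "calls")
}
```

A loop like this only runs when a client actually issues a LIST, which is why a test whose caches start empty and are then fed by watches exercises it so rarely.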

Going to drop the last commit and get everything green, then reapply label to disable paging on the client side (as the PR originally mentions).

For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the `chunking` terminology.

Note that the pager may be used for other things besides chunking.
@wojtek-t
Member

wojtek-t commented Sep 7, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, wojtek-t

Associated issue: 48921

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@smarterclayton smarterclayton removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Sep 7, 2017
@smarterclayton
Contributor Author

/retest

@lavalamp
Member

lavalamp commented Sep 7, 2017

assign @jpbetz

WatchFunc WatchFunc
// DisableChunking requests no chunking for this list watcher. It has no effect in Kubernetes 1.8, but in
// 1.9 will allow a controller to opt out of chunking.
DisableChunking bool
Member

Can this be named in the positive, especially since that would get the default behavior you want?

Contributor Author

The plan for 1.9 was to remove the `false &&` below. All of the test suites are already configured to bypass chunking by setting this flag, whereas all clients would still be opt-out for beta. I wanted to avoid pointer insanity in internal code as well as bad globals (since this is core library code).
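
The trade-off in this thread can be sketched as follows. This is a hedged illustration, not the code in the PR: `listWatch` mirrors only the `DisableChunking` field quoted above, and `useChunking` with its `gate` parameter is a hypothetical stand-in for the hard-coded `false &&` condition.

```go
package main

import "fmt"

// listWatch mirrors only the field under discussion; it is not the real
// client-go ListWatch type.
type listWatch struct {
	// DisableChunking opts this list watcher out of chunked LISTs.
	DisableChunking bool
}

// useChunking reports whether a LIST issued by lw would be chunked.
// gate models the hard-coded condition: false for the 1.8 behavior,
// true for the behavior planned once the gate is removed.
func useChunking(gate bool, lw listWatch) bool {
	return gate && !lw.DisableChunking
}

func main() {
	for _, gate := range []bool{false, true} {
		for _, disable := range []bool{false, true} {
			lw := listWatch{DisableChunking: disable}
			fmt.Printf("gate=%t DisableChunking=%t -> chunked=%t\n",
				gate, disable, useChunking(gate, lw))
		}
	}
}
```

Because the zero value of a negatively named `bool` means "chunking allowed", callers that never set the field pick up chunking automatically once the gate opens, without needing a `*bool` to distinguish "unset" from "false".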

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 48552, 51876)

@k8s-github-robot k8s-github-robot merged commit eda3db5 into kubernetes:master Sep 8, 2017
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
9 participants