Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add default req timeout for caching #60434

Conversation

juanvallejo
Copy link
Contributor

@juanvallejo juanvallejo commented Feb 26, 2018

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1546117

Release note:

NONE

Adds a default request timeout to requests made by the discovery client.
This prevents a command from hanging indefinitely due to one or multiple calls
to the apiserver taking longer than usual when when a --request-timeout flag value
has not been set.

cc @mfojtik @jpeeler @pmorie @kubernetes/sig-cli-pr-reviews

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 26, 2018
@hzxuzhonghu
Copy link
Member

I donot think this is needed, see staging/src/k8s.io/client-go/rest/config.go func UnversionedRESTClientFor()

@juanvallejo
Copy link
Contributor Author

@hzxuzhonghu thanks, but isn't the value from config.Timeout that is set there specified through the --request-timeout flag? This PR would default discovery client requests to a shorter amount of time if a user does not explicitly provide a request timeout value.

@hzxuzhonghu
Copy link
Member

@juanvallejo as far as i see, you set a default timeout when no timeout explicitly set. But maybe the intention of timeout=0 is to not timeout. Here you change this.

@sttts @liggitt @cheftako

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 28, 2018
@juanvallejo
Copy link
Contributor Author

cc @sttts

@juanvallejo juanvallejo force-pushed the jvallejo/add-default-discovery-timeout branch from 0b1e8d6 to f09b6bd Compare March 15, 2018 15:04
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 15, 2018
@@ -41,6 +42,8 @@ const (
defaultRetries = 2
// protobuf mime type
mimePb = "application/com.github.proto-openapi.spec.v2@v1.0+protobuf"
// defaultRetries is the number of times a resource discovery is repeated if an api group disappears on the fly (e.g. ThirdPartyResources).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc is wrong . It's not retries

@deads2k
Copy link
Contributor

deads2k commented Mar 15, 2018

/ok-to-test

I like this in principle. We need to think about what a default should be. 32 seconds (make sure you're different than something else's timeout) to start is a lot better than infinity, but long enough to avoid failing "normal" levels of latency.

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 15, 2018
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but you need to fix the nits David pointed out

@juanvallejo juanvallejo force-pushed the jvallejo/add-default-discovery-timeout branch from f09b6bd to 6c752f2 Compare March 16, 2018 14:12
@juanvallejo
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@juanvallejo juanvallejo force-pushed the jvallejo/add-default-discovery-timeout branch from 6c752f2 to df2673e Compare March 16, 2018 17:23
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 16, 2018
@juanvallejo
Copy link
Contributor Author

@soltysh @deads2k thanks, review comments addressed

@@ -170,7 +187,7 @@ func (d *DiscoveryClient) ServerResourcesForGroupVersion(groupVersion string) (r
resources = &metav1.APIResourceList{
GroupVersion: groupVersion,
}
err = d.restClient.Get().AbsPath(url.String()).Do().Into(resources)
err = d.restClient.Get().Timeout(timeout).AbsPath(url.String()).Do().Into(resources)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, isn't this the wrong spot. You need to set it for the context timeout/deadline, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k Thanks, I believe the pieces needed to set a context timeout/deadline are being added in https://github.com/kubernetes/kubernetes/pull/53076/files#diff-063e73bab84834a4187b1ad4865050adR217

Can go ahead and pick the relevant pieces from that PR into this one.

@juanvallejo juanvallejo force-pushed the jvallejo/add-default-discovery-timeout branch from a5dd391 to d205291 Compare March 16, 2018 22:24
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit still ;)

// defaultTimeout is the maximum amount of time per request when no timeout has been set on a RESTClient.
// defaults to 32s in order to have a distinguishable length of time, relative to other timeouts that exist.
// we enforce a timeout for every request here in order to avoid hanging indefinitely
defaultTimeout = 12 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're saying 32 in the comment but I see 12 here, typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching that, will update godoc - @deads2k I lowered the time to 12s as it is still distinguishable from other timeouts, long enough to avoid failing on "normal" levels of latency, yet still not as long such that if every discovery request times out, the overall time that a command hangs is reduced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long enough to avoid failing on "normal" levels of latency

the openapi docs are quite large... on low-bandwidth connections, this seems too low

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on leaving this at the previous 32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soltysh @liggitt thanks, switched this back to 32s

@juanvallejo juanvallejo force-pushed the jvallejo/add-default-discovery-timeout branch 2 times, most recently from f54dba2 to 0cadac2 Compare March 19, 2018 14:10
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2018
@juanvallejo
Copy link
Contributor Author

/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-e2e-gce

1 similar comment
@juanvallejo
Copy link
Contributor Author

/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-e2e-gce

@@ -158,6 +169,12 @@ func (d *DiscoveryClient) ServerGroups() (apiGroupList *metav1.APIGroupList, err

// ServerResourcesForGroupVersion returns the supported resources for a group and version.
func (d *DiscoveryClient) ServerResourcesForGroupVersion(groupVersion string) (resources *metav1.APIResourceList, err error) {
// set the default timeout if a timeout value for this request has not already been set
timeout := d.restClient.Get().GetTimeout()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interaction looks wrong

@@ -363,6 +367,19 @@ func (r *Request) Timeout(d time.Duration) *Request {
return r
}

func (r *Request) RequestTimeout(d time.Duration) *Request {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set a timeout and one isn't set, set it.

If I set a timeout and a shorter one is set, don't do anything.

If I set a timeout and a longer one is set, reset it to the shorter one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set a timeout and a longer one is set, reset it to the shorter one.

Why would you reset it to shorter one, if someone explicitly asks for longer - use it.

}

// GetTimeout retrieves the current value of the "timeout" parameter.
func (r *Request) GetTimeout() time.Duration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for this isn't clear.

if r.ctx != nil {
r.ctx, _ = context.WithTimeout(r.ctx, r.requestTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you use a zero value timeout? Seems like this should be gated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you use a zero value timeout? Seems like this should be gated.

Definitely needs gating. You'll be messing with behavior you don't own otherwise. Is there a getter for the current timeout? If not, you don't get to touch it after the context is set.

@soltysh
Copy link
Contributor

soltysh commented Mar 20, 2018

/lgtm cancel
@juanvallejo we need to discuss this approach once more :)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: juanvallejo
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: krousey

Assign the PR to them by writing /assign @krousey in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juanvallejo juanvallejo force-pushed the jvallejo/add-default-discovery-timeout branch from 0cadac2 to f647aa6 Compare March 20, 2018 18:10
@jpeeler
Copy link
Contributor

jpeeler commented Apr 11, 2018

Did the sync up ever happen here? @soltysh @juanvallejo There's a downstream issue dependent on this PR.

@soltysh
Copy link
Contributor

soltysh commented Apr 17, 2018

@jpeeler I'm just looking into it...

@soltysh
Copy link
Contributor

soltysh commented Apr 17, 2018

Alternative solution in #62733

@soltysh
Copy link
Contributor

soltysh commented Apr 19, 2018

Closing in favor of the alternative solution.

@soltysh soltysh closed this Apr 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants