Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[failing] test/conformance/api/v1.TestRevisionTimeout #15089

Open
izabelacg opened this issue Apr 3, 2024 · 7 comments
Open

[failing] test/conformance/api/v1.TestRevisionTimeout #15089

izabelacg opened this issue Apr 3, 2024 · 7 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@izabelacg
Copy link
Member

izabelacg commented Apr 3, 2024

Expected Behavior

Tests passing successfully.

Actual Behavior

Consistent failures:

Test name: test/conformance/api/v1.TestRevisionTimeout/writes_first_byte_before_timeout
Repository name: serving
Testing areas: Gateway API + Contour (periodic job)

Steps to Reproduce the Problem

See: https://testgrid.k8s.io/r/knative-own-testgrid/serving#gateway-api-contour

@izabelacg izabelacg added the kind/bug Categorizes issue or PR as related to a bug. label Apr 3, 2024
@izabelacg izabelacg changed the title [flaky] test/conformance/api/v1.TestRevisionTimeout [failing] test/conformance/api/v1.TestRevisionTimeout Apr 3, 2024
@dprotaso
Copy link
Member

dprotaso commented Apr 5, 2024

/assign @izabelacg

@izabelacg
Copy link
Member Author

RevisionSpec.Timeout has changed meaning: #12634

@izabelacg
Copy link
Member Author

We need to investigate if the test needs to be updated and/or if the behaviour is actually expected. See below what I observed so far.
When curling the service with the tests parameters, we get:

$ curl -v -H "Host: revision-timeout-writes-first-byte-before-pmzegqbs.serving-tests.example.com" "$EXTERNAL_IP?timeout=15000&initialTimeout=0"
...
> GET /?timeout=15000&initialTimeout=0 HTTP/1.1
> Host: revision-timeout-writes-first-byte-before-pmzegqbs.serving-tests.example.com
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
< date: Tue, 09 Apr 2024 15:40:32 GMT
< x-envoy-upstream-service-time: 2
< vary: Accept-Encoding
< server: envoy
< transfer-encoding: chunked
<
* transfer closed with outstanding read data remaining
* Closing connection
curl: (18) transfer closed with outstanding read data remaining

When curling the pod directly from within the cluster, we get:

$ curl -v "10.112.5.251:8012?timeout=15000&initialTimeout=0"
*   Trying 10.112.5.251:8012...
* Connected to 10.112.5.251 (10.112.5.251) port 8012
> GET /?timeout=15000&initialTimeout=0 HTTP/1.1
> Host: 10.112.5.251:8012
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Tue, 09 Apr 2024 15:55:08 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host 10.112.5.251 left intact
Slept for 15000 milliseconds

@dprotaso
Copy link
Member

Note the (net-gateway-api) changes that will land in #15136 don't address this failure.

@dprotaso
Copy link
Member

I recall discussing this issue - something is up with the test because by the definition of the Revision.Spec.TimeoutSeconds says it's the max time allowed to process a request.

In the test we have RevisionTimeout=10s but tell the app to sleep 15s. Thus it should fail but we expect a http.StatusOK in the test which seems wrong.

@dprotaso
Copy link
Member

Also disabling timeouts is broken in Contour see:

knative-extensions/net-gateway-api#711
projectcontour/contour#6373

@izabelacg
Copy link
Member Author

izabelacg commented May 2, 2024

Also disabling timeouts is broken in Contour see:

I think this explains why the test is failing ONLY for gateway-api.

In the test we have RevisionTimeout=10s but tell the app to sleep 15s. Thus it should fail but we expect a http.StatusOK in the test which seems wrong.

I'm wondering why the test is passing for contour and istio if this is not expected behaviour, since it seems correct behaviour for the timeouts has been established by #12634. I'm unsure if I misunderstood any of the timeouts, but here is what I've concluded so far:

  • TestRevisionTimeout/writes_first_byte_before_timeout was meant to make sure that once we receive any bytes from the user's application, we can set a timeout to determine how long we should wait for the response to be fully sent. So, I think this shouldn't be testing Revision.Spec.TimeoutSeconds, but rather Revision.Spec.IdleTimeoutSeconds - which is achieved by this other test:

    }, {
    name: "writes response before idle timeout",
    timeoutSeconds: 10,
    idleTimeoutSeconds: 5,
    expectedStatus: http.StatusOK,
    sleep: 2 * time.Second,
    }, {

  • However, the above still doesn't explain why the test works for contour and istio. As Dave stated and per docs definition the TimeoutSeconds is "the maximum duration in seconds that the request instance is allowed to respond to a request". So, in my understanding it should be "respected" even if other existing timeouts have been set. So, I'm looking to understand if there is unexpected behaviour in the current implementation.

  • I think the entire file ./test/conformance/api/v1/revision_timeout_test.go could be deleted, since it is covered by the tests in ./test/e2e/timeout_test.go. I learned that e2e and conformance tests may not be interchangeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

2 participants