[release-4.12] OCPBUGS-33432: Route 'haproxy.router.openshift.io/timeout' value is not validated #593

Open: 1 commit to merge into base: release-4.12

openshift-cherrypick-robot

This is an automated cherry-pick of #591

/assign candita

* OCPBUGS-6958: Fix clipHAProxyTimeoutValue so that:

* a value larger than time.ParseDuration can handle is clipped to the HAProxy max timeout
* a value that cannot be properly parsed for other reasons is set to empty instead of being silently allowed

To check that time.ParseDuration is experiencing overflow, add a new ParseHAProxyDuration to the util
package so we can evaluate the errors returned by time.ParseDuration without sacrificing its authority
in parsing time strings.

* add haproxytime

* use pkg/router/template/util/haproxytime

* drop existing ParseHAProxyDuration

* OCPBUGS-6958: Fix clipHAProxyTimeoutValue so that:

- a value larger than time.ParseDuration can handle is clipped to the HAProxy max timeout
- a value that cannot be properly parsed for other reasons is set to empty instead of being silently allowed

To check that time.ParseDuration is experiencing overflow, add a new ParseDuration so we
can evaluate the errors that wouldn't have been explicitly returned by time.ParseDuration,
e.g. invalid HAProxy time format syntax and integer overflows.

Add more unit tests.

---------

Co-authored-by: Andrew McDermott <amcdermo@redhat.com>
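
For readers following the commit message, here is a minimal Go sketch of the clipping behaviour it describes. It is not the code merged in this PR: `parseDuration`, `clipTimeout`, the two error values, the regular expression, and the choice of milliseconds as the default unit are all assumptions made for illustration. Only the 2147483647ms ceiling is taken from the router logs quoted in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"regexp"
	"strconv"
	"time"
)

// haproxyMaxTimeout is the ceiling reported in the router logs.
const haproxyMaxTimeout = "2147483647ms"

var (
	// Hypothetical error values; the real package may name these differently.
	errSyntax   = errors.New("invalid HAProxy duration syntax")
	errOverflow = errors.New("HAProxy duration overflows time.Duration")

	// Assumed grammar: digits followed by an optional unit suffix.
	durationRE = regexp.MustCompile(`^([0-9]+)(us|ms|s|m|h|d)?$`)

	units = map[string]time.Duration{
		"us": time.Microsecond, "ms": time.Millisecond, "s": time.Second,
		"m": time.Minute, "h": time.Hour, "d": 24 * time.Hour,
	}

	// Computed once at package level rather than on every call.
	maxTimeout = math.MaxInt32 * time.Millisecond
)

// parseDuration is a hypothetical stand-in for the parser added by the PR.
// It distinguishes syntax errors from overflow so the caller can react to each.
func parseDuration(s string) (time.Duration, error) {
	m := durationRE.FindStringSubmatch(s)
	if m == nil {
		return 0, errSyntax
	}
	n, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		// The regexp admits digits only, so this can only be a range error.
		return 0, errOverflow
	}
	unit := time.Millisecond // assumption: a bare number means milliseconds
	if m[2] != "" {
		unit = units[m[2]]
	}
	// Reject values whose nanosecond count would not fit in an int64.
	if n > uint64(math.MaxInt64)/uint64(unit) {
		return 0, errOverflow
	}
	return time.Duration(n) * unit, nil
}

// clipTimeout returns the annotation value unchanged when it is valid and in
// range, the HAProxy maximum when it is too large, and "" when it is invalid.
func clipTimeout(val string) string {
	d, err := parseDuration(val)
	switch {
	case errors.Is(err, errOverflow):
		return haproxyMaxTimeout
	case err != nil:
		return ""
	case d > maxTimeout:
		return haproxyMaxTimeout
	}
	return val
}

func main() {
	for _, v := range []string{"5s", "28d", "106752d", "abc"} {
		fmt.Printf("%q -> %q\n", v, clipTimeout(v))
	}
}
```

The point of separating the two errors is that an overflow still signals "the user asked for a very long timeout", so clipping is a reasonable response, whereas a syntax error carries no usable intent and is dropped.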
@openshift-ci-robot

@openshift-cherrypick-robot: Jira Issue OCPBUGS-33280 has been cloned as Jira Issue OCPBUGS-33432. Will retitle bug to link to clone.
/retitle [release-4.12] OCPBUGS-33432: Route 'haproxy.router.openshift.io/timeout' value is not validated

In response to this:

This is an automated cherry-pick of #591

/assign candita

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot changed the title from "[release-4.12] OCPBUGS-33280: Route 'haproxy.router.openshift.io/timeout' value is not validated" to "[release-4.12] OCPBUGS-33432: Route 'haproxy.router.openshift.io/timeout' value is not validated" May 9, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels May 9, 2024
@openshift-ci-robot

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-33432, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required"
  • expected dependent Jira Issue OCPBUGS-33280 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #591

/assign candita

@openshift-ci openshift-ci bot requested review from knobunc and Miciah May 9, 2024 02:11
@frobware

frobware commented May 9, 2024

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) May 9, 2024

openshift-ci bot commented May 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) May 9, 2024
@candita

candita commented May 9, 2024

May 9 05:57:01.963: INFO: Unexpected error: during upgrade to registry.build02.ci.openshift.org/ci-op-6vx4qpjs/release@sha256:7bec38bca7cc2597ebd15693154b7fa961e13e984149291739f838992b4b65a5:
<*errors.errorString | 0xc001a2ec40>: {
s: "Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator kube-scheduler is degraded",
}

/test e2e-upgrade

@candita

candita commented May 9, 2024

Terraform installation error:

level=error msg=2024-05-09T02:24:02.558Z [INFO] Failed to read plugin lock file /tmp/installer/terraform/.terraform/plugins/linux_amd64/lock.json: open /tmp/installer/terraform/.terraform/plugins/linux_amd64/lock.json: no such file or directory

/test e2e-agnostic

@lihongan

/jira refresh

@openshift-ci-robot

@lihongan: This pull request references Jira Issue OCPBUGS-33432, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required"

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

@lihongan

/test e2e-upgrade

@lihongan

/label qe-approved

verified with pre-merge testing

$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.ci.test-2024-05-10-020549-ci-ln-xfdhhlt-latest   True        False         16m     Cluster version is 4.12.0-0.ci.test-2024-05-10-020549-ci-ln-xfdhhlt-latest

### create pod, svc, and route, then annotate the route with various timeout values
### review the router logs

I0510 02:50:33.145345       1 template_helper.go:353] template "msg"="route annotation timeout exceeds maximum allowable by HAProxy, clipping to 2147483647ms" "input"="28d"

I0510 02:54:19.769345       1 template_helper.go:340] template "msg"="route annotation timeout exceeds maximum allowable format, clipping to 2147483647ms" "input"="18446744073709551615d"

I0510 02:54:57.557356       1 template_helper.go:340] template "msg"="route annotation timeout exceeds maximum allowable format, clipping to 2147483647ms" "input"="106752d"

I0510 02:55:40.360908       1 template_helper.go:340] template "msg"="route annotation timeout exceeds maximum allowable format, clipping to 2147483647ms" "input"="100000000000s"

sh-4.4$ grep myedge haproxy.config -A5
backend be_edge_http:hongli:myedge
  mode http
  option redispatch
  option forwardfor
  balance random
  timeout server  2147483647ms

@openshift-ci openshift-ci bot added the qe-approved label (Signifies that QE has signed off on this PR) May 10, 2024
@lihongan

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved label (Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.) May 10, 2024
@frobware

/jira refresh

@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-33432, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required"

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

@candita

candita commented May 13, 2024

@Miciah's assessment of backport risk from the backport to 4.14 in #568:

The change is scoped to the clipHAProxyTimeoutValue template helper function. This function is somewhat performance-sensitive and security-critical. However, it already has good unit test coverage for valid and invalid values, and this change increases test coverage with additional syntactically invalid and out-of-range test values; and the code complexity is not increased by this change (and pre-compiling time.ParseDuration(haproxyMaxTimeout) should improve performance). I believe the risk is acceptable.

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed label (Indicates a PR to a release branch has been evaluated and considered safe to accept.) May 13, 2024
@candita

candita commented May 13, 2024

It's within 7s of the maxAllowed:

fail [github.com/openshift/origin/test/extended/util/disruption/backend_sampler_tester.go:185]: May 10 04:06:32.570: disruption/service-load-balancer-with-pdb connection/new was unreachable during disruption: for at least 18s of 1h5m38s (maxAllowed=11s):

/jira refresh
/test e2e-upgrade

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) and removed the jira/invalid-bug label (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) May 13, 2024
@openshift-ci-robot

@candita: This pull request references Jira Issue OCPBUGS-33432, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.z) matches configured target version for branch (4.12.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-33280 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-33280 targets the "4.13.z" version, which is one of the valid target versions: 4.13.0, 4.13.z
  • bug has dependents

Requesting review from QA contact:
/cc @lihongan

In response to this:

It's within 7s of the maxAllowed:

fail [github.com/openshift/origin/test/extended/util/disruption/backend_sampler_tester.go:185]: May 10 04:06:32.570: disruption/service-load-balancer-with-pdb connection/new was unreachable during disruption: for at least 18s of 1h5m38s (maxAllowed=11s):

/jira refresh
/test e2e-upgrade

@openshift-ci openshift-ci bot requested a review from lihongan May 13, 2024 19:39
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD d998087 and 2 for PR HEAD 8b5fa79 in total

@lihongan

/test e2e-upgrade

@candita

candita commented May 14, 2024

Seems to be a perma-failure with ovnkube. For example:

event happened 118 times, something is wrong: ns/openshift-ovn-kubernetes pod/ovnkube-node-4xlds node/ci-op-8y6s2l65-ba64e-cdprv-worker-westus-sg86g - reason/Unhealthy (combined from similar events): Readiness probe errored: rpc error: code = Unknown desc = command error: time="2024-05-14T00:59:55Z" level=error msg="exec failed: unable to start container process: error adding pid 26192 to cgroups: failed to write 26192: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9a423f89_3927_4d7d_b333_947b916892ed.slice/crio-8a469cb8bc30b08e825d63b33b913e9ed5bf3859a14dfc0568ecb477a302ef67.scope/cgroup.procs: no such file or directory", stdout: , stderr: , exit code -1

@candita

candita commented May 14, 2024

/test e2e-upgrade

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0c80edf and 1 for PR HEAD 8b5fa79 in total

@lihongan

/retest-required


openshift-ci bot commented May 28, 2024

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-upgrade | 8b5fa79 | link | true | /test e2e-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR
7 participants