-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-4.15] OCPBUGS-28928: Upgrade Validation plugin for SHA1 certs #585
base: release-4.15
Are you sure you want to change the base?
[release-4.15] OCPBUGS-28928: Upgrade Validation plugin for SHA1 certs #585
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-28928, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Odd failures that I don't see locally |
b2150de
to
2cf6fdd
Compare
@gcs278: This pull request references Jira Issue OCPBUGS-28928, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@gcs278: This pull request references Jira Issue OCPBUGS-28928, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
2cf6fdd
to
a188598
Compare
Avoid scanning validation_test.go, plugin_test.go, and router_test.go since they always contain fake/example certificates.
SHA1 certs will be unsupported in the future. Clean up unit tests to organize clearly between SHA1 and SHA256 certs. The unit tests had SHA1 certs that weren't tested separately. Also, the unit tests claimed to be using an expired certificate, but it was NOT expired. Document all certs by adding openssl comands for the regenerated certs to help future developers recreate certs if needed.
Bump openshift/api to get new route UnservableInFutureVersions status condition type. go get github.com/openshift/api@93d6bda14341f1d18319774367829c3278c171e7 go mod tidy go mod vendor
When an route ingress already exists, but an expected condition isn't found, recordIngressCondition would previously not change/update anything. This wasn't an issue for when only had Admitted because the route ingress was updated synchronously with the Admitted condition. However, adding another route status condition exposes the scenario in which a route ingress will exist, but the desired condition won't. This also adds a unit test for recordIngressCondition which explicitly tests this case along with other edge cases.
Previously, recordIngressCondition sometimes clears the original return value conditions depending on whether the condition was found and changed. This doesn't seem consistent and makes it hard to unit test. The original return value only affects logging. Logging the original conditions is advantageous because modifying conditions can also lead to contention, and having visibility into the modified conditions is beneficial. Therefore, it seems reasonable to remove condition clearing as it benefits logging and it's inconsistent. Update unit testing to now verify original return value from recordIngressCondition.
Add the Upgrade Validation plugin which provides a framework for adding a status to a route to indicate future upgrade issues. For this initial implementation, if a route has a SHA1 certificate, the upgrade validation plugin will mark the route with a UnservableInFutureVersions status. If the route no longer has a SHA1 certificate, remove the status instead of setting UnservableInFutureVersions=false. This is specific to OpenShift 4.15 and is expected to be removed from 4.16 once backported to avoid confusion. It's specific to 4.15 because 4.15 accepts SHA1 certs, but 4.16 does not. Therefore, validating the upgrade should be only done in 4.15. Update the RejectionRecorder interface to be the RouteStatusRecorder to support more generic status-related use cases as we have more than just Admitted status now. Add UPGRADE_VALIDATION env variable, but default to true, so that we can disable in the future. Follow existing pattern similar to existing EXTENDED_VALIDATION env variable. Key the writerlease logic off of route UID AND condition type. This is now essential because if only route UID, then the lease logic will only update one condition as it doesn't differentiate from the work items for the different conditions. Adjust findMostRecentIngress to use the latest condition instead of just Admitted as we can no longer assume that Admitted is the only condition. Add a type for writelease WorkKey to ensure strings aren't used anymore as the previous logic was only using route.UID. Add a type for ContentionTracker key to ensure it is different than writelease WorkKey. Update ingressEqual function to detect contention if any condition is not equal. Previously, it hardcoded the "Admitted" status. Add a unit test, TestRouterContentionUpdateCondition to test these changes.
Add --debug-upgrade-validation-force-add-condition and --debug-upgrade-validation-force-remove-condition as arguments to the openshift-router. These force add or force remove the UnservableInFutureVersions condition to allow for an E2E test to do contention testing on a condition-level. For debugging and testing purposes only.
Add missing key to log statement and log when a worker becomes elected to leader or demoted to a follower for better clarity in router logs.
Introduces verbose (level 10) logging to trace plugin chain execution. A bug was identified where the plugin chain halted unexpectedly, and logging the plugin chain steps aids in debugging.
Add verbose logging to help indicate the router is attempting to update a route status. This is helpful for debugging racy router status logic.
We found that the router's contention tracker was slower after updates were made to support the UnservableInFutureVersions condition. This lag sometimes causes routers to update the route status before the contention tracker detected an update, leading to the per-route contention logic not activating. In such scenarios, the maxContentions logic eventually activates, temporarily preventing the router from making any updates to the routes. This commit improves the efficiency of the ingressConditionsEqual function to help mitigate this type of issues.
Both performIngressConditionUpdate and performIngressConditionRemoval functions add tasks to the writerlease queue even if no work needed to be done. This commit optimizes the Upgrade Validation plugin by ensuring that tasks for updating UnservableInFutureVersions are only added to the queue when changes are required. This reduction in unnecessary work significantly lowers the frequency of lease extensions. Previously, excessive lease extensions could delay route status updates under certain conditions, such as during temporary contention periods where a router pod gets demoted to a follower. After the contention is resolved, the pod’s subsequent retry will be delayed more than necessary. Additionally, this update provides a clearer indication of when updates are actually required, but have been completed by another router pod. The prior logic did not clearly distinguish between unnecessary updates and updates that were completed by another pod. The selective updates are only added to the interface used by Upgrade Validation plugin to avoid perturbing existing Admitted condition logic. This means nominal clusters without SHA1 routes should not be impacted by the Upgrade Validation plugin, minimizing the risk associated with its introduction.
a188598
to
5afca9e
Compare
I did my own testing with this PR in cluster bot with the backport of the origin test openshift/origin#28820, just as a sanity check:
|
@gcs278: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/assign @frobware |
/assign @frobware |
Manual cherry-pick of #555, #575, #588, #587 (in that order). All required to support new Upgrade Validation plugin.
This PR adds support for setting a new status condition type, UnservableInFutureVersions, when a SHA1 cert is detected on a route. It accomplishes this by adding a new plugin, the upgrade validation plugin. This plugin is simple: when a route configuration that is unsupported in the future is detected, it adds a UnservableInFutureVersions status condition to the route. When the problematic configuration is no longer detected, it clears/removes the UnservableInFutureVersions condition.
In order to support updating a new route status condition, the routeRejectionRecorder and associated methods were refactored to be more generic. The writerlease logic used to update route status had to be refactored because the old code made assumptions that only one type of status condition would ever be updated.
It also updates certificates for existing unit tests, as some certificates were SHA1 and incorrectly labeled as expired.
Depends on openshift/api#1751
Related Ingress Operator PR openshift/cluster-ingress-operator#1014
DELTA FROM #555:
HOW I CREATED THIS PR: