Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep tide batch in pending state when presubmit contexts of its PRs are pending #32416

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oliver-goetz
Copy link
Contributor

Fixes #32097

With this PR tide keeps batches in pending state as long as required presubmit contexts of its member PRs are in pending state too. Before, Github branch-protection rules could prevent merging PRs from a batch (see #32097 for details).

The alternative approach listed in the issue (overwriting the pending contexts with success) was not implemented to keep tides behavior in regards of failing tests consistent.
At the moment tide filters a PR with failed required contexts out of the subpool, so batches where this PR is member of cannot be merged. If we overwrite these contexts, tide could merge PRs with failing contexts, when they fail late enough and their batch is already in the merge pool.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 9, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @oliver-goetz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/prow Issues or PRs related to prow area/prow/tide Issues or PRs related to prow's tide component sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Apr 9, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: oliver-goetz
Once this PR has been reviewed and has the lgtm label, please assign cjwagner for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cjwagner
Copy link
Member

cjwagner commented Apr 9, 2024

If we're going to add this it needs to be configurable and should not be the default behavior.
Most (or all) Tide instances should instead be given permissions to override branch protection settings. A passing batch is always safe to merge even if there is a pending or failing test on an individual PR from the batch. Skipping or delaying a batch merge due to a status context on a single PR (either a flake or pending rerun) could significantly impact merge velocity unnecessarily.

I think the alternative proposal of overriding status contexts before a batch merges would be more appropriate if we are required or motivated to avoid having Tide override branch protection. That also seems like what was agreed upon in the issue: #32097 (comment)

@oliver-goetz
Copy link
Contributor Author

If we're going to add this it needs to be configurable and should not be the default behavior.
Most (or all) Tide instances should instead be given permissions to override branch protection settings. A passing batch is always safe to merge even if there is a pending or failing test on an individual PR from the batch.[...]

How does this work, if tide is running as an Github App?
There are rulesets (ref) which allow adding Github Apps to a bypass-list.
However, branch-protector creates branch protection rules (ref). These could be bypassed by an admin, but afaik roles can be assigned to users and teams only.

I'm fine with an configuration option. It could be the default behavior when tide is running as a Github App, because I don't see how could be unbroken there.

[...] Skipping or delaying a batch merge due to a status context on a single PR (either a flake or pending rerun) could significantly impact merge velocity unnecessarily.[...]

Isn't this how tide currently works in regards of failing contexts?
When a PR was merged and there are more approved and tested PRs left, tide will create a new batch which includes some of these PRs. In the next sync period it will create a serial retest for the PR with the highest priority. This PR might be also a member of the batch.
If a context of the serial test fails, the PR will be filtered out of the subpool. The prowjobs for the batch keep running, but tide will create another batch in the next sync period. The original batch cannot be merged unless someone hits the retest button fast enough.

I think the alternative proposal of overriding status contexts before a batch merges would be more appropriate if we are required or motivated to avoid having Tide override branch protection. That also seems like what was agreed upon in the issue: #32097 (comment)

Here it is a bit ugly that you have a context in success state in an PR, but when you check the details you might see a failed prow job. Otherwise, I'm fine with it too.

@cjwagner
Copy link
Member

cjwagner commented Apr 9, 2024

Isn't this how tide currently works in regards of failing contexts?

I didn't think so, but maybe. I guess that is separate from what you're trying to change here in any case since we're specifically concerned with pending contexts. Maybe this could be improved separately, by triggering a new batch, but still potentially respecting the results from the old one if it succeeds.

Here it is a bit ugly that you have a context in success state in an PR, but when you check the details you might see a failed prow job. Otherwise, I'm fine with it too.

I agree that would be confusing. Instead of linking to the failed prow job we should link to the successful batch result instead.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@oliver-goetz
Copy link
Contributor Author

Ok, I'll see what I can do.
When I'm done, I'll open a PR in https://github.com/kubernetes-sigs/prow.

@oliver-goetz
Copy link
Contributor Author

Here it is a bit ugly that you have a context in success state in an PR, but when you check the details you might see a failed prow job. Otherwise, I'm fine with it too.

I agree that would be confusing. Instead of linking to the failed prow job we should link to the successful batch result instead.

While investigating the option overwrite option, I found another issue if we decide to cancel the pending prow jobs. The jobs which are going to be canceled are presubmit jobs, so crier will report their status to Github.
When we cancel such a job these things will happen in Github. Similar things might happen for all reporters which are configured:

  • crier sets the context to failed.
  • crier will create/update a comment in the PR, that the test failed.

While we could wait in tide until crier updated the context before tide overwrites the context, we cannot remove the PR comment.
Thus, aborting the presubmit prow job is a bad idea in my opinion.

We could keep the pending prow jobs running, but still overwrite their contexts in tide.
In this case, tide could merge the PRs and crier will eventually update the contexts and the comments when they are already merged.
It might happen that a pending prowjob will fail and the PR is updated accordingly. In this case it will look like that tide merged an PR with failed tests.
This is also not ideal, but it should be the status quo for the installations where tide is not running as a Github App and has permissions to override branch-protection rules.

@oliver-goetz
Copy link
Contributor Author

I've just seen that some reporters (github, gerrit) do not report a prowjob when prowJob.Spec.Report == false.
Would it be a proper way to go if we set this flag to false in our case here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/prow/tide Issues or PRs related to prow's tide component area/prow Issues or PRs related to prow cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Branch protection rules prevent first PR of a merge-batch from being merged
3 participants