CI with bors is slow, still flaky, and provides no way to manually override #11449

Open
Frassle opened this issue Nov 23, 2022 · 5 comments
Labels
area/build CI/CD for pulumi/pulumi kind/engineering Work that is not visible to an external user

Comments

@Frassle
Member

Frassle commented Nov 23, 2022

Pretty much the title.

We're finding that bors has not really improved our issues with CI; at best it has moved them around, and more likely it has made them worse.

For reasons unrelated to bors, our CI actions are flaky and often time out or crash the GitHub runners. With bors, that requires a complete retry of the bors action, whereas previously we could use GitHub's "retry failing jobs" feature to retry just the flaky parts.

Bors itself seems to be slow: I've seen PRs sit for hours after a "bors merge" comment with no action taken by bors.

Bors also obstructs the option to manually push things through. For example, if we have a P0 bug in a release with a trivial fix, we might want to push that fix and release faster, accepting the risk of not running all the tests in exchange for getting a P0 bug fix into customers' hands sooner.

Comments like the above have come up repeatedly over the last few months since switching to bors, and we were generally happy to accept that there would be some teething pain with the switch. However, there haven't been any concrete proposals put forward to fix these issues.

As such, this issue is a commitment to get concrete solutions put in place for the above, that is:

  • Fixing the need for so many retries, or improving the retry experience to not cost a whole run
  • Speeding up the action of bors merge to less than an hour of wait time
  • Providing a way to override the merge queue on a per-PR basis

Failing fixes for the above, we should move away from using bors and return to the old system of PR tests and master tests with retries. We will still want to investigate flakiness at some point in that system, but it's a lot less pressing and doesn't have the other negatives of bors. It also makes our KPI of green master meaningful again, giving us back a metric we've lost for seeing how stability is going.

@Frassle Frassle added the area/build CI/CD for pulumi/pulumi label Nov 23, 2022
@Frassle
Member Author

Frassle commented Nov 23, 2022

Oh, also, it's littering our releases page with tens of draft releases that aren't meaningful.

@dixler
Contributor

dixler commented Nov 23, 2022

Some community PRs are currently in a weird state where bors won't merge them.
They're tracked in this issue: #11423

@justinvp justinvp added the kind/engineering Work that is not visible to an external user label Jul 18, 2023
@justinvp
Member

Given #13501, do we still need to keep this open?

@Frassle
Member Author

Frassle commented Jul 18, 2023

Probably not, especially if GitHub merge queues allow partial retries of failed jobs, which was the main complaint for this issue.

@blampe
Contributor

blampe commented Aug 9, 2023

> Bors also obstructs the option to manually push things through. For example if we have a P0 bug in a release with a trivial fix we might want to push that fix and release faster accepting the risk of not running all the tests against the benefit of getting a P0 bug fix in customer hands sooner.

You can jump the queue, but I don't think it will let you skip tests (since that's the whole point of the queue). Instead, for P0 situations you could disable the merge queue requirement from the UI -- maybe not ideal, but very easy in a pinch.

> especially if GitHub merge queues allow partial retries of failed jobs which was the main complaint for this issue.

I think this is addressed by the "Only merge non-failing pull requests" option, but I haven't used it personally.

> Leaving this checkbox unselected can be useful if you have intermittent test failures, but don't want false negatives to hold up the queue.


4 participants