Measure testgrid flakiness didn't detect flakes that happened #17773

Open
serathius opened this issue Apr 11, 2024 · 6 comments

@serathius
Member

What would you like to be added?

https://github.com/etcd-io/etcd/actions/runs/8584759086/job/23525383783
image

image
cc @siyuanfoundation

Why is this needed?

The last run, https://github.com/etcd-io/etcd/actions/runs/8584759086/job/23525383783 on April 7th, didn't detect a flake that happened on April 4th.

@siyuanfoundation
Contributor

The flaky-test detection is meant to detect tests that fail sometimes, not one-off failures.
This test fails about 2% of the time. @serathius Do you think a 2% threshold is reasonable?

@serathius
Member Author

Hmm, not sure. The 16% flakiness on the main branch doesn't seem great: https://github.com/etcd-io/etcd/actions/runs/8584643093/job/23525115058.

@siyuanfoundation
Contributor

I think the 16% flakiness on the main branch includes all the workflows on a PR. I am seeing a lot of flakiness with respect to arm64.

@jmhbnz
Member

jmhbnz commented Apr 11, 2024

> I think the 16% flakiness on the main branch includes all the workflows on a PR. I am seeing a lot of flakiness with respect to arm64.

I raised a flake issue for the TestMemberAdd e2e test on arm64 and amd64. I have seen it fail a few times in GitHub Actions for arm64, and there are also instances in TestGrid for amd64 on Prow.

#17778

@serathius
Member Author

> The flaky-test detection is meant to detect tests that fail sometimes, not one-off failures.
> This test fails about 2% of the time. @serathius Do you think a 2% threshold is reasonable?

Maybe we could improve visibility. What surprised me was that the tool didn't mention any flakes at all. Could we log flakes below 2%, with a note that the rate is too low to file an issue?
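To make the suggestion concrete, here is a minimal sketch of what "log everything, file issues only above the threshold" could look like. It is not the actual etcd tooling: the function name, the input shape (test name mapped to failed and total run counts), and the test names are all hypothetical, and the 2% cutoff is simply the value discussed in this thread.

```python
ISSUE_THRESHOLD = 0.02  # 2% cutoff discussed in this thread; treat it as an assumption


def flake_report_lines(stats, threshold=ISSUE_THRESHOLD):
    """Build one report line per test that failed at least once.

    stats maps a test name to (failed_runs, total_runs). This input shape
    is a hypothetical one chosen for the sketch.
    """
    lines = []
    for name in sorted(stats):
        failed, total = stats[name]
        if total == 0 or failed == 0:
            continue  # never failed (or never ran): nothing to report
        rate = failed / total
        if rate >= threshold:
            note = "at or above threshold, file an issue"
        else:
            note = "below threshold, logged only"
        lines.append(f"{name}: {rate:.1%} flaky ({failed}/{total}), {note}")
    return lines


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = {"TestA": (1, 100), "TestB": (5, 100), "TestC": (0, 100)}
    for line in flake_report_lines(example):
        print(line)
```

With this shape, a one-off failure like the April 4th one would still show up in the log, just without an issue being filed.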

@serathius
Member Author

serathius commented Apr 30, 2024

The reports are very nice.
image

My suggestions:

  • Make them easier to find; maybe use https://github.com/marketplace/actions/publish-test-report to publish a report in the summary.
  • A 10% per-test threshold is very high, so it will not report anything. From a contributor's perspective, I don't care about the flakiness of a single test; I care about flakes in my PR wasting my time on retries. I would recommend changing the threshold to be per suite: if the whole suite's flakiness is above 10%, we file an issue for the most flaky tests. This way we catch cases where tests with low flakiness are not a problem individually but are in aggregate, like 10 tests with 1% flakiness each. We can start by reporting just the top flaky test in the suite and iterate on it later.
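A rough sketch of the per-suite idea, under stated assumptions: each suite run is represented as the set of test names that failed in it, a run counts as failed if that set is non-empty, and only the single most frequently failing test is reported. The function name, the data shape, and the tie-breaking rule are hypothetical, not part of the existing tool.

```python
from collections import Counter

SUITE_THRESHOLD = 0.10  # 10% per-suite threshold from the suggestion above


def suite_report(runs, threshold=SUITE_THRESHOLD):
    """Return (suite_flakiness, top_flaky_test or None).

    runs is a list with one entry per suite run; each entry is the set of
    test names that failed in that run (empty set means the run passed).
    """
    if not runs:
        return 0.0, None
    failed_runs = sum(1 for failed in runs if failed)
    flakiness = failed_runs / len(runs)
    if flakiness <= threshold:
        return flakiness, None  # suite is within budget, nothing to file
    counts = Counter(name for failed in runs for name in failed)
    # Most failures first; ties broken alphabetically so the result is stable.
    top_test = min(counts, key=lambda name: (-counts[name], name))
    return flakiness, top_test


if __name__ == "__main__":
    # Hypothetical data: 100 runs, 12 failed. TestA fails 3 times and nine
    # other tests fail once each, so no single test exceeds 3%.
    runs = [{"TestA"}] * 3 + [{f"Test{i}"} for i in range(9)] + [set()] * 88
    print(suite_report(runs))
```

In the hypothetical data above, no individual test would trip a per-test threshold, yet the suite as a whole fails 12% of the time, which is the aggregate case the suggestion is meant to catch.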

To go into more detail, let's define a measure of bad contributor experience due to CI, something like the time wasted on CI to merge a PR. I would call this TTM (time to merge), a reflection of how long it takes to test a PR and of the flakiness of those tests. I would expect TTM to equal something like max(TSDi^(1+TSFi) for each i), where TSDi is the duration of test suite i and TSFi is the flakiness of test suite i. Because retries can be done at the test suite level, we need to count it per suite. If we set a target for TTM, different suites might have different acceptable flakiness, as it's easier and faster to retry a 1-minute test than a 30-minute one. Of course, this assumes that the time to notice a failure and retry is zero, which is a simplification. However, at a high level, this is my mental model of the problem. If we include TTR (time to retry), then TTM = max(TSDi^TSFi + (TSDi+TTR)^TSFi for each i).
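For anyone who wants to plug in numbers, the two expressions above can be transcribed literally as follows. This is only a transcription of the mental model as written, not a validated metric. The `Suite` type, the function names, the choice of minutes as the unit, and the example values are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Suite:
    name: str
    duration_min: float  # TSD_i: duration of one suite run (minutes is an assumed unit)
    flakiness: float     # TSF_i: fraction of runs that flake, between 0 and 1


def time_to_merge(suites):
    """TTM = max(TSD_i ** (1 + TSF_i)) over all suites, as written above."""
    return max(s.duration_min ** (1 + s.flakiness) for s in suites)


def time_to_merge_with_retry(suites, ttr_min):
    """TTM = max(TSD_i ** TSF_i + (TSD_i + TTR) ** TSF_i), the variant with TTR."""
    return max(
        s.duration_min ** s.flakiness + (s.duration_min + ttr_min) ** s.flakiness
        for s in suites
    )


if __name__ == "__main__":
    # Hypothetical suites, for illustration only.
    suites = [Suite("unit", 10.0, 0.02), Suite("e2e", 30.0, 0.10)]
    print(time_to_merge(suites))
    print(time_to_merge_with_retry(suites, ttr_min=5.0))
```

Because the result is a maximum over suites, the slowest and flakiest suite dominates, which matches the point that a 30-minute suite can tolerate less flakiness than a 1-minute one.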
