jenkinsfiles: Fix order of ginkgo tests #25002

Merged
merged 1 commit into from
Apr 21, 2023

Conversation

pchaigno
Member

@pchaigno pchaigno commented Apr 20, 2023

The ginkgo tests are currently executed in a different order for each run in our CI. Unfortunately, the order in which we run the tests often matters and can impact the test results. That's because there are a lot of side effects (leftover Linux state, leftover Cilium conntrack entries, leftover Kubernetes state, etc.) that are hard to control for.

This nondeterminism facilitates the introduction of flakes: a first pull request introduces a flake, but it doesn't show up until a later pull request runs the tests in a particular order.
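As a rough illustration of how test order can decide whether a flake shows up, here is a minimal Python sketch. The test names and the shared `state` dictionary are hypothetical stand-ins for leftover state on a shared cluster; nothing here is taken from the Cilium test suite.

```python
def run_suite(order):
    """Run hypothetical tests in the given order; return the names that fail."""
    # Stands in for state that survives between tests on a shared cluster.
    state = {"conntrack_entries": 0}

    def test_a():
        # Passes, but leaves state behind instead of cleaning up.
        state["conntrack_entries"] += 1
        return True

    def test_b():
        # Implicitly assumes it starts from a clean environment.
        return state["conntrack_entries"] == 0

    tests = {"test_a": test_a, "test_b": test_b}
    return [name for name in order if not tests[name]()]


print(run_suite(["test_b", "test_a"]))  # []
print(run_suite(["test_a", "test_b"]))  # ['test_b']
```

If `test_a` is added by one pull request, the failure in `test_b` only appears once a later run happens to schedule `test_a` first.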

The random order also makes flakes a bit harder to reproduce locally, though that can easily be worked around by passing the proper seed to ginkgo when reproducing.

One argument in favor of this randomness is that it surfaces those side effects and allows us to identify and clean them up. In practice, that doesn't work great:

  • Being side effects, they are often hard to identify and are a huge time cost for contributors to debug. Our ever-increasing list of flake issues is a good testament to that.
  • They are not always easy to clean up [1].
  • They are often not representative of bugs likely to affect users. Our users rarely restart Cilium 50 times in the span of an hour with different configurations.

In the end, our tests were not meant to uncover those side effects and they are therefore inadequate at exposing them properly. If we want to identify those side effects, then we should have tests specifically for that.

Other, non-ginkgo CI jobs run their tests in a fixed order [2] or on completely independent clusters [3].

This pull request therefore fixes the order of the ginkgo tests. To that end, a ginkgo seed was selected using $SRANDOM and will be the same for all subsequent runs.
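To illustrate why a fixed seed gives a stable order, here is a minimal Python sketch. It is not ginkgo's implementation; the seed value and spec names are hypothetical, and the only assumption is that the order is derived from a pseudo-random generator initialised with the seed.

```python
import random

# Hypothetical value for illustration, not the seed chosen in this pull request.
FIXED_SEED = 1234567890


def spec_order(specs, seed):
    """Return a copy of specs shuffled by a PRNG initialised with seed."""
    rng = random.Random(seed)
    shuffled = list(specs)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical spec names.
specs = ["spec_a", "spec_b", "spec_c", "spec_d", "spec_e"]

first_run = spec_order(specs, FIXED_SEED)
second_run = spec_order(specs, FIXED_SEED)

# Same seed, same order on every run.
print(first_run == second_run)
```

The order still looks arbitrary, but it no longer changes from run to run, so a failure seen in CI can be replayed with the same sequence of tests.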

@pchaigno pchaigno added the release-note/ci This PR makes changes to the CI. label Apr 20, 2023
The ginkgo tests are currently executed in a different order for each
run in our CI. Unfortunately, the order in which we run the tests often
matters and can impact the test results. That's because there are a lot
of side effects (leftover Linux state, leftover Cilium conntrack
entries, leftover Kubernetes state, etc.) that are hard to control for.

This nondeterminism facilitates the introduction of flakes: a first pull
request introduces a flake, but it doesn't show up until a later pull
request runs the tests in a particular order.

The random order also makes flakes a bit harder to reproduce locally,
though that can easily be worked around by passing the proper seed to
ginkgo when reproducing.

One argument in favor of this randomness is that it surfaces those side
effects and allows us to identify and clean them up. In practice, that
doesn't work great:
- Being side effects, they are often hard to identify and are a huge
  time cost for contributors to debug. Our ever-increasing list of flake
  issues is a good testament to that.
- They are not always easy to clean up [1].
- They are often not representative of bugs likely to affect users. Our
  users rarely restart Cilium 50 times in the span of an hour with
  different configurations.
In the end, our tests were not meant to uncover those side effects and
they are therefore inadequate at exposing them properly. If we want to
identify those side effects, then we should have tests specifically for
that.

Other, non-ginkgo CI jobs run their tests in a fixed order [2] or on
completely independent clusters [3].

This commit therefore fixes the order of the ginkgo tests. To that end,
a ginkgo seed was selected using $SRANDOM and will be the same for all
subsequent runs.

1 - cilium#17459
2 - cilium/cilium-cli#558
3 - https://github.com/cilium/cilium/blob/main/.github/workflows/conformance-datapath.yaml
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno
Member Author

/test-vagrant

@pchaigno pchaigno marked this pull request as ready for review April 20, 2023 18:59
@pchaigno pchaigno requested a review from a team as a code owner April 20, 2023 18:59
@pchaigno pchaigno requested a review from nebril April 20, 2023 18:59
@pchaigno pchaigno merged commit e84e99f into cilium:main Apr 21, 2023
48 checks passed
@pchaigno pchaigno deleted the ginkgo-fix-test-order branch April 21, 2023 11:28
@pchaigno pchaigno added the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Apr 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 Apr 25, 2023
@pchaigno
Member Author

Marking for backport to v1.13 as #20723 is happening there and this fixes it.

@sayboras sayboras mentioned this pull request Apr 26, 2023
7 tasks
@sayboras sayboras added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 Apr 26, 2023
@sayboras sayboras added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Apr 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Apr 28, 2023
Labels
backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. release-note/ci This PR makes changes to the CI.
Projects
No open projects
1.13.3
Backport done to v1.13

3 participants