Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run the flake regressions test suite #10603

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

edolstra
Copy link
Member

Motivation

This adds a GitHub action to run a subset of the flake regressions test suite, which is a set of 259 flakes with their expected evaluation results (which is a JSON serialization of the flake outputs, extracted using flake-schemas).

Since the full test suite takes a few hours to run, this only runs the first 25 flakes for now. We may want to have a manually triggered action to run the full test suite.

Context

Priorities and Process

Add 馃憤 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.


status=0

flakes=$(ls -d tests/*/*/* | head -n25)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it was a random selection of 25 every time? i.e. using shuf -n25 instead (or anything similar)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about that. It's probably a good idea but it also means that a failing test can go away just by rerunning the action...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but IMHO I think that's probably worth the trade-off (I'd personally look at the logs before restarting a failed test, but that's just me)? At least until they start failing very frequently for reasons other than "we actually regressed" like "script had bad assumptions" or "GitHub is having A Moment again".

I wonder if maybe it would make sense for Hydra to (try to) run the entire suite and have CI continue to run only a handful of them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to introduce randomness, it'd be critical to print out what the random seed is -- and make it easy to re-run it with the exact same seed, to reproduce that failure.

Copy link
Member

@cole-h cole-h Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. shuf itself does have a --random-source= flag, where the argument is a file with random bytes, so maybe we could write out some random bytes (and then base64 encode them so they're still printable) and cat that into a file (and then stdout/stderr) before running the tests?

EDIT: Of course, I don't know if that's comparable to having the random seed, but I have to imagine it would be...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One trick I've used recently is using git commit hash as a seed for a random number generator:

https://github.com/tigerbeetle/tigerbeetle/blob/8b4a0d262a1429a90a92079dac9977649bd3e0e1/.github/workflows/linux.yml#L79

Comment on lines +177 to +186
- name: Checkout flake-regressions
uses: actions/checkout@v4
with:
repository: DeterminateSystems/flake-regressions
path: flake-regressions
- name: Checkout flake-regressions-data
uses: actions/checkout@v4
with:
repository: DeterminateSystems/flake-regressions-data
path: flake-regressions/tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this goes a step too far in DetSys trying to take control of Nix Flakes. I agree that tests are useful (even for experimental features like Flakes), but by fetching the test suite from a DetSys repo, you essentially have direct control over which changes you want to be allowed. If the Nix team needs to make a breaking change to Flakes, they should be allowed to by changing the test suite to accommodate that without jumping through hoops.

Of course, DetSys doesn't want breaking changes, because you promised users of your installer that Flakes was stable, explicitly ignoring all the work the official Nix team and community has done trying to work towards stabilisation (and just maintaining Nix in general!). You even directly confirmed that this PR is trying to solidify that third-party promise.

So my ask here is simple: Make sure that the entire Nix team has exclusive control over the test suite. Either by putting the tests into this repository itself, or by putting the tests in a repo under the NixOS org that the Nix team has admin access to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll be happy to move this repo to the NixOS org if the team wants to accept this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be triaged and discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a previous meeting, Eelco has brought this up in the context of measuring the impact of changes; certainly not as a way to enforce stability on an experimental feature.

It seems that @grahamc may have had a different interpretation of the intent of this PR, because the description was lacking in context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this has already revealed a bug (#10612) so the test suite will need to be regenerated once the fix is in. This isn't intended to enforce bug compatibility for flakes but rather that we don't accidentally change behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

None yet

7 participants