Run the flake regressions test suite #10603

edolstra · 2024-04-24T16:52:21Z

Motivation

This adds a GitHub action to run a subset of the flake regressions test suite, a set of 259 flakes, each paired with its expected evaluation result (a JSON serialization of the flake outputs, extracted using flake-schemas).

Since the full test suite takes a few hours to run, this only runs the first 25 flakes for now. We may want to have a manually triggered action to run the full test suite.

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

cole-h · 2024-04-24T22:04:23Z

scripts/flake-regressions.sh

+
+status=0
+
+flakes=$(ls -d tests/*/*/* | head -n25)


What if it were a random selection of 25 every time, i.e. using shuf -n25 instead (or anything similar)?

Yeah I thought about that. It's probably a good idea but it also means that a failing test can go away just by rerunning the action...

Good point, but IMHO I think that's probably worth the trade-off (I'd personally look at the logs before restarting a failed test, but that's just me)? At least until they start failing very frequently for reasons other than "we actually regressed" like "script had bad assumptions" or "GitHub is having A Moment again".

I wonder if maybe it would make sense for Hydra to (try to) run the entire suite and have CI continue to run only a handful of them?

If we were to introduce randomness, it'd be critical to print out what the random seed is -- and make it easy to re-run it with the exact same seed, to reproduce that failure.

That's a good idea. shuf itself does have a --random-source= flag, where the argument is a file with random bytes, so maybe we could write out some random bytes (and then base64 encode them so they're still printable) and cat that into a file (and then stdout/stderr) before running the tests?

EDIT: Of course, I don't know if that's comparable to having the random seed, but I have to imagine it would be...
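A minimal sketch of the idea above, assuming GNU coreutils (shuf, base64, head). The helper names new_seed_file and pick_random are hypothetical and not part of the PR, and the 1024-byte seed size is a guess that should comfortably cover picking 25 of 259 entries. The point it illustrates: if the same bytes are fed to --random-source and the input list is the same, shuf makes the same selection, so printing the bytes in the log is enough to replay a run.

```shell
# Hypothetical helper: create a file of random bytes and print it in
# base64 on stderr, so the seed can be copied out of the CI log.
new_seed_file() {
  local seed_file
  seed_file=$(mktemp)
  head -c 1024 /dev/urandom > "$seed_file"
  echo "random seed (base64): $(base64 -w0 "$seed_file")" >&2
  echo "$seed_file"
}

# Hypothetical helper: pick COUNT lines from stdin, using the bytes in
# SEED_FILE as shuf's source of randomness.
# Same seed file + same input => same selection.
pick_random() {
  local seed_file=$1 count=$2
  shuf -n "$count" --random-source="$seed_file"
}
```

In the test script this could be used as `seed=$(new_seed_file)` followed by `ls -d tests/*/*/* | pick_random "$seed" 25`. To replay a failed run, the logged string would be decoded back into a file with `base64 -d` and passed to pick_random in place of a fresh seed.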

One trick I've used recently is using git commit hash as a seed for a random number generator:

https://github.com/tigerbeetle/tigerbeetle/blob/8b4a0d262a1429a90a92079dac9977649bd3e0e1/.github/workflows/linux.yml#L79
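A rough sketch of what commit-hash seeding could look like in the shell script from this PR. It is not taken from the linked workflow; the helper name pick_by_commit is hypothetical, and it swaps shuf for an awk-based shuffle because awk's srand() accepts a numeric seed directly. The selection is only reproducible with the same awk implementation, since rand() differs between implementations.

```shell
# Hypothetical helper: deterministically pick COUNT lines from stdin,
# seeded by a commit hash. Only the first 7 hex digits are used so the
# seed fits in a signed 32-bit integer.
pick_by_commit() {
  local commit=$1 count=$2
  local seed=$((16#${commit:0:7}))
  # Prefix every line with a pseudo-random key, sort by that key
  # (falling back to the line itself on ties), then drop the key.
  LC_ALL=C awk -v seed="$seed" \
      'BEGIN { srand(seed) } { printf "%.10f\t%s\n", rand(), $0 }' \
    | LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2 \
    | cut -f2- \
    | head -n "$count"
}
```

On GitHub Actions the hash is available in the GITHUB_SHA environment variable, so a call could look like `ls -d tests/*/*/* | pick_by_commit "$GITHUB_SHA" 25`. Rerunning the action on the same commit then selects the same flakes, which addresses the concern that a failure could disappear on rerun, while different commits still exercise different parts of the suite.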

infinisil · 2024-04-26T13:35:21Z

.github/workflows/ci.yml

+      - name: Checkout flake-regressions
+        uses: actions/checkout@v4
+        with:
+          repository: DeterminateSystems/flake-regressions
+          path: flake-regressions
+      - name: Checkout flake-regressions-data
+        uses: actions/checkout@v4
+        with:
+          repository: DeterminateSystems/flake-regressions-data
+          path: flake-regressions/tests


I think this goes a step too far in DetSys trying to take control of Nix Flakes. I agree that tests are useful (even for experimental features like Flakes), but by fetching the test suite from a DetSys repo, you essentially have direct control over which changes you want to be allowed. If the Nix team needs to make a breaking change to Flakes, they should be allowed to by changing the test suite to accommodate that without jumping through hoops.

Of course, DetSys doesn't want breaking changes, because you promised users of your installer that Flakes was stable, explicitly ignoring all the work the official Nix team and community has done trying to work towards stabilisation (and just maintaining Nix in general!). You even directly confirmed that this PR is trying to solidify that third-party promise.

So my ask here is simple: Make sure that the entire Nix team has exclusive control over the test suite. Either by putting the tests into this repository itself, or by putting the tests in a repo under the NixOS org that the Nix team has admin access to.

Yes, I'll be happy to move this repo to the NixOS org if the team wants to accept this PR.

Will be triaged and discussed.

In a previous meeting, Eelco brought this up in the context of measuring the impact of changes, and certainly not as a way to enforce stability on an experimental feature.

It seems that @grahamc may have had a different interpretation of the intent of this PR, because the description was lacking in context.

Indeed, this has already revealed a bug (#10612), so the test suite will need to be regenerated once the fix is in. This isn't intended to enforce bug compatibility for flakes, but rather to ensure that we don't accidentally change behaviour.

Run the flake-regressions test suite

931fc8e

edolstra force-pushed the flake-regressions branch from 84b89d3 to 931fc8e on April 24, 2024 17:48

edolstra added flakes tests labels Apr 24, 2024

cole-h reviewed Apr 24, 2024

infinisil suggested changes Apr 26, 2024

flake-regressions.sh: Make the sort order deterministic

dfa7189

edolstra commented Apr 24, 2024

cole-h Apr 24, 2024

edolstra Apr 25, 2024

cole-h Apr 25, 2024

grahamc Apr 25, 2024

cole-h Apr 25, 2024 (edited)

matklad Apr 26, 2024

infinisil Apr 26, 2024

edolstra Apr 26, 2024

tomberek Apr 26, 2024

roberth Apr 26, 2024

edolstra Apr 26, 2024
