Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement automated migrations #56

Open
infinisil opened this issue Apr 22, 2024 · 3 comments
Open

Implement automated migrations #56

infinisil opened this issue Apr 22, 2024 · 3 comments

Comments

@infinisil
Copy link
Member

infinisil commented Apr 22, 2024

Most packages in Nixpkgs aren't defined by pkgs/by-name yet, even if they could. While one could migrate them manually, that would take a long time. Instead we should automate this. Figuring out how to do large-scale automated migrations will also be of great benefit to many other future changes in Nixpkgs.

In this issue I'll try to explain the pre-existing work and my ideas on how to go about this in the current codebase, such that others could implement this.

The task

The pkgs/by-name directory has certain validity checks that need to be fulfilled. For the migration in particular, these are most important:

  1. Each package directory (e.g. pkgs/by-name/he/hello) must not refer to files outside itself using symlinks or Nix path expressions.
  2. Evaluate Nixpkgs with system set to x86_64-linux and check that:
    a. For each package directory, the pkgs.${name} attribute must be defined as callPackage pkgs/by-name/${shard}/${name}/package.nix args for some args.
    b. For each package directory, pkgs.lib.isDerivation pkgs.${name} must be true.

A lot of packages under pkgs.* do fulfill these requirements, these are the ones we can migrate automatically. The ones that don't fulfil these requirements typically need some manual refactoring.

After verifying these requirements, migration can happen in two separate stages:

  • Move all files needed for the package into the appropriate pkgs/by-name directory, with the entry-point at package.nix.

    Rewrite the packages definition in all-packages.nix appropriately to point to ../by-name/...

  • Optionally, if the definition in all-packages.nix has { } as the second callPackage argument, remove the definition completely.

Examples

  • pkgs.kody (defined here) can't be migrated automatically, because its default.nix contains ../../../top-level/kodi-packages.nix, therefore not fulfilling requirement 1.
  • pkgs.termdown (defined here) can't be migrated automatically, because it's not defined using pkgs.callPackage
  • pkgs.qiv (defined here) fulfills all requirements and therefore can be migrated by:
    • Moving pkgs/applications/graphics/qiv/default.nix to pkgs/by-name/qi/qiv/package.nix and adjusting the definition in all-packages.nix to
        qiv = callPackage ../by-name/qi/qiv/package.nix {
          imlib2 = imlib2Full;
        };
    • We cannot remove the definition entirely because the second callPackage argument is not { }
  • pkgs.scli (defined here) fulfills all requirements and therefore can be migrated by:
    • Moving pkgs/applications/misc/scli/default.nix to pkgs/by-name/sc/scli/package.nix and adjusting the definition in all-packages.nix to
        scli = callPackage ../by-name/sc/scli/package.nix { };
    • And because the second callPackage argument is { } we can remove the entire definition.

Ratchet checks

Currently this tool implements ratchet checks, which prevent the introduction of new instances of deprecated patterns. This happens here:

// Otherwise, the path is outside `pkgs/by-name`, which means it can be
// migrated
Loose((syntactic_call_package, relative_location_file))

This says that the ratchet for a specific package is "loose" (aka legacy), meaning we can migrate away from it (therefore tightening the ratchet).

Currently these ratchet states are only used to compare the base branch and the head of a PR, such that we can detect invalid transitions between ratchet states, see ratchet.rs. This allows detect when a PR introduces a new instance of a legacy pattern.

This notably also means that even if somebody merges a PR that introduces new instances of deprecated patterns (happens if a pattern is deprecated between the PRs CI running and it being merged), it won't break master, because these ratchets aren't invalid on their own: Only the transition from tight to loose is invalid.

Using ratchets for migrations

We can also use this ratchet abstraction to determine migrations, because the loose state is exactly what needs to be migrated!

The tool can have two modes:

  • (This one currently exists) Take two Nixpkgs instances, verify them individually, then compare the ratchets between them:
    // Both base and main branch succeed, check ratchet state
  • (This one would be new) Take a single Nixpkgs instance, verify it, then run a migration on all loose ratchets.

Thus in addition to defining how ratchets can be compared, we just need a way to define how ratchets can be migrated.

I've started a very rough WIP branch of this before here. Note how:

Problem: Which files are part of a package?

It's really tricky to figure out which files are part of a package and should therefore be moved. Because outside of pkgs/by-name, each package's entry-point file (usually default.nix) can refer to either more of its own files which should be moved (e.g. ./Cargo.lock), or refer to arbitrary files from other packages in which case it can't be moved without breaking check 1 (like ../../some/file.sh).

I did manage to get that to work almost perfectly in an older version of this tooling. It works by creating an index of all (static) references between Nix files to figure out whether a file was only references by a single package or many packages, and then revert that relation to figure out which packages only referred to files used by that package itself.

This tool was used to create this PR: NixOS/nixpkgs#211832

Notably, the current nixpkgs-check-by-name tool doesn't have such functionality, which also means that when somebody creates a PR that introduces a new package outside of pkgs/by-name, it tells them to migrate it to pkgs/by-name, even if it can't be directly migrated due to it referring to shared files! This is the reason the workaround for packages with multiple versions is necessary.

Problem: It's non-trivial to efficiently rewrite many sections of files

This is what I've been struggling with while experimenting. I wrote some notes on this. Basically it's really hard to use rowan (the parsing library used underneath rnix) to edit files in multiple places.

In this case I'd like to have a basic edit operation implemented that can modify or remove attributes (for all-packages.nix). I did get that working in the older tooling, but it's a bit too hacky (still, it shows that it can be done!).

Todo

This is my current idea on how to progress with this:

  1. Port the reference index code from the old tool to nixpkgs-check-by-name, therefore removing the need for the multi-versioned package workaround. This means that we only have a loose ratchet exactly when we can figure out how to migrate automatically.

  2. Implement decent abstractions for representing tree changes, including moving files and changing Nix files. And implement those abstractions efficiently using rnix.

  3. Make ratchets declare how to migrate from a loose state to a tight one using the above tree change abstraction. This should be almost trivial at this point, since we already know that we can migrate, and information from determining this should allow exactly specifying how we can migrate using the abstraction for doing changes.

  4. Set up CI to run the migration for every push to Nixpkgs master, and force push the resulting changes into a separate branch, which can then be merged at any point.

    Bonus: Also make it open a tracking issue for all the packages that can't be migrated manually and update it as they are fixed.

Every future addition of a new ratchet check (for any pattern we want to deprecate) will have to come with an implementation of how to migrate from it, which will then automatically run and be pushed to the branch, which can then be merged again as necessary.

@willbush
Copy link
Member

willbush commented Apr 23, 2024

This might be a good tool for this https://github.com/getgrit/gritql well it doesn't support nix yet. I wonder how hard it is to add given it's based on tree-sitter grammars.

Adding Nix support to that tool could create a lot of value for Nix.

@philiptaron
Copy link
Contributor

I tried to add Nix support to sg and didn't get too far, likely because I didn't really understand why simple pattern matching wasn't working. Agree that a tree-sitter / codemod approach is the right way to go. tree-sitter-nix exists and I think could be resuscitated for this usecase.

@morgante
Copy link

In case anyone ends up working on this, I'd be very receptive to adding Nix support in GritQL and happy to debug any patterns that don't match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants