Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

libfetchers/git: Allow Git Remote Helpers #10567

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lorenzleutgeb
Copy link
Member

@lorenzleutgeb lorenzleutgeb commented Apr 19, 2024

Motivation

Use Nix with repositories that are (only) accessible via remote helpers, which are programs that are invoked by Git "under the hood" based on URL schemes.

This opens up lots of possibilities to interoperate with other storage/transport mechanisms.

Example remote helpers (by URL scheme, alphabetically):

Context

The following two changes (both part of this PR) make use of such remote helpers possible in conjunction with Nix:

  1. Relax URL scheme filtering for the Git fetcher, such that unknown URL schemes are not rejected directly. In case there is no corresponding remote helper available, the following output is produced:

    warning: URL scheme 'git+invalid' is non-standard and requires 'git-remote-invalid'.
    git: 'remote-invalid' is not a git command. See 'git --help'.
    error: program 'git' failed with exit code 128
    
  2. Add GIT_DIR to the environment, since this is required by remote helpers.

As submitted (ee66ec3) this is a proof of concept, and works for me locally. I do understand that this feature might have to be protected by an experimental flag and docs would have to be added/changed. I am happy to do this if there's positive feedback.

Priorities and Process

Add 馃憤 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world label Apr 19, 2024
Remote helpers are programs that are invoked when Git needs to interact
with remote repositories that it does not support natively, see
<https://git-scm.com/docs/gitremote-helpers>

The following two changes make use of such remote helpers possible in
conjunction with Nix:

  1. Relax URL scheme filtering for the Git fetcher, such that unknown
     URL schemes are not rejected directly.
     In case there is no corresponding remote helper available, the
     following output is produced:

         warning: URL scheme 'git+invalid' is non-standard and requires 'git-remote-invalid'.
         git: 'remote-invalid' is not a git command. See 'git --help'.
         error: program 'git' failed with exit code 128

  2. Add `GIT_DIR` to the environment, since this is required by remote
     helpers.
@spacefrogg
Copy link

I want to raise the following question / thought: This is a quite general approach. Can it be expected that all remote helpers can be called without any additional command-line options? And if not, would they better be supplied when calling nix, i.e.,

nix ... --remote-helper-opts '--a-opt a-val' git+helper://...
# or
NIX_REMOTE_HELPER_OPTS=... nix git+helper://...

or via the URL to properly pair the options with the remote helper call (for nested fetches of flake inputs etc.)?

nix ... 'git+helper://...?opts=--a-opt%20a-val'

@lorenzleutgeb
Copy link
Member Author

lorenzleutgeb commented Apr 20, 2024

I want to raise the following question / thought: This is a quite general approach. Can it be expected that all remote helpers can be called without any additional command-line options?

Yes. Remote Helpers are invoked by Git and the Git reference documentation on Remote Helpers states:

Remote helper programs are invoked with one or (optionally) two arguments.
The first argument specifies a remote repository as in Git; it is either the name of a configured remote or a URL.
The second argument specifies a URL; it is usually of the form <transport>://<address>, but any arbitrary string is possible.

Regarding

[...] or [would they better be supplied] via the URL to properly pair the options with the remote helper call (for nested fetches of flake inputs etc.)?

nix ... 'git+helper://...?opts=--a-opt%20a-val'

This is something that Remote Helpers can leverage without Git passing arguments. They control the URL scheme, so during git clone 'git+example://x?a-opt=a-val', the remote helper git-remote-example is invoked with the argument example://x?a-opt=a-val and can interpret the URL.

However, it will be tricky to get a URL with query string past Nix, since it also uses query strings to encode fetching information. With a file called git-remote-example in $PATH that just writes its $@ to a file, I was able to verify that

$ nix ... 'git+example://test?x=y&ref=main'

results in

example://test

and conversely

$ nix ... 'git+example://test%3Fx=y?ref=main'

results in

example://test%3Fx=y

So, Remote Helpers that do this, and require syntax that is otherwise parsed by Nix, will be incompatible. One workaround is the one you mentioned, somehow prefixing arguments that should not be parsed by Nix, e.g. ?pass-x=y. One loses the ability to just copy the URLs that the protocol of the remote helper requires (and maybe add ?rev or so).

IMO the conflation of fetching arguments and URL are the issue here, but that's Pandora's box. I think that there still are some interesting use cases. I am successfully using Remote Helpers with this patch.

@fricklerhandwerk fricklerhandwerk added the feature Feature request or proposal label May 21, 2024
@fricklerhandwerk
Copy link
Contributor

I have no decision-oriented opinion on this, but a few notes:

  • There was movement to migrate away from shelling out to git and instead use libgit2. So we should at least keep on the radar if we can port such a new feature to libgit2.

  • I like that using Git through Nix becomes more transparent and natural that way.

  • This introduces reproducibility risks, since now fetching would essentially require knowing the entire Nix setup including stuff from the environment. This change would lead to builtins.fetchGit requires git聽#3533 squared. And even if accepted, it would further move apart the user experience of vanilla Nix from Nix on NixOS, since you'd have to nail down more moving parts. I think we'd rather want expressions that will always work on a sufficiently modern Nix out of the box.

  • Lately, any work on fetchers for me raises the question if they should be part of Nix at all (apart from staying backward compatible). Tvix says no. Things like npins (which uses nix-prefetch-url and builtins.fetchurl underneath) and gridlock (which does not depend on Nix at all) show that we can completely decouple obtaining sources from evaluating expressions. Decoupled fetching could be provided by the distribution of Nix -- one could argue, Nixpkgs is a Nix distribution, NixOS is a Nix distribution, even Flakes are in that sense a Nix distribution.

  • Finally, what we of course should also optimise for is making Nix useful. But there are many ways to do that, such as finding and documenting convenient and scalable usage patterns. I've seen people who don't have a single builtins.fetch* in their code and manage all remote sources through git subtree.

@lorenzleutgeb
Copy link
Member Author

lorenzleutgeb commented May 21, 2024

@fricklerhandwerk thanks a lot for thinking about this and providing your insights. Very valuable! 馃檱馃徎

I have no decision-oriented opinion on this, but a few notes:

  • There was movement to migrate away from shelling out to git and instead use libgit2. So we should at least keep on the radar if we can port such a new feature to libgit2.

Right. libgit2 does not support remote helpers. It looks like it did give some trouble:

The revert was done for a somewhat similar reason: libgit2 also does not support credential helpers. And actually, now I realize that this PR goes in the exact opposite direction of

This ties in quite directly to your point about moving fetchers out of Nix.

  • I like that using Git through Nix becomes more transparent and natural that way.

I fail to understand. Which way exactly do you mean by "that way"?

  • Lately, any work on fetchers for me raises the question if they should be part of Nix at all (apart from staying backward compatible). Tvix says no. Things like npins (which uses nix-prefetch-url and builtins.fetchurl underneath) and gridlock (which does not depend on Nix at all) show that we can completely decouple obtaining sources from evaluating expressions.

Yeah, I agree they probably shouldn't. I would love to write my own fetcher that can be reused by various Nix implementations. That requires an interface and spec for fetchers. And of course it introduces a bootstrapping issue (as always): How to fetch fetchers? Quoting from the Git docs on Remote Helpers:

Git comes with a "curl" family of remote helpers, that handle various transport protocols, such as git-remote-http, git-remote-https, git-remote-ftp and git-remote-ftps. They implement the capabilities fetch, option, and push.

But, not surprisingly, there already are fetchers in builtins that would have to be maintained for backward-compatibility anyway! With a nicer interface for "fetching fetchers", it might be feasible to remove the requirement that builtins.fetchgit supports credential helpers (this has obvious r13y issues anyway...). Instead, people that use credential helpers would plug in a fetcher that supports it. The "powerful" Git fetcher.

  • Finally, what we of course should also optimise for is making Nix useful. But there are many ways to do that, such as finding and documenting convenient and scalable usage patterns. I've seen people who don't have a single builtins.fetch* in their code and manage all remote sources through git subtree.

Yup, 馃挴. My motivation for this PR is that I wanted to have the nice UX of

$ nix run git+helper://project

where "project" is something that you cannot effectively fetch with plain Git, you need git-remote-help.

In Nix source files, or on the "repo structure" level, there's lots of room to plug in fetchers, but CLIs are much more constrained if you want something usable, i.e., that humans will ever be willing to type or copy and paste.

@lorenzleutgeb lorenzleutgeb marked this pull request as draft May 21, 2024 09:59
if (url.scheme != "git" &&
url.scheme != "git+http" &&
url.scheme != "git+https" &&
url.scheme != "git+ssh" &&
url.scheme != "git+file") return {};
url.scheme != "git+file")
warn("URL scheme '%s' is non-standard and requires 'git-remote-%s'.", url.scheme, url.scheme.substr(4, url.scheme.length() - 4));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come with a way to turn off the warning, for users who consistently use this.
I do think dependencies on remote helpers should be discouraged, because they do make expressions that use rely on them harder to use, so enabling the warning by default would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to basically flake inputs and fetchTree. We could also silence it automatically when a helper is requested by any installable on the CLI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to basically flake inputs and fetchTree. We could also silence it automatically when a helper is requested by any installable on the CLI.

This sounds like a hack for something that may as well be two separate code paths, see #10567 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was still trying to figure out requirements, not describing an implementation.

The hack is of your inference. I don't think we need anything truly new architecturally.
The warning could be controlled by a global setting, that besides being a normal setting is set to true by the CLI when it encounters an explicit need for remote helpers in its installables.

Do you really mean a literal extra code path? That sounds a lot heavier than a boolean that controls a warning.

@fricklerhandwerk
Copy link
Contributor

I like that using Git through Nix becomes more transparent and natural that way.
I fail to understand. Which way exactly do you mean by "that way"?

The way proposed in this change.

My motivation for this PR is that I wanted to have the nice UX of nix run git+helper://project
where "project" is something that you cannot effectively fetch with plain Git, you need git-remote-help.

Yes, this is a totally legitimate use case to me. It also hints at that the new CLI is architecturally a different beast from everything underlying it: you can't make the CLI more convenient in the straightforward way proposed in this PR without breaking what I deem essential properties of the lower layers. For me this reinforces the idea that fetching should be part of distributions (or at least par of the porcelain) rather than the plumbing.

@roberth
Copy link
Member

roberth commented May 21, 2024

@fricklerhandwerk It is ok for lower layers to bend to the requirements of the higher layers. What is the purpose of lower layers if not serving the highest layer, users?
Of course we want to be careful not to make a mess, so I'm interested to know: what are the essential properties of the lower layers that you believe are violated?

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented May 21, 2024

In this discussion the lower layer is the Nix language, which contains built-in fetchers. IMO an essential property of Nix expressions is derivation-/store-object-level reproducibility: given an expression and some files, you should always get the same derivations/store objects, everywhere any time. (Yes, in practice there are heaps of caveats, but that's my personal ideal.) Nix expressions are a central user interface to Nix, and major consumers are Nixpkgs and NixOS.

This particular attempt at making the other user interface, the new CLI, more convenient, would degrade reproducibility of the language layer, because it adds space for a lot more moving parts. Therefore I claim these two UIs are architecturally separate, and would suggest to treat them as such. Your suggestion to add a conditional depending on who is calling, would only keep those concerns entangled for the benefit of a smaller diff, which I think would make for a long-term liability.

Going further, I also claim that fetching file system trees and computing with file system trees are also separate concerns with very different constraints, but right now they are entangled in the big "Nix language" component. For indication that this may indeed be true check the endless stream of issues and pull requests trying to make fetching behave more like curl and git clone, and how they compete with the aspiration to keep Nix expression evaluation hermetic.

An ideal architecture would produce data flow graphs like this, where the CLI offers means to control any part of the pipeline, and things like git helpers nicely find its place:

graph TD;
    fetching[remote source] --> fso[file system object] --> expression --> derivation
    CLI --> fetching & fso & expression & derivation
    git-helpers -.-> fetching

As opposed to where we currently are, where expressions encode where to get stuff from, and where plugging git helpers will add degrees of freedom that make it hard to isolate expressions (I can't even draw the CLI interactions without making it a ball of yarn):

graph TD;
    fetching[remote source] --> fso[file system object] --> expression --> derivation
    git-helpers -.-> fetching & expression
    expression --> fetching

@roberth
Copy link
Member

roberth commented May 21, 2024

@fricklerhandwerk Thank you for your elaboration.

I believe there's a lot of merit in Nix's ability to manage the fetching of expression files. For this purpose, we have to pick between

  • built-in fetchers
  • fairly benevolent import from derivation on fixed output derivations (deps will have been cached, almost always)
  • submodules and/or user scripts, neither of which scale

While I wouldn't have minded to rely on IFD/FOD fetching until the built-in fetchers were actually good, that's not the path that was chosen.

For indication that this may indeed be true check the endless stream of issues and pull requests trying to make fetching behave more like curl and git clone,

This is a problem with the process that led to Nix's reproducibility being committed to the behavior of a program, git, that wasn't properly studied, presumably because people felt familiar with it, or there wasn't sufficient review, or they were pressured into delivering. I don't know the full history, but this has caused undue reputational damage to the feature, which is very unfortunate, because it is a very good feature when executed well, specifically when it comes to performance and usability (part standardization, no reliance on derivation system which may be unknowable, and probably more; haven't memorized it all).

We still have an opportunity to fix it. It is behind a feature flag, which we can lift when it behaves well. (Which is all we can do with it, fwiw)

It seems that you're not interested in developing fetchers. Perhaps you'd prefer to delegate this?

(Also I'm not sure if it's fair to "hijack" a PR to have a radical architectural conversation. How do we feel about this?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature request or proposal fetching Networking with the outside (non-Nix) world
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants