Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply insteadOf rewriting rules to Git LFS object URLs by default and without converting to API endpoints #5547

Open
chrisd8088 opened this issue Oct 17, 2023 · 2 comments

Comments

@chrisd8088
Copy link
Contributor

PR #3590 introduced the lfs.transfer.enableHrefRewrite setting which, when set to true rather than its default of false, causes Git LFS to apply url.<base>.insteadOf and url.<base>.pushInsteadOf settings to the URLs of individual Git LFS objects.

There was some discussion in PR #3590 as to whether this should become the default with the next major release of Git LFS, when a pure SSH transfer protocol would be available. The concern noted at the time was that with a configuration such as the following, Git LFS would attempt to use SSH to download and upload Git LFS objects, when only HTTPS was supported:

[url "ssh://git@github.com/"]
	insteadOf = "https://github.com/"

In practice, though, this is not what happens, because the implementation from PR #3590 calls the NewEndpoint() method of the EndpointFinder interface in the lfsapi package, which performs the appropriate rewriting, but then converts any URL with a scheme other than file:// or http[s]:// into an HTTPS URL. This conversion effectively reconstructs the https:// URL, but only after removing any query arguments. For some Git LFS hosting services like GitHub, query arguments are required for successful Git LFS object URL requests, and so these requests would then fail with those services.

In effect, when lfs.transfer.enableHrefRewrite is true, Git LFS applies the same logic to Git LFS object URLs as it normally uses when attempting to guess the appropriate server API URL from the URL of the current branch's remote (or whatever remote is specified with configuration settings such as remote.lfsDefault) when no explicit server API is defined with lfs.url or remote.<name>.lfsurl. This is likely not the intent of the implementation in PR #3590, but it is the current behaviour.

One consequence, as described above, is that it is actually not possible at present to rewrite Git LFS object URLs from HTTPS to SSH, even with a url.<base>.insteadOf rule like the one above.

It is also not possible to rewrite Git LFS object URLs to any protocol other than http:// or https://, such as file://, and use them successfully. The rewrite succeeds, as such, because when the server API URL is found to be a file:// URL, we allow it without change, unlike when the rewrite results in an ssh:// URL. However, a missing protocol error is then generated because the scheme is neither http:// nor https://. This is likely reasonable, as we generally expect transfer adapters to use a consistent protocol for both their API queries and individual object transfers.

(Also, as reported in #5495, the existing implementation only affects the URL used to download or upload objects, and not the URL used to verify successful uploads.)

We should consider whether we can change the implementation to only perform the necessary rewriting on Git LFS object URLs (and no other transformations as if trying to determine an API endpoint) as a bug fix in a v3.x release, or if we need to wait for the next major release to do so.

And we should review whether we can make this rewriting the default in the next major release, as was proposed in #3590 (review). However, we should only do this when we are confident in the implementation.

That includes consideration of how Git LFS should behave if it is instructed to use different protocols for its batch requests and individual object URL requests.

At the moment, at least for the common case of transfers over HTTP(S), the basic download and upload transfer adapters always use the newHTTPRequest() method of the adapterBase structure, and this is what both performs the rewriting (and attempts conversion as if the determining an API endpoint) and then checks that the result contains an HTTP(S) protocol. So only HTTP(S) requests are expected or allowed.

And for the pure SSH transfer adapter, individual objects are expected to be transferred using SSH as well as batch requests, and so only the ssh adapter is defined in the internal batch response.

Similarly, when the remote server API has a file:// URL, which by default causes the standalone file custom transfer agent to be used, that implicitly means batches of objects are collected and passed to the agent without a batch query, and the agent maps their paths to map to file paths within the local Git repository referenced by the initial file:// URL. So no rewriting of individual object URLs occurs at all in this case.

Do we, therefore, want to simply disallow the rewriting of per-object URLs to use different protocols? What would it mean, for instance, to rewrite an HTTPS URL into an SSH one (assuming we fix the problem where we try to convert it back into an HTTPS URL to use as an API endpoint); should we then switch over somehow and try to use the SSH transfer adapter for just that one object?

Finally, our test suite appears to only validate the lfs.transfer.enableHrefRewrite configuration setting using a pair of negative checks which show object URLs have been rewritten to invalid ones. It would be desirable to have a more complete set of tests which validate successful downloads and uploads after URLs are rewritten, and which also confirm the expected behaviour when URLs are rewritten to different protocols (whether that is permitted or not).

@GigabyteProductions
Copy link

Is this issue responsible for the behavior demonstrated below?

kmarek@kmarek-dev-android-el8 /media/work/lfs-test/testrepo3.git
$ grep -B1 enablehrefrewrite ~/.gitconfig
[lfs.transfer]
	enablehrefrewrite = true

kmarek@kmarek-dev-android-el8 /media/work/lfs-test/testrepo3.git
$ grep -B1 https://review.lineageos.org/LineageOS/android_external_chromium-webview_prebuilt_x86_64.git ~/.gitconfig
[url "file:///media/mirrors/git-mirrors/by-path/https%253A%252F%252Freview.lineageos.org/LineageOS/android_external_chromium-webview_prebuilt_x86_64.git"]
	insteadOf = https://review.lineageos.org/LineageOS/android_external_chromium-webview_prebuilt_x86_64.git

kmarek@kmarek-dev-android-el8 /media/work/lfs-test/testrepo3.git
$ git lfs fetch --all https://review.lineageos.org/LineageOS/android_external_chromium-webview_prebuilt_x86_64.git
fetch: 36 objects found, done.                                                                                                                                  
fetch: Fetching all references...
EOFnloading LFS objects: 100% (36/36), 9.5 GB | 0 B/s                                                                                                           
error: failed to fetch some objects from 'file:///media/mirrors/git-mirrors/by-path/https%253A%252F%252Freview.lineageos.org/LineageOS/android_external_chromium-webview_prebuilt_x86_64.git'

kmarek@kmarek-dev-android-el8 /media/work/lfs-test/testrepo3.git
$ git lfs fetch --all file:///media/mirrors/git-mirrors/by-path/https%253A%252F%252Freview.lineageos.org/LineageOS/android_external_chromium-webview_prebuilt_x86_64.git
fetch: 36 objects found, done.                                                                                                                                  
fetch: Fetching all references...
Downloading LFS objects: 100% (36/36), 0 B | 0 B/s                                                                                                              

I'd really like to utilize lfs.transfer.enablehrefrewrite=true to fetch LFS objects from my local file:// mirror

@chrisd8088
Copy link
Contributor Author

@GigabyteProductions wrote:

Is this issue responsible for the behavior demonstrated below?

Perhaps, but I think you'd want to check by adding GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 and looking at the individual Git LFS object URLs as well as what's returned from the Git LFS API requests. It's possible to use the "standalone transfer" adapter to pull Git LFS objects from a local filesystem, but using lfs.transfer.enableHrefRewrite won't be enough to make that work.

Rather than add comments to this issue, would you mind opening a fresh issue, and including the output of git lfs env and git config -l (with any private information redacted), and the output of a GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 git lfs fetch command, so we can review what your API is returning?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Backlog
Enhancements
Development

No branches or pull requests

2 participants