Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cloning git repos faster #4475

Merged
merged 17 commits into from
Dec 6, 2022
Merged

Conversation

deivid-rodriguez
Copy link
Member

@deivid-rodriguez deivid-rodriguez commented Mar 19, 2021

What was the end-user or developer problem that led to this PR?

The problem was that cloning bit git repos can be very slow, because we are cloning the full history of the repo.

What is your fix for the problem, implemented in this PR?

My fix is to use --depth 1 for remote clones to only download what's necessary.

This approach doesn't really work and needs more work. A potential fix was suggested here.

Fixes #3332.
Fixes #3378.
Fixes #3775.
Supersedes rubygems/bundler#7054.

Make sure the following tasks are checked

@deivid-rodriguez
Copy link
Member Author

deivid-rodriguez commented Mar 24, 2021

This should be mostly working now, and it does result in a significant improvement for large git repositories. For example, bundling rails edge with a cold cache:

Before

real 0m26,123s
user 0m39,029s
sys 0m4,780s

248M /home/deivid/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/cache/bundler/git/rails-fcf0202857b07db1a0f6220dae5ca99319ca0f32
290M /home/deivid/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/bundler/gems/rails-e287eed77a2a

After

real 0m6,326s
user 0m3,572s
sys 0m0,639s

9,1M /home/deivid/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/cache/bundler/git/rails-fcf0202857b07db1a0f6220dae5ca99319ca0f32
51M /home/deivid/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/bundler/gems/rails-e287eed77a2a

I did try to make this backwards compatible but I'm afraid this needs more work because this feature of git depends on both client and server versions, and also on the server having the allow-reachable-sha1-in-want capability enabled.

So we probably need to ask this to the server to decide whether we should fetch the whole repository or not. Apparently git servers have this /info/refs?service=git-upload-pack that provides this information:

$ curl https://github.com/git/git/info/refs\?service=git-upload-pack -sSL -o - | grep -a allow-reachable-sha1-in-want
00000156142430338477d9d1bb25be66267225fb58498d92 HEADmulti_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/master filter object-format=sha1 agent=git/github-g9c23973022a1

@mikeastock
Copy link

Hate being that guy but any update on this? Looks like it's pretty close?

@deivid-rodriguez
Copy link
Member Author

Hi. No updates. What's left to do is what I said in the previous message. Alternatively, instead of try to explicitly "feature detect" this capability, we could rescue errors when fetching specific revisions, and fallback to the current mode in that case.

@mitchellhenke
Copy link

I think this would be super!

I don't want to expand the scope or delay this work, but wanted to mention that a couple related options to minimize clone size are --branch and --no-tags. My understanding is --branch supports branches and tags, but not arbitrary refs.

In terms of feature detection, is there a minimum git version that bundler aims to support? I think all of the clone options mentioned are available on the versions of git currently receiving at least security patches.

Thanks for this work, it'd be great to see it land, and I'd be glad to test or help 🙂

@deivid-rodriguez
Copy link
Member Author

The problem is this not only depends on git client's version, which is very easy to check, but on git server's version and whether it has the allow-reachable-sha1-in-want capability enabled. But perhaps this is an overkill concern and most git servers have this. Maybe we can at least enable this for github.com, gitlab.com or others that we know can properly handle this.

@deivid-rodriguez deivid-rodriguez force-pushed the faster_git_clone_new_repo branch 10 times, most recently from 91e8c21 to 88868ff Compare December 2, 2022 17:34
@deivid-rodriguez deivid-rodriguez marked this pull request as ready for review December 2, 2022 17:41
@deivid-rodriguez
Copy link
Member Author

deivid-rodriguez commented Dec 2, 2022

If CI agrees, I believe this is ready.

I decided to pass on supporting git servers without the allow-reachable-sha1-in-want capability enabled. If it turns out someone needs that, we can follow up later.

I am still falling back to full clones on some edge cases, and when old client git version is used. The old client code paths are untested, but they are "duplicated" with other tested cases like like pinning to shortened sha's, so I believe it's fine.

These are updated results on my laptop for bundling rails edge with a cold cache:

On master

$ rm -rf ~/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/cache/bundler/ ~/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/bundler Gemfile.lock && time bundle _2.4.0.dev_ lock && du -hs ~/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/cache/bundler/ && du -hs /Users/deivid/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/bundler
Fetching https://github.com/rails/rails.git
Resolving dependencies...Fetching gem metadata from https://rubygems.org/.......

Writing lockfile to /Users/deivid/Code/playground/rails-edge/Gemfile.lock
bundle _2.4.0.dev_ lock  12,67s user 3,92s system 95% cpu 17,316 total
278M	/Users/deivid/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/cache/bundler/
319M	/Users/deivid/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/bundler

On this branch

$ rm -rf ~/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/cache/bundler/ ~/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/bundler Gemfile.lock && time bundle _2.4.0.dev_ lock && du -hs ~/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/cache/bundler/ && du -hs /Users/deivid/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/bundler
Fetching https://github.com/rails/rails.git
Resolving dependencies...Fetching gem metadata from https://rubygems.org/.......

Writing lockfile to /Users/deivid/Code/playground/rails-edge/Gemfile.lock
bundle _2.4.0.dev_ lock  1,25s user 0,69s system 40% cpu 4,804 total
9,5M	/Users/deivid/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/cache/bundler/
 49M	/Users/deivid/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/bundler

@deivid-rodriguez deivid-rodriguez force-pushed the faster_git_clone_new_repo branch 2 times, most recently from 9511891 to 03fb49b Compare December 2, 2022 18:23
For example, it the `ref:` specification was changed to something that
requires a full clone like a non fully qualified raw reference (not 40
hex characters).

Use the `--unshallow` option to `git fetch` for that.
@deivid-rodriguez deivid-rodriguez marked this pull request as ready for review December 5, 2022 18:11
@deivid-rodriguez
Copy link
Member Author

This is ready for a review now!

Copy link
Member

@indirect indirect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work getting this set up!

@deivid-rodriguez deivid-rodriguez merged commit 771c94e into master Dec 6, 2022
@deivid-rodriguez deivid-rodriguez deleted the faster_git_clone_new_repo branch December 6, 2022 10:49
@mitchellhenke
Copy link

Thank you, this is great!

@mjobin-mdsol
Copy link

I am not seeing a README update with this change? Should --depth=1 be the default ? is it already the default?

Thanks

@deivid-rodriguez
Copy link
Member Author

No README update, because there's no user facing change caused by this PR, other than Bundler being faster.

--depth=1 is indeed now the default, whenever possible.

@tisba
Copy link

tisba commented Feb 13, 2024

Sorry for commenting on this already closed PR, @deivid-rodriguez. I think there was a user-facing change. Maybe unintentional (still trying to understand it completely).

I'm trying to troubleshoot and fix an issue appraisal is having since this PR was merged, see thoughtbot/appraisal#218 (comment).

Acceptance tests in appraisal are setting up some git repositories with gems and then try to run bundler against this Gemfile:

source 'https://rubygems.org/'

gem "omelette", git: "../gems/omelette"
``

This fails, because Bundler (I think starting with 2.4.0) starts to use git clone with `--depth 1`, which requires `file://`. BUT the `file://` scheme does not support relative paths and therefore fails:

$ git clone --bare --no-hardlinks --quiet --no-tags --depth 1 --single-branch -- file://../gems/omelette /path/to/bundler/git/cache
[…]
fatal: '/gems/omelette' does not appear to be a git repository
fatal: Could not read from remote repository.


I'm looking for guidance if this is a bundler bug (`git:` with a local path should never work and raise an error) or an unintentional breaking change. I'm unsure about this, because the [docs](https://bundler.io/guides/git.html#local-git-repos) talk about using `bundle config set local.GEM_NAME /path/to/local/git/repository` for local git repos, so maybe supporting this via `git:` was never intended in the first place?

@simi
Copy link
Member

simi commented Feb 14, 2024

@tisba Is there any reason to use git and not path in your case to point to local clone?

require 'spec_helper'

describe 'Debugging Bundler Issues' do
  it 'does something' do
    build_git_gems %w(omelette)

    build_gemfile <<-Gemfile
      source 'https://rubygems.org'
      gem "omelette", path: "../gems/omelette"
    Gemfile

    run 'bundle install --local --verbose'
  end
end

Does this pass?

@tisba
Copy link

tisba commented Feb 14, 2024

@simi sure. In case you don't now, Appraisal is a testing tool where you can define multiple variants of Gemfiles to test your code against. In their acceptance tests they want to ensure that git sourced gem definition work.

Again, I'm not sure if this (cloning with a relative path reference) was ever supposed to be supported by Bundler, but it worked at one time and it broke.

If this was not supported and it won't be fixed: I'd suggest raise an error if Bundler sees a local path for the git: option, as this can never work with the current approach of invoking git with shallow cloning.

@simi
Copy link
Member

simi commented Feb 14, 2024

@tisba I'm familiar with Appraisal. Per my understanding git option was never intended to be used with local git clone, since internally it is used as a remote to clone git repository to cache folder. Local git path is still valid, since you can have bare repository clone with git clone --bare "https://github.com/simi/ABO" like command.

@tisba
Copy link

tisba commented Feb 16, 2024

Per my understanding git option was never intended to be used with local git clone

Alright. I'll take a look then if I can provide a PR to raise a proper error message. Like I said, technically it works just fine, just not with relative paths. I'll need to go through the bundler code a bit more, but maybe another option is to detect local paths and clone (slightly) differently. IMO it would be more consistent and less surprising to the user, granted that this might be a rare use case.

Local git path is still valid, since you can have bare repository clone with git clone --bare "https://github.com/simi/ABO" like command.

Yes, but that's not what bundler is using currently.

@simi
Copy link
Member

simi commented Feb 16, 2024

Yes, but that's not what bundler is using currently.

@tisba this is not about what's using bundler currently, but where do you point bundler to. Bundler uses the passed git to clone git repo to own cache. If that's not possible, it fails. Locally you can still have bare repo and bundler can clone it to own cache. If you would like to point to common local clone, use path and bundler will not clone it into own cache, but it will use it as is on filesystem.

@tisba
Copy link

tisba commented Feb 16, 2024

hmm, not sure if we have a misunderstanding here :) I'll create a new issue, doublecheck my observations and add a repro case.

@deivid-rodriguez
Copy link
Member Author

Hei. Yeah, I'm not sure giving a relative path was explicitly supported before. It seems you'd want path: instead for that. Also, I'm not sure how --depth 1 would affect this, are we sure it's this change that broke that?

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants