Optimize performance for scanning trees in partial clones #5699

Merged: 2 commits, Apr 17, 2024

Member

@bk2204 bk2204 commented Apr 3, 2024

Right now, if the user is using partial clone, our call to `git ls-tree` against HEAD is expensive because `git ls-tree` needs to download each blob, which it does incrementally instead of all at once.

If we're scanning the tree from HEAD, then we can avoid the expense of doing this by running `git ls-files` with a pattern that matches only LFS files, which makes the operation much cheaper, since we avoid needing to download blobs for many of those objects. We can format the data such that it matches the pattern we expect for `git ls-tree` so that we can avoid modifying most of the calls and continue to let things function in the same way. Do so, but limit our changes to Git 2.42.0 and newer, since the `objecttype` argument is new in that version.
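As a rough illustration of the approach described above (not the code from this PR), the sketch below picks a scan command based on the Git version and parses records in the shared `<mode> <type> <oid> <size>` TAB `<path>` layout. The function names, the `:(top,attr:filter=lfs)` pathspec, and the exact flag sets are assumptions made for the example; only the 2.42.0 cutoff and the idea of making `git ls-files` output look like `git ls-tree` output come from the description.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lsTreeLikeFormat asks `git ls-files --format` for the same columns that
// `git ls-tree -l` prints. The exact string is an assumption for this sketch.
const lsTreeLikeFormat = "%(objectmode) %(objecttype) %(objectname) %(objectsize)\t%(path)"

// parseVersion turns "git version 2.42.0" (or "2.42.0.windows.1") into
// {2, 42, 0}. Components without leading digits count as 0.
func parseVersion(s string) [3]int {
	s = strings.TrimPrefix(strings.TrimSpace(s), "git version ")
	var v [3]int
	for i, part := range strings.SplitN(s, ".", 4) {
		if i == 3 {
			break
		}
		end := strings.IndexFunc(part, func(r rune) bool { return r < '0' || r > '9' })
		if end >= 0 {
			part = part[:end]
		}
		v[i], _ = strconv.Atoi(part)
	}
	return v
}

// atLeast reports whether version v is greater than or equal to want.
func atLeast(v, want [3]int) bool {
	for i := range v {
		if v[i] != want[i] {
			return v[i] > want[i]
		}
	}
	return true
}

// scanArgs (hypothetical helper) returns the Git arguments for scanning HEAD:
// the attribute-filtered ls-files form on Git 2.42.0+, ls-tree otherwise.
func scanArgs(gitVersion string) []string {
	if atLeast(parseVersion(gitVersion), [3]int{2, 42, 0}) {
		return []string{"ls-files", "--cached", "--full-name", "-z",
			"--format=" + lsTreeLikeFormat, ":(top,attr:filter=lfs)"}
	}
	return []string{"ls-tree", "-r", "-l", "-z", "--full-tree", "HEAD"}
}

// entry is one record in the shared layout, after splitting the -z output on NUL.
type entry struct {
	Mode, Type, Oid string
	Size            int64
	Path            string
}

// parseEntry handles a record from either command, since both use one layout.
func parseEntry(rec string) (entry, error) {
	meta, path, ok := strings.Cut(rec, "\t")
	if !ok {
		return entry{}, fmt.Errorf("no tab in record %q", rec)
	}
	f := strings.Fields(meta) // Fields also absorbs ls-tree's size padding
	if len(f) != 4 {
		return entry{}, fmt.Errorf("want 4 fields, got %d in %q", len(f), meta)
	}
	size := int64(-1) // ls-tree prints "-" for entries without a blob size
	if f[3] != "-" {
		var err error
		if size, err = strconv.ParseInt(f[3], 10, 64); err != nil {
			return entry{}, err
		}
	}
	return entry{Mode: f[0], Type: f[1], Oid: f[2], Size: size, Path: path}, nil
}

func main() {
	for _, ver := range []string{"git version 2.41.0", "git version 2.42.0"} {
		fmt.Printf("%s -> %q\n", ver, scanArgs(ver))
	}
	e, err := parseEntry("100644 blob 0123abcd     132\tassets/a.bin")
	fmt.Println(e, err)
}
```

Because both commands are made to emit the same record layout, a single parser such as `parseEntry` can serve either path, which is the property that lets most existing callers stay unchanged.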

In addition, let's fix some tests which rely on an invalid assumption about how we discover and process LFS files.

In two of our test repositories in this file, we never commit the
`.gitattributes` file and instead rely on the fact that `git lfs clone`
sometimes processes all files in the working tree to determine which are
pointer files.  This is inefficient and we don't want to rely on this
behaviour, especially since it differs from that of `git clone`, so fix
our tests so that we explicitly commit the `.gitattributes` file.
@bk2204 bk2204 marked this pull request as ready for review April 16, 2024 16:18
@bk2204 bk2204 requested a review from a team as a code owner April 16, 2024 16:18
Contributor

@chrisd8088 chrisd8088 left a comment


Very nice, thank you!

@bk2204 bk2204 merged commit 6fc94ca into git-lfs:main Apr 17, 2024
10 checks passed
@bk2204 bk2204 deleted the ls-files-optimization branch April 17, 2024 13:04

2 participants