lazy fetch only needed files + pin git repo to commit hash #378

Open
milahu opened this issue Oct 12, 2021 · 1 comment

milahu commented Oct 12, 2021

i need to mount the git repo/tree at a specific commit, for better stability

but, most git servers (including github) have
uploadpack.allowReachableSHA1InWant = false (default value)
so fetch-by-commit requires a nontrivial workaround ...

any plans to support this?

PRs welcome? ; )

related: https://github.com/NixOS/nixpkgs/pull/135545/files

edit: just noticed, on mount, gitfs does a full shallow clone ...
but i need something much more lightweight, which downloads files only on demand.
use case: mount https://github.com/NixOS/nixpkgs = 250 MBytes extracted, of which i only need a few KBytes

related

@milahu milahu changed the title pin git repo to commit hash lazy fetch only needed files + pin git repo to commit hash Oct 14, 2021

milahu commented Oct 16, 2021

lazy fetch only needed files

this is solved in google's slothfs (aka gitfs), but it only works with "Gerrit/Gitiles based Git hosting" = no github

background: access monorepos from thin clients

the tree contains a significant amount of unused data

no github

on github, we can use api.github.com to browse commits and trees
and use raw.githubusercontent.com to fetch blobs
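
a rough sketch of those two lookups, assuming GitHub's REST "git trees" route for directory listings and the raw URL pattern owner/repo/commit/path. the function names are made up for illustration, and the helpers only build URLs, they do not touch the network:

```python
from urllib.parse import quote

API = "https://api.github.com"
RAW = "https://raw.githubusercontent.com"


def tree_url(owner: str, repo: str, tree_sha: str) -> str:
    """URL of the JSON listing for one tree (backs readdir / stat)."""
    return f"{API}/repos/{owner}/{repo}/git/trees/{quote(tree_sha)}"


def raw_url(owner: str, repo: str, commit_sha: str, path: str) -> str:
    """URL of the file content at a pinned commit (backs read)."""
    # quote() keeps "/" by default, so nested paths stay intact
    return f"{RAW}/{owner}/{repo}/{commit_sha}/{quote(path)}"
```

each readdir then costs one API request and each first read costs one raw request, so the number of round trips grows with the number of paths touched.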

this works well for small projects, where only a few files are actually needed (syscalls: read, readdir, stat, attr)
for larger projects, this can be too slow, so git clone becomes more attractive
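
one way to soften the cost for the small-project case might be to fetch each file at most once. a minimal sketch, assuming the mount is pinned to one commit (so content never changes and cached bytes stay valid). `LazyBlobs` and the injected `fetch` callable are hypothetical names, not part of gitfs:

```python
from typing import Callable, Dict


class LazyBlobs:
    """Fetch file contents on first read only, then serve them from memory."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # e.g. an HTTP GET of the raw URL
        self._cache: Dict[str, bytes] = {}
        self.misses = 0                      # number of real fetches so far

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            self.misses += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]


# usage with a fake fetcher standing in for the network
blobs = LazyBlobs(lambda path: f"content of {path}".encode())
blobs.read("README.md")
blobs.read("README.md")   # second read is served from the cache
```

this does nothing for the first access of each file, so it does not change the conclusion for large projects.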

pin git repo to commit hash

the goal here is file-deduplication on fat clients.
fat client = the full git repo was fetched with some depth (mostly-shallow clone),
so we have local access to old versions

what i want is something like this:

gitfs interface

/commit/:sha/tree/ -> root directory by commit sha
/tree/:sha/ -> root directory by tree sha (can be a subfolder of the repo)
/blob/:sha -> file by blob sha
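
a sketch of how such paths could be split inside the filesystem layer, assuming full 40-character lowercase SHA-1 hashes. `GitPath` and `parse` are hypothetical names used only for illustration:

```python
import re
from typing import NamedTuple

SHA1 = re.compile(r"[0-9a-f]{40}")  # assumption: full lowercase SHA-1 only


class GitPath(NamedTuple):
    kind: str  # "commit", "tree" or "blob"
    sha: str
    rest: str  # path below the object's root, "" for the root itself


def parse(path: str) -> GitPath:
    """Split a mount-relative path into (kind, sha, rest)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or not SHA1.fullmatch(parts[1]):
        raise ValueError(f"expected <kind>/<sha>/...: {path!r}")
    kind, sha, tail = parts[0], parts[1], parts[2:]
    if kind == "commit" and tail[:1] == ["tree"]:
        return GitPath("commit", sha, "/".join(tail[1:]))
    if kind == "tree":
        return GitPath("tree", sha, "/".join(tail))
    if kind == "blob" and not tail:
        return GitPath("blob", sha, "")
    raise ValueError(f"unsupported path: {path!r}")
```

since every path starts with an object hash, two mounts that resolve to the same tree or blob sha can share the same local object.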

sample:

$ gitfs mount https://github.com/presslabs/gitfs.git /tmp/demo
$ cd /tmp/demo
$ less commit/12886ec5b9c7e103bfcab0cd37a8333873382fae/tree/README.md

this would show exactly this file:

https://github.com/presslabs/gitfs/blob/12886ec5b9c7e103bfcab0cd37a8333873382fae/README.md
https://raw.githubusercontent.com/presslabs/gitfs/12886ec5b9c7e103bfcab0cd37a8333873382fae/README.md
