Sparse checkouts with git

Sunil Pai edited this page Aug 6, 2020 · 1 revision

tl;dr -

  • make sure git --version returns 2.27.0 or higher.
  • git clone --filter=blob:none --sparse <repo> --depth=1
  • git sparse-checkout set <path> <path> <...path>
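
The three steps above could be wrapped in a small script. This is only a rough sketch: the function names (`parse_git_version`, `sparse_clone_commands`, `sparse_clone`) and the `dest` argument are made up for illustration, and the only git flags used are the ones already listed in the tl;dr, plus `git -C` to run the second command inside the new clone.

```python
import re
import subprocess

# Minimum version taken from the tl;dr above.
MIN_VERSION = (2, 27, 0)


def parse_git_version(output):
    """Extract (major, minor, patch) from the output of `git --version`."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognised version string: {output!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def sparse_clone_commands(repo, dest, paths):
    """Build the two git invocations from the tl;dr as argument lists."""
    return [
        # Blobless, shallow clone that starts out sparse.
        ["git", "clone", "--filter=blob:none", "--sparse", "--depth=1", repo, dest],
        # Choose which folders appear in the working tree.
        ["git", "-C", dest, "sparse-checkout", "set", *paths],
    ]


def sparse_clone(repo, dest, paths):
    """Check the installed git version, then run both commands in order."""
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    version = parse_git_version(result.stdout)
    if version < MIN_VERSION:
        raise RuntimeError(f"git {version} is older than required {MIN_VERSION}")
    for command in sparse_clone_commands(repo, dest, paths):
        subprocess.run(command, check=True)
```

Calling `sparse_clone("<repo>", "my-checkout", ["apps/web", "libs/ui"])` would then clone into a hypothetical `my-checkout` folder with just those two (equally hypothetical) folders checked out.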

So. You've just joined a new product team, you've got a fresh laptop, and you're ready to write some code. You head to the GitHub/Bitbucket/internal git hosting page, and notice the codebase is HUGE. There could be many reasons for this -

  • It could be a so-called 'monorepo', hosting many applications and dependencies, being worked on by many teams concurrently.
  • It could have a long history, possibly spanning decades, and thousands and thousands of commits.
  • It could be holding a number of large files, like movies or large .psd/.ai/etc asset files, heavy on graphics/audio/video.
  • [more?]

Now you could run git clone <path> and head off for a couple of hours to get introduced to office gossip and terrible coffee, but you're smarter than that. If only there was a way to:

  • check out just a slice of the codebase, with only the folders you're interested in.
  • get only the latest code, since you're not interested in having the past history of the codebase on your local machine.

Drum roll... This is totally doable! The git feature is called a 'sparse checkout'; the git sparse-checkout command that drives it was introduced with Git 2.25 in January this year (2020). This GitHub post goes into some detail and is a recommended read.

(inb4: Mercurial fans will love to point out that sparse profiles have been a thing with hg for a long time now, but I'd like to remind them that svn had it for years before that, so phbbt.)

NB: It's worth noting the disclaimer on this page, "THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE." Keep an eye out for any changes in syntax/commands, and we'll make a note to keep this article up to date as we learn of any changes.

TODO:

  • Some approximation of sparse 'profiles', so devs don't have to write folder names themselves, they could read it from a plain text file. Different teams could have separate text files with folder lists in them.
  • Some kind of helper to verify that no dependent folders are missing from the checkout, either by calculating a dependency graph, or something else.
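
Neither TODO item exists yet, but here is one way they might look. Everything in this sketch is an assumption: the profile format (one folder per line, `#` for comments), the function names `read_profile` and `missing_dependencies`, and the idea that the dependency graph is available as a plain dict mapping each folder to the folders it needs. Folders are matched by exact name only, so a selected parent folder is not treated as covering its children.

```python
from pathlib import Path


def read_profile(profile_path):
    """Read folder names from a plain-text profile file.

    Assumed format: one folder per line; blank lines and anything
    after a '#' are ignored; duplicates are dropped, order is kept.
    """
    folders = []
    text = Path(profile_path).read_text(encoding="utf-8")
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line and line not in folders:
            folders.append(line)
    return folders


def missing_dependencies(selected, depends_on):
    """Return folders needed (directly or transitively) but not selected.

    `depends_on` is a hypothetical mapping of folder -> folders it needs.
    """
    selected_set = set(selected)
    seen = set()
    stack = list(selected)
    while stack:
        folder = stack.pop()
        if folder in seen:
            continue  # already visited; this also guards against cycles
        seen.add(folder)
        stack.extend(depends_on.get(folder, ()))
    return sorted(seen - selected_set)
```

A team could keep a file like `web-team.txt` in the repo, pass the result of `read_profile` to `git sparse-checkout set`, and warn whenever `missing_dependencies` comes back non-empty.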