Add support for sparse checkout #1317

Closed
wants to merge 13 commits into from


dfdez
Contributor

@dfdez dfdez commented May 13, 2023

Context:

Monorepos can be challenging, especially when they have a huge codebase. Many people have run into situations where they don't need to fetch everything in the repository. With this new option, anyone will be able to do a sparse checkout inside a GitHub Action quite simply!

Features:

  • Added sparse-checkout option
  • Added test for sparse-checkout
  • Updated README.md with sparse-checkout option and examples

Usage:

Fetch only the root files

- uses: actions/checkout@v3
  with:
    sparse-checkout: .

Fetch only the root files and the .github and src folders

- uses: actions/checkout@v3
  with:
    sparse-checkout: |
      .github
      src
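To show how the multi-line sparse-checkout input above could be turned into a list of paths, here is a minimal TypeScript sketch. The function name parseSparseCheckoutInput is hypothetical, and it assumes one path per line with blank lines and surrounding whitespace ignored; the action's real input handling may differ.

```typescript
// Hypothetical helper (not the action's actual code): split the raw
// `sparse-checkout` input into one path per non-empty line.
function parseSparseCheckoutInput(raw: string): string[] {
  return raw
    .split(/\r?\n/) // a YAML block scalar (`|`) yields one entry per line
    .map(line => line.trim()) // ignore indentation and trailing spaces
    .filter(line => line.length > 0) // drop blank lines, e.g. the final newline
}

// The two workflow examples above, as the raw strings the YAML produces
console.log(JSON.stringify(parseSparseCheckoutInput('.'))) // ["."]
console.log(JSON.stringify(parseSparseCheckoutInput('.github\nsrc\n'))) // [".github","src"]
```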

Once the initial sparse checkout is done, you can download more folders with the command git sparse-checkout add SOME/DIR/ECTORY. More details regarding sparse-checkout here

Implementation

To make sparse-checkout more useful, I added the option to pass a filter to git fetch, so that it can be run with --filter='blob:none'.

With this update, no blobs of the repository are downloaded during the fetch (the filter is applied only when sparse-checkout is set). Instead, the blobs are downloaded during git checkout, once sparse-checkout has been configured.
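A minimal sketch of the fetch change described above, assuming hypothetical helpers that only build git argument lists (the function names, the remote name origin, and the main refspec are illustrative assumptions, not the action's actual code). The blob filter is added only when sparse-checkout paths are configured; the patterns are then applied with git sparse-checkout set, and the subsequent git checkout downloads only the blobs it needs.

```typescript
// Hypothetical sketch (not the actual GitCommandManager API): build git
// argument lists for a fetch that skips blobs only for sparse checkouts.
function buildFetchArgs(sparseCheckout: string[], refSpec: string[]): string[] {
  const args = ['fetch', '--no-tags']
  if (sparseCheckout.length > 0) {
    // Partial clone: fetch commits and trees now, download blobs on demand
    args.push('--filter=blob:none')
  }
  args.push('origin', ...refSpec)
  return args
}

function buildSparseCheckoutArgs(sparseCheckout: string[]): string[] {
  // `git sparse-checkout set` replaces any previously configured patterns
  return ['sparse-checkout', 'set', ...sparseCheckout]
}

// Assumed example: the `.github` + `src` workflow above, fetching `main`
const paths = ['.github', 'src']
const refSpec = ['+refs/heads/main:refs/remotes/origin/main']
console.log(['git', ...buildFetchArgs(paths, refSpec)].join(' '))
// git fetch --no-tags --filter=blob:none origin +refs/heads/main:refs/remotes/origin/main
console.log(['git', ...buildSparseCheckoutArgs(paths)].join(' '))
// git sparse-checkout set .github src
```

Keeping the filter conditional means a regular (non-sparse) checkout fetches exactly as before.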

Here you can find the article I used as a guide for the implementation.

Related Issues

@dfdez dfdez requested a review from a team as a code owner May 13, 2023 11:33
@dfdez dfdez changed the title Add support for sparse checkout option Add support for sparse checkout May 13, 2023
@blu3mania

Same comment as I posted on #680:

Shouldn't lfs.fetchinclude also be set, or the "-I" (or "--include") parameter be used in GitCommandManager.lfsFetch, to make sure LFS only fetches from included paths?

@dfdez
Contributor Author

dfdez commented Jun 7, 2023

Shouldn't lfs.fetchinclude also be set, or the "-I" (or "--include") parameter be used in GitCommandManager.lfsFetch, to make sure LFS only fetches from included paths?

I looked into this and ran some tests: passing the "--include" option directly to git lfs fetch fails with an error on the first try.


I think this happens because, with sparse-checkout, the fetch is done with --filter=blob:none, so LFS is missing information.

To get LFS working correctly with sparse-checkout, I can think of two possible solutions:

  1. Skip the LFS fetch step and don't set GIT_LFS_SKIP_SMUDGE=1, so that the LFS objects are fetched during the actual checkout
  2. Skip the LFS fetch step and run git lfs pull with the "--include" option after the checkout

Let me know what you think! My personal opinion is that the second solution is the best one.

@dscho
Contributor

dscho commented Jun 7, 2023

Skip the lfs fetch step and run a git lfs pull after checkout with "--include" option

Disclaimer: I am not using Git LFS myself. Having said that, I guess that would allow Git LFS to batch-fetch, and the first option maybe would not? If so, I am definitely in favor of the second option. Even if both allow batch-fetching, the second option looks slightly cleaner to me. Probably something like this?

diff --git a/__test__/git-auth-helper.test.ts b/__test__/git-auth-helper.test.ts
index fec6573..fa2d1db 100644
--- a/__test__/git-auth-helper.test.ts
+++ b/__test__/git-auth-helper.test.ts
@@ -759,6 +759,7 @@ async function setup(testName: string): Promise<void> {
     init: jest.fn(),
     isDetached: jest.fn(),
     lfsFetch: jest.fn(),
+    lfsPull: jest.fn(),
     lfsInstall: jest.fn(),
     log1: jest.fn(),
     remoteAdd: jest.fn(),
diff --git a/__test__/git-directory-helper.test.ts b/__test__/git-directory-helper.test.ts
index 362133f..9d9d8bd 100644
--- a/__test__/git-directory-helper.test.ts
+++ b/__test__/git-directory-helper.test.ts
@@ -474,6 +474,7 @@ async function setup(testName: string): Promise<void> {
     init: jest.fn(),
     isDetached: jest.fn(),
     lfsFetch: jest.fn(),
+    lfsPull: jest.fn(),
     lfsInstall: jest.fn(),
     log1: jest.fn(),
     remoteAdd: jest.fn(),
diff --git a/src/git-command-manager.ts b/src/git-command-manager.ts
index 4f6dc79..89b3a44 100644
--- a/src/git-command-manager.ts
+++ b/src/git-command-manager.ts
@@ -40,6 +40,7 @@ export interface IGitCommandManager {
   init(): Promise<void>
   isDetached(): Promise<boolean>
   lfsFetch(ref: string): Promise<void>
+  lfsPull(): Promise<void>
   lfsInstall(): Promise<void>
   log1(format?: string): Promise<string>
   remoteAdd(remoteName: string, remoteUrl: string): Promise<void>
@@ -328,6 +329,15 @@ class GitCommandManager {
     })
   }
 
+  async lfsPull(): Promise<void> {
+    const args = ['lfs', 'pull', '--include']
+
+    const that = this
+    await retryHelper.execute(async () => {
+      await that.execGit(args)
+    })
+  }
+
   async lfsInstall(): Promise<void> {
     await this.execGit(['lfs', 'install', '--local'])
   }
diff --git a/src/git-source-provider.ts b/src/git-source-provider.ts
index 967097d..e8e3e44 100644
--- a/src/git-source-provider.ts
+++ b/src/git-source-provider.ts
@@ -188,7 +188,8 @@ export async function getSource(settings: IGitSourceSettings): Promise<void> {
     // LFS fetch
     // Explicit lfs-fetch to avoid slow checkout (fetches one lfs object at a time).
     // Explicit lfs fetch will fetch lfs objects in parallel.
-    if (settings.lfs) {
+    // For sparse checkouts, wait until after the checkout is done.
+    if (settings.lfs && !settings.sparseCheckout) {
       core.startGroup('Fetching LFS objects')
       await git.lfsFetch(checkoutInfo.startPoint || checkoutInfo.ref)
       core.endGroup()
@@ -210,6 +211,13 @@ export async function getSource(settings: IGitSourceSettings): Promise<void> {
     await git.checkout(checkoutInfo.ref, checkoutInfo.startPoint)
     core.endGroup()
 
+    // Sparse checkout only: delayed LFS pull
+    if (settings.lfs && settings.sparseCheckout) {
+      core.startGroup('Pulling LFS objects')
+      await git.lfsPull()
+      core.endGroup()
+    }
+
     // Submodules
     if (settings.submodules) {
       // Temporarily override global config

@dscho
Contributor

dscho commented Jun 7, 2023

Hmm. I cannot get that git lfs pull --include thing to work, essentially because of the arbitrary patterns allowed in sparse-checkout... So I am going to try option 1.

@dfdez
Contributor Author

dfdez commented Jun 7, 2023

@dscho the "--include" pattern should work the same as the sparse-checkout syntax, so that shouldn't be a problem. If you want, I will try option 2, but it can lead to complex use cases; option 1 makes sure everything works and we won't have to worry about "--include" at all.

@dscho
Contributor

dscho commented Jun 7, 2023

the "--include" pattern should work the same as the sparse-checkout syntax

I am not sure that that's true. The manual page says that --include accepts paths, while the sparse-checkout files contain patterns.

So yes, I think we need to use option 1. I pushed my proposed solution here: 25d6c12

@dfdez
Contributor Author

dfdez commented Jun 7, 2023

I am not sure that that's true. The manual page says that --include accepts paths, while the sparse-checkout files contain patterns.

You are right! Then it's better to use option 1 🚀

So yes, I think we need to use option 1. I pushed my proposed solution here: 25d6c12

That looks good. The only thing you might be missing is to not set GIT_LFS_SKIP_SMUDGE to 1, right? If we set GIT_LFS_SKIP_SMUDGE, the files won't be pulled during the checkout.

@dscho
Contributor

dscho commented Jun 7, 2023

the only thing you might be missing is to not set GIT_LFS_SKIP_SMUDGE to 1, right?

I'm confused... The !this.lfs check should succeed only if LFS is turned off, right? So if it is turned on, that environment variable should not be set, correct?
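The condition described here can be written out as a small, hypothetical TypeScript function (lfsEnvFor is an illustrative name, not the action's API). It assumes GIT_LFS_SKIP_SMUDGE is the only LFS-related variable involved: when LFS is on, the variable is left unset, so the checkout itself smudges LFS files for the sparse paths.

```typescript
// Hypothetical sketch of the check discussed above (names are illustrative):
// GIT_LFS_SKIP_SMUDGE is set only when LFS support is turned off.
function lfsEnvFor(lfs: boolean): Record<string, string> {
  const env: Record<string, string> = {}
  if (!lfs) {
    // LFS disabled: the smudge filter leaves pointer files in the work tree
    env['GIT_LFS_SKIP_SMUDGE'] = '1'
  }
  // LFS enabled: nothing is set, so `git checkout` runs the smudge filter
  // and downloads the LFS objects for the checked-out (sparse) paths
  return env
}

console.log(JSON.stringify(lfsEnvFor(false))) // {"GIT_LFS_SKIP_SMUDGE":"1"}
console.log(JSON.stringify(lfsEnvFor(true))) // {}
```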

@dfdez
Contributor Author

dfdez commented Jun 7, 2023

You are right, I got confused here 😅

All good then, I don't know why I thought it was the other way around!

So 25d6c12 looks good to me.

@blu3mania

blu3mania commented Jun 7, 2023

@dfdez @dscho Are you guys sure "git lfs -I" does not support patterns? This document seems to suggest it does: https://manpages.debian.org/testing/git-lfs/git-lfs-fetch.1.en.html. I also searched the git-lfs repo and found this issue: git-lfs/git-lfs#912, which clearly suggests that patterns should be supported. If possible, we should use "lfs pull" for better performance. I don't use sparse checkout in my dev environment, but our Jenkins build does, so I looked at the build job log and could confirm that it used the lfs.fetchinclude config and "lfs pull" to do so:

12:04:26 Cloning repository git@github.com:org_redacted/repo_redacted.git
12:04:26  > git.exe init C:\Jenkins\workspace\JobName # timeout=10
12:04:27 Fetching upstream changes from git@github.com:org_redacted/repo_redacted.git
12:04:27  > git.exe --version # timeout=10
12:04:27  > git --version # 'git version 2.41.0.windows.1'
12:04:27 using GIT_SSH to set credentials GitHub account SSH key
12:04:27 Verifying host key using known hosts file
12:04:27  > git.exe fetch --no-tags --force --progress -- git@github.com:org_redacted/repo_redacted.git +refs/heads/*:refs/remotes/origin/* # timeout=120
12:55:49  > git.exe config remote.origin.url git@github.com:org_redacted/repo_redacted.git # timeout=10
12:55:49  > git.exe config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
12:55:49 Avoid second fetch
12:55:49  > git.exe rev-parse "origin/branch_redacted^{commit}" # timeout=10
12:55:49 Checking out Revision rev_redacted (origin/branch_redacted)
12:55:49 Enabling Git LFS pull
12:55:49  > git.exe config core.sparsecheckout # timeout=10
12:55:49  > git.exe config core.sparsecheckout true # timeout=10
12:55:49  > git.exe config lfs.fetchinclude /folder_redacted # timeout=10
12:55:49  > git.exe config --unset lfs.fetchexclude # timeout=10
12:55:49  > git.exe read-tree -mu HEAD # timeout=10
12:55:49  > git.exe checkout -f rev_redacted # timeout=120
12:56:31  > git.exe config --get remote.origin.url # timeout=10
12:56:31 using GIT_SSH to set credentials GitHub account SSH key
12:56:31 Verifying host key using known hosts file
12:56:31  > git.exe lfs pull origin # timeout=120

Note that the log somehow doesn't show how the sparse checkout was performed, but I confirmed that it used the .git/info/sparse-checkout file to specify the sparse checkout path. That doesn't matter, though, since "git sparse-checkout" should be doing the same.
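The sequence in the Jenkins log can be sketched as git argument lists. This is a hypothetical reconstruction in TypeScript (the function name and parameters are illustrative, and the path and revision are the redacted placeholders from the log); it only builds the commands and does not run them. What it illustrates is that lfs.fetchinclude is configured before the final git lfs pull, so the pull is limited to the configured path.

```typescript
// Hypothetical reconstruction of the Jenkins steps from the log above,
// expressed as git argument lists (path and revision are placeholders).
function jenkinsSparseLfsSteps(includePath: string, rev: string, remote: string): string[][] {
  return [
    ['config', 'core.sparsecheckout', 'true'], // sparse checkout via .git/info/sparse-checkout
    ['config', 'lfs.fetchinclude', includePath], // scope LFS downloads to the sparse path
    ['config', '--unset', 'lfs.fetchexclude'],
    ['read-tree', '-mu', 'HEAD'], // re-apply the sparse patterns to the work tree
    ['checkout', '-f', rev],
    ['lfs', 'pull', remote] // uses the lfs.fetchinclude setting from above
  ]
}

for (const args of jenkinsSparseLfsSteps('/folder_redacted', 'rev_redacted', 'origin')) {
  console.log(['git', ...args].join(' '))
}
```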

@dscho
Contributor

dscho commented Jun 7, 2023

Are you guys sure "git lfs -I" does not support patterns?

The manual page talks about <paths>, not <patterns>, which is admittedly misleading, given that it says wildcard matching as per gitignore is applied. However, there are still some rather big concerns with using git lfs pull -I:

  • the sparse-checkout file contains directory names (in cone mode), which would potentially need to be translated into proper wildcard patterns by appending /*.
  • while the documentation refers to "wildcard matching as per .gitignore", the --include option takes a comma-separated list of paths, meaning that a comma is handled differently than in .gitignore (or, for that matter, sparse-checkout). Therefore, if the sparse-checkout definition contains, say, /blu3,mania/, a simple implementation using that option would not work correctly.
  • long sparse-checkout definitions could run into command-line limitations.

The first two issues are not exactly big blockers, but they demonstrate that it would be far from trivial to use the -I option.
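To make these concerns concrete, here is a hypothetical TypeScript helper (toLfsIncludeValue is not part of actions/checkout) that turns cone-mode directory entries into a --include value by appending /* and rejects entries containing a comma. It is a sketch under those assumptions only; it also hints at the command-line concern, since every entry ends up in a single argument.

```typescript
// Hypothetical helper, not part of actions/checkout: turn cone-mode
// directory entries into a value for `git lfs pull --include`.
function toLfsIncludeValue(coneDirs: string[]): string {
  return coneDirs
    .map(dir => {
      // --include splits on commas, so a directory such as `/blu3,mania/`
      // cannot be passed through unchanged
      if (dir.includes(',')) {
        throw new Error(`path contains a comma: ${dir}`)
      }
      // Cone mode lists directories; append /* to make a wildcard pattern
      return `${dir.replace(/\/+$/, '')}/*`
    })
    .join(',') // one argument that grows with every sparse-checkout entry
}

console.log(toLfsIncludeValue(['.github', 'src/'])) // .github/*,src/*
```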

With all of that, I actually really do not want to use this option.

we should use "lfs pull" for better performance

Are you sure about that? From what I gather reading https://git-scm.com/docs/gitattributes#_delay, I could imagine that it works as intended (i.e. using batched LFS fetching) even when using sparse checkouts.

As I mentioned earlier, I am by no means familiar with LFS. I don't have access to any meaningful repository using LFS. Maybe you do, and maybe you can test whether letting git checkout fetch the LFS objects in sparse checkouts lets them be fetched individually, or in batches?

@blu3mania

It seems you are right. I don't have the environment at home, so I couldn't test it, but after some more digging it appears that "git lfs pull has better performance" is just an old myth. According to git-lfs/git-lfs#2594 (comment), it shouldn't be a problem for Git anymore as of Git 2.15. So your 25d6c12 should be the right solution. Thanks.

@dfdez
Contributor Author

dfdez commented Jun 8, 2023

we should use "lfs pull" for better performance

After some digging I found this in the git-lfs docs, but as @blu3mania mentioned, git-lfs/git-lfs#2594 (comment) clarifies everything!

So we can close the topic regarding LFS and sparse-checkout! Thanks to you both!

@dfdez
Contributor Author

dfdez commented Jun 9, 2023

Merged here, thanks @dscho for the help!

@dfdez dfdez closed this Jun 9, 2023
@dscho
Contributor

dscho commented Jun 9, 2023

Merged here, thanks @dscho for the help!

@dfdez this was for the most part your work, culminating in https://github.com/actions/checkout/releases/tag/v3.5.3. Thank you so much! If I was able to assist, I am glad that I could.

@dfdez
Contributor Author

dfdez commented Jun 9, 2023

@dfdez this was for the most part your work, culminating in https://github.com/actions/checkout/releases/tag/v3.5.3.

Yeah, that's true, and I would have appreciated a small mention in the release. @TingluoHuang, I don't know if it is possible to add it, but I would really appreciate it!

But the important thing is to have the functionality!

Thanks for everything

@dscho
Contributor

dscho commented Jun 12, 2023

@dfdez you're absolutely correct, your contribution must be celebrated. Look at the latest revision of the release notes of https://github.com/actions/checkout/releases/tag/v3.5.3 😃

@dfdez
Contributor Author

dfdez commented Jun 12, 2023

Thanks! 🚀

4 participants