Evaluate extensions to git filter-process to improve performance #5578

Open
bk2204 opened this issue Nov 29, 2023 · 2 comments

Comments

@bk2204
Member

bk2204 commented Nov 29, 2023

We've had a large number of requests for copy-on-write-by-default functionality, most recently #5576. In addition, people have noticed that smudging or cleaning a large file (e.g., 20 GB) requires allocating as much memory as the file's size.

It would be helpful to evaluate extensions to the git filter-process protocol that let us perform copy-on-write for both smudging and cleaning by passing the absolute path to the file. That would improve performance and reduce memory usage for large files.
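To illustrate what copy-on-write could look like once a filter has an absolute path, here is a minimal sketch in Python. It is not how Git LFS implements anything: the function name `clone_or_copy` is hypothetical, the Linux `FICLONE` ioctl is used only as one example of a reflink mechanism, and the fallback to a streamed copy is an assumption about what a tool might do on filesystems without reflink support.

```python
import os
import shutil

try:
    import fcntl  # available on POSIX systems only
except ImportError:
    fcntl = None

# Linux ioctl request number for FICLONE, as defined in <linux/fs.h>.
FICLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Hypothetical helper: try a copy-on-write clone, else stream a copy.

    Returns True if the filesystem cloned the file (no data duplicated),
    False if we fell back to a regular copy.
    """
    if fcntl is not None and os.name == "posix":
        try:
            with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
                # Ask the filesystem to share src's extents with dst.
                fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # Filesystem or platform does not support reflinks.
            pass
    # Fallback: shutil.copyfile copies in chunks, so the whole file is
    # never held in memory at once.
    shutil.copyfile(src_path, dst_path)
    return False
```

Either branch works from a path rather than from file contents sent over a pipe, which is why the path would need to be part of the protocol.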

Note that we can't simply use the filename passed in, because it might come from git hash-object --stdin, where we'd get a path that doesn't necessarily correspond to a file in the working tree.

@jochenhz
Contributor

jochenhz commented Feb 3, 2024

Just a question: wouldn't it be a nice option to add a git lfs add command that pre-caches the files before the user calls git add? It would be an optional command, but quite useful for LFS-optimized Git clients and scripts.
Such a command could easily fix the memory allocation issue. The smudge filter would still be an issue, though.

@bk2204
Member Author

bk2204 commented Feb 5, 2024

Pre-caching the files doesn't really help us improve performance, because git add will still clean them, which will invoke Git LFS to hash them again. The only way to avoid the performance problem would be for git lfs add not to require git add at all and instead call git update-index directly, but that would require git lfs add to re-implement all of git add, which I would not like to do since it's a big maintenance burden.
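For a sense of what the hashing step involves, here is a rough Python sketch of the part a hypothetical git lfs add would have to do itself. It streams the file through SHA-256 in fixed-size chunks and builds the text of a Git LFS pointer. The function name `build_pointer` and the chunk size are assumptions made for illustration; this is not the Git LFS implementation.

```python
import hashlib

# Version line used by Git LFS pointer files.
POINTER_VERSION = "https://git-lfs.github.com/spec/v1"


def build_pointer(path, chunk_size=1024 * 1024):
    """Hypothetical helper: hash a file in chunks and return pointer text."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)  # only one chunk is in memory at a time
            size += len(chunk)
    return (
        f"version {POINTER_VERSION}\n"
        f"oid sha256:{digest.hexdigest()}\n"
        f"size {size}\n"
    )

# Not shown, and the reason for the maintenance concern above: the tool
# would still have to store the object, write the pointer as a blob
# (e.g. via `git hash-object -w --stdin`), and register it with
# `git update-index --add --cacheinfo <mode>,<blob>,<path>`, while
# matching what git add does for ignore rules, file modes, and so on.
```

The hashing itself is the easy part; the steps listed in the trailing comment are where duplicating git add's behavior comes in.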
