
Hashing each file means large sites sync very slowly #459

Open
FraserThompson opened this issue Aug 27, 2023 · 0 comments · May be fixed by #460

@FraserThompson (Contributor) commented:

My site is very large and contains some big files (over 100 MB in some cases). Because gatsby-plugin-s3 uses an MD5 hash of each file to determine whether it has changed, syncs can be very slow, since hashing large files is slow.
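
For context, the current approach has to read every byte of every file just to decide whether it changed. A minimal sketch of that general MD5-vs-ETag pattern (illustrative only, not the plugin's exact code) looks something like this:

```ts
// Illustrative sketch of an MD5-vs-ETag change check (not the plugin's exact code).
import { createHash } from "crypto";
import { createReadStream } from "fs";

// Hashing requires streaming the whole file from disk, which is the slow part.
function md5OfFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("md5");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")))
      .on("error", reject);
  });
}

// S3 returns the ETag quoted, e.g. '"9e107d9d372bb6826bd81d3542a419d6"'.
async function hasChanged(path: string, s3ETag: string): Promise<boolean> {
  const localHash = await md5OfFile(path);
  return localHash !== s3ETag.replace(/"/g, "");
}
```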

I found a similar issue in gatsby-source-filesystem, which resulted in the addition of a "fast" option that uses a slightly less robust but much faster method instead of hashing. So I'm checking whether a similar feature would be appreciated here.

I've done some experimenting, and I think we can compare the size and mtime between the local filesystem and the S3 metadata (which is how this s3 sync library does it). That's less robust than hashing, but probably fine for 99% of use cases.
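
Roughly what I have in mind, as a sketch under some assumptions (the remote size and LastModified would come from HeadObject or ListObjectsV2; the shapes and names below are illustrative, not the plugin's actual API):

```ts
// Illustrative sketch of a size + mtime "fast" check, in the style of `aws s3 sync`.
import { statSync } from "fs";

interface RemoteObject {
  size: number;       // e.g. ContentLength from HeadObject / Size from ListObjectsV2
  lastModified: Date; // when the object was last uploaded to S3
}

function needsUpload(localPath: string, remote: RemoteObject | undefined): boolean {
  if (!remote) return true;                    // not on S3 yet, so upload
  const stats = statSync(localPath);           // cheap metadata lookup, no file read
  if (stats.size !== remote.size) return true; // size mismatch means it changed
  return stats.mtime > remote.lastModified;    // modified locally since the last upload
}
```

One caveat: S3's LastModified is the upload time rather than the local mtime, so a freshly rebuilt file with identical content would still be re-uploaded unless we also stash the local mtime in object metadata at upload time. Either way, no file contents need to be read to make the decision.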

(As a bonus, this would also resolve the issue where large files are always re-uploaded because the ETag for multipart uploads is handled differently: #59)

If this sounds like something the community would want, I can throw a pull request together.
