lang/funcs: File hashing functions stream data from disk #28681
Merged · +18 −6
Previously our file hashing functions were backed by the same "read file into memory" function we use for situations like `file` and `templatefile`, meaning that they'd read the entire file into memory first and then calculate the hash from that buffer. All of the hash implementations we use here can calculate hashes from a sequence of smaller buffer writes, though, so there's no actual need for us to create a file-sized temporary buffer here.
This, then, is a small refactoring of our underlying function into two parts, where one is responsible for deciding the actual filename to load and opening it, and the other is responsible for buffering the file contents into memory. Our hashing functions can then use only the first function and skip the second.
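A rough sketch of that split might look like the following, where `openFile` and `readFileBytes` are illustrative names rather than the PR's actual identifiers:

```go
package funcs

import (
	"io"
	"os"
	"path/filepath"
)

// openFile resolves the given path relative to the base directory and
// opens it for reading. (Illustrative; the real function also handles
// path validation and error wrapping.)
func openFile(baseDir, path string) (*os.File, error) {
	fullPath := filepath.Join(baseDir, path)
	return os.Open(fullPath)
}

// readFileBytes buffers the whole file into memory, as functions like
// file and templatefile still require.
func readFileBytes(baseDir, path string) ([]byte, error) {
	f, err := openFile(baseDir, path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}
```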
This then allows us to use `io.Copy` to stream from the file into the hashing function in smaller chunks, possibly of a size chosen by the hash function if it happens to implement `io.ReaderFrom`. The new implementation is functionally equivalent to the old but should use less temporary memory if the user passes a large file to one of the hashing functions.
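Put together, a hashing helper in this style can stream the file directly into the hash. This is a sketch only, assuming the hypothetical `openFile` helper from above; the PR's actual signature may differ:

```go
package funcs

import (
	"encoding/hex"
	"hash"
	"io"
)

// fileHash streams the file's contents into a fresh hash and returns
// the hex-encoded digest. io.Copy uses a small fixed-size buffer, or
// defers to the hash's ReadFrom method if it implements io.ReaderFrom,
// so the whole file is never held in memory at once.
func fileHash(baseDir, path string, hf func() hash.Hash) (string, error) {
	f, err := openFile(baseDir, path) // hypothetical helper from above
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := hf()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```

For example, `fileHash(baseDir, path, sha256.New)` would yield the same digest as hashing the fully buffered bytes, without the intermediate file-sized allocation.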
This might help with #28678, but we've not root-caused that yet. Either way, this seems like a reasonable optimization to implement.