Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced ETag Comparison now supported #3015

Merged
merged 23 commits into from Apr 21, 2024
Merged

Advanced ETag Comparison now supported #3015

merged 23 commits into from Apr 21, 2024

Conversation

linus-hologram
Copy link
Contributor

@linus-hologram linus-hologram commented May 14, 2023

These changes are now available in 4.93.0

Vapor now supports strong (byte-by-byte) ETag validation and caches ETags for rapid responses. This provides a stronger alternative to the current weak comparison, which only guarantees semantic file equivalence. This new strong comparison is enabled by default and can be deactivated during FileMiddleware initialization if needed. This PR closes #2948.

  • streamFile method was deprecated and replaced by an alternative returning an EventLoopFuture
  • vapor's unit tests were updated to reflect the changes
  • documentation was updated to reflect the changes

sha-256 digest set when file is written to system
Introduced new streamFile method which allows for advancedETag comparison. Deprecated the old one.
Updates to remove deprecated warnings by using the new streamFile() method. Also removed some other deprecation warnings.
@linus-hologram linus-hologram marked this pull request as ready for review May 14, 2023 19:54
@linus-hologram linus-hologram marked this pull request as draft May 15, 2023 09:43
@linus-hologram linus-hologram marked this pull request as ready for review May 15, 2023 09:45
@linus-hologram
Copy link
Contributor Author

The pipeline indicates an API breakage, however, I don't really see one since the newly added parameter to FileMiddleware's init method has a default value, meaning that calling the initializer the old way would not cause an error. Can anyone give some intel on how the pipeline works here please?

@codecov-commenter
Copy link

codecov-commenter commented May 15, 2023

Codecov Report

Attention: Patch coverage is 89.13043% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 77.45%. Comparing base (11cdb29) to head (52bbc35).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3015      +/-   ##
==========================================
+ Coverage   76.86%   77.45%   +0.58%     
==========================================
  Files         211      211              
  Lines        8119     7802     -317     
==========================================
- Hits         6241     6043     -198     
+ Misses       1878     1759     -119     
Files Coverage Δ
.../Vapor/HTTP/Headers/HTTPHeaders+ContentRange.swift 92.12% <ø> (-0.19%) ⬇️
Sources/Vapor/Middleware/FileMiddleware.swift 93.93% <100.00%> (-0.80%) ⬇️
Sources/Vapor/Utilities/FileIO.swift 78.94% <88.76%> (+3.18%) ⬆️

... and 89 files with indirect coverage changes

Copy link
Member

@0xTim 0xTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Here are my initial review comments

Sources/Vapor/Middleware/FileMiddleware.swift Outdated Show resolved Hide resolved
Sources/Vapor/Utilities/FileIO.swift Show resolved Hide resolved
Sources/Vapor/Utilities/FileIO.swift Outdated Show resolved Hide resolved
Tests/VaporTests/RequestTests.swift Outdated Show resolved Hide resolved
Tests/VaporTests/FileTests.swift Show resolved Hide resolved
@linus-hologram
Copy link
Contributor Author

@0xTim one of the pipelines fails because some of the sendable stuff that was recently added

Copy link
Member

@0xTim 0xTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this looks good once #3000 lands. I'll do a final review before merging

Copy link
Member

@0xTim 0xTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing this! I have a few questions now I've looked at this with a fresh set of eyes, but I think we're almost there

Sources/Vapor/Utilities/FileIO.swift Outdated Show resolved Hide resolved
Tests/VaporTests/FileTests.swift Show resolved Hide resolved
Sources/Vapor/Utilities/FileIO.swift Outdated Show resolved Hide resolved
@linus-hologram
Copy link
Contributor Author

I will look at this as soon as I can. Pretty covered up in work now, but I'll get there!

- adjusted faulty comment
- access storage directly to avoid concurrent overwrites of the entire storage
@linus-hologram
Copy link
Contributor Author

@0xTim applied your feedback - please have a look.

I've noticed that the now deprecated streamFile function received the @preprocessor annotation. I haven't been following vapor's recent progress, should I add that annotation to the new streamFile function as well?

Cheer for the review 🎉

Copy link
Member

@0xTim 0xTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One query then good to go I think

Tests/VaporTests/FileTests.swift Show resolved Hide resolved
@0xTim 0xTim added the semver-minor Contains new API label Apr 19, 2024
Copy link
Member

@0xTim 0xTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - @gwynne any last thoughts?

@gwynne gwynne enabled auto-merge (squash) April 21, 2024 11:14
@gwynne gwynne merged commit 526a000 into vapor:main Apr 21, 2024
15 of 16 checks passed
@mkll
Copy link
Sponsor

mkll commented Apr 22, 2024

Guys, did I not understand something? We use streamFile() when we don’t want, for one reason or another (the main one being file size), to read the entire file into memory.

The new version of the streamFile() method, which appeared in release 4.93.0,, does exactly this (when the advancedETagComparison flag is turned on) — it reads the entire file into memory. Moreover, this is the default behavior of this method. I say "default" because the advancedETagComparison parameter has a default value of true.

So our streamFile() method magically becomes a collectFile() method, at least in terms of memory consumption.

Of course, I understand that in order to take a hash from the contents of a file, it needs to be read into memory, but I am afraid that this innovation “slightly” changes the default behavior of the streamFile() method and makes it unexpected.

I also understand that Vapor must be able to generate ETag in full accordance with the specification. But maybe it’s not worth, firstly, forcing it on everyone, and secondly, it was worth at least warning that if user need an ETag, then he will have a "magical" transformation of a streamFile() into a collectFile()?

It seems to me that it would be worth doing two things:

  1. Do not assign a default value to the advancedETagComparison flag.
  2. (maybe) Do not deprecate the previous variant of the streamFile() method.

@mkll
Copy link
Sponsor

mkll commented Apr 22, 2024

In addition, one more point needs to be mentioned. The new streamFile() method deprives me of the opportunity to create an HTTPHeaders myself. Here is an example of a working code snippet in which I generate the headers I need:

.flatMap { (zipPath, zipFileName) in
	var headers: HTTPHeaders = .init()
	headers.add(name: .contentType, value: "application/zip")
	headers.add(name: .contentDisposition, value: "attachment; filename=\"\(zipFileName)\"")
	
	guard FileManager.default.fileExists(atPath: zipPath) else {
		return req.eventLoop.makeFailedFuture(Abort(.internalServerError, reason: "ZIP archive not found (was not created?)"))
	}
	
	return req.fileio.streamFile(at: zipPath, onCompleted: { result in
		try? FileManager.default.removeItem(atPath: zipPath)
	}).encodeResponse(status: .ok, headers: headers, for: req)
}

It is quite obvious that with the new streamFile() method I will not be able to do this. Of course, I can add my own to the generated headers; there is an “addOrReplace” method (or something similar) for this, but I wouldn’t want to lose the opportunity to completely generate my own headers.

Therefore, I would not say that deprecation of the old method is generally possible. The new method is not an equivalent improved replacement, it has its own area of ​​use, it cannot replace the old one.

@mkll
Copy link
Sponsor

mkll commented Apr 22, 2024

Well, we can probably also mention that the new method imposes on me, as a developer, the use of a specific method of generating ETag, namely SHA256. Meanwhile, the specification does not regulate the method of generation; it only stipulates that the values of the ETag tag must be generated in a way that reflects the uniqueness of the file contents.

In my opinion, using the strong cryptographic function SHA256 for this purpose is redundant. I would use some non-cryptographic function (after all, we don’t need cryptographic strength here), for example, xx_hash.

I am not going to argue that it is necessary to use xx_hash by default, I am only demonstrating that there is more than one option, and the developer has the right to use the one that suits him best.

To summarize the above, I would say this: if I personally ever need to generate an ETag for the content being streamed, I will not use the new method for this because it does not suit me. I will write my own implementation of ETag generation, my own storage location, choose a convenient moment when the ETag will be generated and stored somewhere, create my own HTTP headers.

And for this I, among other things, will need an old method that is prematurely marked as deprecated.

@linus-hologram
Copy link
Contributor Author

Guys, did I not understand something? We use streamFile() when we don’t want, for one reason or another (the main one being file size), to read the entire file into memory.

The new version of the streamFile() method, which appeared in release 4.93.0,, does exactly this (when the advancedETagComparison flag is turned on) — it reads the entire file into memory. Moreover, this is the default behavior of this method. I say "default" because the advancedETagComparison parameter has a default value of true.

So our streamFile() method magically becomes a collectFile() method, at least in terms of memory consumption.

Of course, I understand that in order to take a hash from the contents of a file, it needs to be read into memory, but I am afraid that this innovation “slightly” changes the default behavior of the streamFile() method and makes it unexpected.

I also understand that Vapor must be able to generate ETag in full accordance with the specification. But maybe it’s not worth, firstly, forcing it on everyone, and secondly, it was worth at least warning that if user need an ETag, then he will have a "magical" transformation of a streamFile() into a collectFile()?

It seems to me that it would be worth doing two things:

  1. Do not assign a default value to the advancedETagComparison flag.
  2. (maybe) Do not deprecate the previous variant of the streamFile() method.

You are right in that the file is loaded into memory - however, just once. After the hash is generated, it is cached and future requests for that file will not load the entire file into memory. I agree with you though that this is different to the original streamFile implementation.

Additionally, could you please elaborate on the issue with the response headers? I believe you can still use .encodeResponse(status: .ok, headers: yourHeaders, for: req) even with the new streamFile method, no?

That being said, @0xTim what's your opinion on this?

I'm happy to go over it again :)

@mkll
Copy link
Sponsor

mkll commented Apr 22, 2024

You are right in that the file is loaded into memory - however, just once.

Well, my server with 2GB of memory will not benefit from the fact that a 10GB file will be loaded into memory only once.

To be honest, I am somewhat shocked by the “you can’t do this, but if you only do it once, then you can” approach. Nope. I disagree. If we can't, then we can't.

I would like to draw your attention to the most important and obvious point among everything: we use the streamFile() method precisely so as not to load the entire file into memory. Period.

We use it when we have reasons to do so. We intentionally use the streamFile() method rather than the collectFile() method.

Your edit negates the difference between the two.

If you need a more detailed explanation of my thoughts, here it is:

There may be cases where we can neglect the fact that the entire file is loaded into memory. When can we neglect it? In cases where the file is small. But in these cases, most likely, we will not use the streamFile() method at all.

But when the streamFile() method is vital to us for one reason or another (the most obvious reason, the file is very large) and we use it, we are surprised to discover that the new “streaming” method ceases to be a "streaming", it's now a "collect in memory" method. Without warning, I'll note. And there is simply no point in using it.

And when it comes to really large files that we would not like to give to the client twice without any need, then in this case we would take some special actions to generate the ETag in advance, in the background or by a separate service values and place it in some store.

I assure you that even if my server has 128GB of RAM, I will never want to load a 10GB file into memory like this, on the fly, at the time of processing a request. This is completely unacceptable.

To be honest, I don't understand in what cases the new streamFile() method can be used. In mild cases it is not needed, and in severe cases its implementation simply does not meet the requirements for processing this type of content, since this requires an individual approach to each specific implementation.

The new version of the streamFile() method is suitable, it seems to me, for only one thing: if we are asked whether Vapor supports generating hashes in ETag, we can answer “yes” (but it’s better not to go into details :)).

@mkll
Copy link
Sponsor

mkll commented Apr 22, 2024

Additionally, could you please elaborate on the issue with the response headers? I believe you can still use .encodeResponse(status: .ok, headers: yourHeaders, for: req) even with the new streamFile method, no?

Most likely not, but I'll check it a bit later, re-installing Xcode right now. :)

@linus-hologram
Copy link
Contributor Author

Discussed this with @gwynne and @0xTim and i'll be revising my code

@mkll
Copy link
Sponsor

mkll commented Apr 23, 2024

Additionally, could you please elaborate on the issue with the response headers? I believe you can still use .encodeResponse(status: .ok, headers: yourHeaders, for: req) even with the new streamFile method, no?

@linus-hologram You are right, I checked, we can "stack" multiple .encodeResponse() methods one after the other.

@mkll
Copy link
Sponsor

mkll commented Apr 24, 2024

@linus-hologram I think that release 4.93.2 is optimal, providing a default solution, but at the same time leaving the developer the freedom to choose his own implementation option. Thanks!

keniwhat pushed a commit to keniwhat/vapor that referenced this pull request Apr 27, 2024
* main:
  Patch configuration and log actual port on startup (vapor#3160)
  Update provider tests to 5.10 (vapor#3178)
  Migrate to Async NIOFileIO APIs (vapor#3167)
  Removed streamFile deprecation + deactivated advancedETagComparison by default (vapor#3177)
  Remove HeadResponder (vapor#3147)
  Advanced ETag Comparison now supported (vapor#3015)
  Enabled Request Decompression By Default (vapor#3175)
  HTTP2 Response Compression/Request Decompression (vapor#3126)
  Don't set ignore status for SIGTERM and SIGINT on Linux (vapor#3174)
  Fix typos across the codebase (vapor#3162)
  Fix some Sendable warnings on 5.10 (vapor#3158)
  Allow `HTTPServer`'s configuration to be dynamically updatable (vapor#3132)
  Fix issue when client disconnects midway through a stream (vapor#3102)
  Fix handling of "flag" URL query params (vapor#3151)
  Bump the dependencies group with 1 update (vapor#3148)
  Merge Async Tests (vapor#3141)
  Fix URI handling with multiple slashes and variable components. (vapor#3143)
  Fix broken URI behaviors (vapor#3140)

# Conflicts:
#	Package.swift
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor Contains new API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ETag should be a hash
6 participants