Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for optionally serving pre-compressed files. #2716

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

hcldan
Copy link

@hcldan hcldan commented Jan 30, 2024

There wasn't a good way to propagate the new option into NamedFile so I made a new constructor and added a flag to the struct. In order to make the flag understandable, I converted to a struct with members instead of a simple tuple.

I am not sure how to test the Content-Type returned... It seems that the original tests didn't check those headers and there's not really a clean way to pass that in the assert functions without refactoring a bunch of stuff.

@hcldan hcldan force-pushed the compression branch 4 times, most recently from 9b948ca to 83a2fc1 Compare January 30, 2024 21:36
@hcldan
Copy link
Author

hcldan commented Jan 31, 2024

One approach I tried first was to implement a wrapper around FileServer and implement a Handler that would defer to FileServer and adjust headers on the way out.

I could not figure out how to properly clone the Request in order to get a mutable value I could set_uri() on so I could get it to try to read the alternate file and check to see if it was found. And while something like that would have worked, if I found a way to modify the request, it's still rather kluge-y as the request wasn't for the gz file, it was for the non gz file and should be using the gz file to serve the request for the non-gz file with appropriate headers.

It makes it complicated to tamper with the request like that and have a proper response. I think it's better to make the resource-check by looking at the file system and choosing what file to send back according to the actual request.

@hcldan
Copy link
Author

hcldan commented Jan 31, 2024

Fixes #2718

@ApprenticeofEnder
Copy link

Final question: Could we also perhaps have some way of providing file hashes with the server? That might be a separate PR, though it's a good thing to think about, especially with compressed files like this.

@hcldan
Copy link
Author

hcldan commented Feb 2, 2024

Hashing the files would be nice for other reasons, like you say, but it would also introduce performance overhead. I think that's best left to another PR to implement features specific to hashing needs.

I also did not implement anything related to cache control headers, as there's a fairly long conversation in other issues and I've not seen much movement on it.

@hcldan
Copy link
Author

hcldan commented Feb 6, 2024

@ApprenticeofEnder Is this something that the Rocket team would like to merge? How do I proceed here?

@hcldan
Copy link
Author

hcldan commented Feb 6, 2024

@SergioBenitez Is this something that the Rocket team would like to merge? How do I proceed here?

@ApprenticeofEnder
Copy link

@ApprenticeofEnder Is this something that the Rocket team would like to merge? How do I proceed here?

Sorry, unfortunately I'm not on the Rocket team! Just thought I'd start some conversation surrounding your PR.

@SergioBenitez
Copy link
Member

SergioBenitez commented Feb 6, 2024

How do I proceed here?

Just as you are!

@SergioBenitez Is this something that the Rocket team would like to merge?

Potentially. It's an interesting idea. As I see it, here are the pros and cons:

  • The feature asumes that foo.gz is indeed a gzipped version of foo. This may not be the case. It seems trivially simple for foo.gz to be an outdated version of foo, for example.
  • This adds somewhat complicated logic to the file server, which means the file server is doing things that are more interesting than just serving files.
  • This is easily accomplished outside of Rocket with a handler that handles file serving.

The pros are:

  • The logic seems simple enough and, as long as it's opt-in, shouldn't be error-prone.
  • This is a convenient way to resolve a requested feature.

In general I think I'm in favor of the idea with a different implementation. There are a few key things I would like to see changes.

  1. I don't think we need to make any changes to NamedFile, nor do I support making those changes. If we want a different responder, we define a different one. The impl can be easily derived, for instance, for a structure that includes the file with the appropriate fields or attributes for headers.
  2. As mentioned before, it seems far too simple to serve the incorrect file. I believe we need to add some sort of consistency checking to make this viable. At minimum, we should require that the gz file has a mod time greater than the original file. Ideally however, we would be able to check that the gz file does indeed correspond to the regular file. One approach, as an example, would be to require the gz file name to include a hash of the original file. That is, foo.HASH.gz, where HASH is the hash of foo. We would then compute the hash of the original file and compare the hashes, serving the gz file only if they match. Ideally we'd only need to do this once per file, caching the results of the check for the future.
  3. It feels that there is a more generalizable mechanism at play, one that can not only be used to implement this feature, but other existing features, such as index file serving. What I'm suggesting is effectively mod_rewrite, albeit a very constrained version. If we allow FileServer to be constructed with dynamic "rewrite" functions in which the user (or us internally) is able to tell us which file, if any, should be served for the path identified by the request, then this can be implemented outside of Rocket entirely. This effectively makes FileServer pluggable. We would seek an API by which we can implement most if not all options currently in FileServer. Ideally, the API would also make composition and precedence possible and obvious. For instance, should foo/index.html.gz be served for foo/ if both the Index and Compressed options are enabled? Ideally we'd have an API that would allow the user to create a FileServer that can answer the question in either direction.

If you choose to proceed, I would first like to see a sketch of an API for 3 that would handle most or all of the presently available options. The "sketch" should show that each option for FileServer could be implemented, including the one in question, and how they compose. Ideally the implementation for each option would be rather straightforward and consist only of the bare-minimal logic necessary. For example, the implementation of the Index option should be something as close to as simple as:

path.dir().map(|dir| dir.join("index.html"))

The DotFiles option as simple as:

(allow_dotfiles || !path.starts_with('.')).then(path)

Then if each of these rewrites is nameable in some way, the default FileServer becomes a specific composition of these rewrites:

FileServer::new()
    .rewrite(Jail("/foo")) # rewrite /req/path to /foo/req/path.
    .rewrite(DotFiles(false)) # do not allow DotFiles
    .rewrite(IndexFile) # rewrite /foo or /foo/ to /foo/index.html

And new ones can be added:

FileServer::new()
    .rewrite(Jail("/foo"))
    .rewrite(DotFiles(true))
    .rewrite(IndexFile)
    .rewrite(PreCompressed)

Note how the previous ambiguity regarding ordering is removed and both options (check for compressed before or after index.html rewriting) are easily achievable.

I would be in favor of such a clean-up PR.

@hcldan
Copy link
Author

hcldan commented Feb 7, 2024

Some comments about your proposals:

I think I can come up with something that doesn't change NamedFile, though I do think there will be some duplication of code, because the content-type for a gzipped file should be of the non-gzipped version. Easily doable, but slightly duplicated.


For 2, serving pre-gzipped files is not something new and the other implementations I've seen don't even require the non-gzipped file to exist. (side note: gzip default invocation leaves only a .gz file and no source file). I think I would prefer to avoid over-complicating this.

As you say, the file server should just be serving files. Adding a check for file modification time would be an interesting feature if the server were to handle gzipping non zipped content automatically and then re-generating it if the underlying file has been modified. That is an interesting capability, but one that is out of scope for my needs, and I would argue the needs of the majority. I would rather see additions like that as future pull requests once a viable base feature is delivered.


For 3, the api you list is interesting. In many ways, I wish this was already the case as it would have made this much easier to do. Alas, I do not have time for a rewrite of a crate I'm only just becoming familiar with. I would like to see this feature make a release soon and this sounds like it might require a major version bump because the api is changing and the options are no longer provided the same way.

In any case, I would like to focus on a minimum viable change here so it can be delivered and leveraged.
Later, either I can or someone else with more familiarity with the crate can tackle 3.

Also, 3 seems to dovetail nicely with the other issue opened for lastmodified/etag/arbitrary headers.
I am very reluctant to cross streams here, this could drastically expand the scope of what I had hoped to accomplish.

@hcldan hcldan force-pushed the compression branch 2 times, most recently from d1c02c1 to ebc5da3 Compare February 8, 2024 16:21
@hcldan
Copy link
Author

hcldan commented Feb 8, 2024

@SergioBenitez I removed changes to NamedFile which should address (1)

See my response above for (2) and (3)

Copy link

@palant palant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not the maintainer of this repository, so I’m merely nit-picking which you are free to ignore.

core/lib/src/fs/maybe_zipped_file.rs Outdated Show resolved Hide resolved
core/lib/src/fs/maybe_zipped_file.rs Outdated Show resolved Hide resolved
core/lib/src/fs/maybe_zipped_file.rs Outdated Show resolved Hide resolved
let file = NamedFile::open(p).await;
file.respond_to(req).or_forward((data, Status::NotFound))
let gzip_accepted = req.headers().get("Accept-Encoding")
.find(|value| value.contains("gzip"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure whether a substring match here is a good idea. See https://github.com/tower-rs/tower-http/pull/156/files#diff-83cd763343b90a262fdc4334e5289daea2e9cfb9b1cf76785ea7c42b66d3d5a0R85 for proper parsing code.

Copy link
Author

@hcldan hcldan Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what you mean by "good idea" :)

From reading the spec for the header, this should cover what is needed for this change (gzip only). The header doesn't let you say "please not gzip", so if they mention it, then they support it (even if it is not preferred).

It is true that this is not ideal for future multiple compression types.
Especially if we wish to honor the browser's preference order.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per RFC 7231, section 5.3.4. Accept-Encoding

  1. If the representation has no content-coding, then it is
    acceptable by default unless specifically excluded by the
    Accept-Encoding field stating either "identity;q=0" or "*;q=0"
    without a more specific entry for "identity".
  1. If the representation's content-coding is one of the
    content-codings listed in the Accept-Encoding field, then it is
    acceptable unless it is accompanied by a qvalue of 0. (As
    defined in Section 5.3.1, a qvalue of 0 means "not acceptable".)

So if it specifies Accept-Encoding: gzip;q=0 it's denying gzip

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right you are, I glossed right over that. I'll see if I can implement something that makes more sense than "contains".

Copy link

@palant palant Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look here: https://github.com/rwf2/Rocket/blob/a866134212a13bfaa61a1cd1e68880461af1ab42/core/http/src/parse/accept.rs. That’s a very similar logic, and this directory should also be the appropriate place to add this parsing code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was considerably more work than I had hoped.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palant I would gladly take any additional unit tests, my time I can spend on this project is running short.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hcldan Unfortunately, it’s the same situation here. Unless I see some PRs being accepted in a meaningful form, I have no more time left to contribute to this project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants