Allow for optionally serving pre-compressed files. #2716

hcldan · 2024-01-30T21:22:46Z

There wasn't a good way to propagate the new option into NamedFile so I made a new constructor and added a flag to the struct. In order to make the flag understandable, I converted to a struct with members instead of a simple tuple.

I am not sure how to test the Content-Type returned... It seems that the original tests didn't check those headers and there's not really a clean way to pass that in the assert functions without refactoring a bunch of stuff.

hcldan · 2024-01-31T18:37:06Z

One approach I tried first was to implement a wrapper around FileServer and implement a Handler that would defer to FileServer and adjust headers on the way out.

I could not figure out how to properly clone the Request in order to get a mutable value I could set_uri() on so I could get it to try to read the alternate file and check to see if it was found. And while something like that would have worked, if I found a way to modify the request, it's still rather kluge-y as the request wasn't for the gz file, it was for the non gz file and should be using the gz file to serve the request for the non-gz file with appropriate headers.

It makes it complicated to tamper with the request like that and have a proper response. I think it's better to make the resource-check by looking at the file system and choosing what file to send back according to the actual request.

hcldan · 2024-01-31T18:48:13Z

Fixes #2718

core/lib/tests/file_server.rs

core/lib/src/fs/server.rs

ApprenticeofEnder · 2024-02-02T18:47:39Z

Final question: Could we also perhaps have some way of providing file hashes with the server? That might be a separate PR, though it's a good thing to think about, especially with compressed files like this.

hcldan · 2024-02-02T19:41:53Z

Hashing the files would be nice for other reasons, like you say, but it would also introduce performance overhead. I think that's best left to another PR to implement features specific to hashing needs.

I also did not implement anything related to cache control headers, as there's a fairly long conversation in other issues and I've not seen much movement on it.

hcldan · 2024-02-06T15:57:57Z

@ApprenticeofEnder Is this something that the Rocket team would like to merge? How do I proceed here?

hcldan · 2024-02-06T16:00:12Z

@SergioBenitez Is this something that the Rocket team would like to merge? How do I proceed here?

ApprenticeofEnder · 2024-02-06T17:16:58Z

@ApprenticeofEnder Is this something that the Rocket team would like to merge? How do I proceed here?

Sorry, unfortunately I'm not on the Rocket team! Just thought I'd start some conversation surrounding your PR.

SergioBenitez · 2024-02-06T22:42:59Z

How do I proceed here?

Just as you are!

@SergioBenitez Is this something that the Rocket team would like to merge?

Potentially. It's an interesting idea. As I see it, here are the pros and cons:

The feature asumes that foo.gz is indeed a gzipped version of foo. This may not be the case. It seems trivially simple for foo.gz to be an outdated version of foo, for example.
This adds somewhat complicated logic to the file server, which means the file server is doing things that are more interesting than just serving files.
This is easily accomplished outside of Rocket with a handler that handles file serving.

The pros are:

The logic seems simple enough and, as long as it's opt-in, shouldn't be error-prone.
This is a convenient way to resolve a requested feature.

In general I think I'm in favor of the idea with a different implementation. There are a few key things I would like to see changes.

I don't think we need to make any changes to NamedFile, nor do I support making those changes. If we want a different responder, we define a different one. The impl can be easily derived, for instance, for a structure that includes the file with the appropriate fields or attributes for headers.
As mentioned before, it seems far too simple to serve the incorrect file. I believe we need to add some sort of consistency checking to make this viable. At minimum, we should require that the gz file has a mod time greater than the original file. Ideally however, we would be able to check that the gz file does indeed correspond to the regular file. One approach, as an example, would be to require the gz file name to include a hash of the original file. That is, foo.HASH.gz, where HASH is the hash of foo. We would then compute the hash of the original file and compare the hashes, serving the gz file only if they match. Ideally we'd only need to do this once per file, caching the results of the check for the future.
It feels that there is a more generalizable mechanism at play, one that can not only be used to implement this feature, but other existing features, such as index file serving. What I'm suggesting is effectively mod_rewrite, albeit a very constrained version. If we allow FileServer to be constructed with dynamic "rewrite" functions in which the user (or us internally) is able to tell us which file, if any, should be served for the path identified by the request, then this can be implemented outside of Rocket entirely. This effectively makes FileServer pluggable. We would seek an API by which we can implement most if not all options currently in FileServer. Ideally, the API would also make composition and precedence possible and obvious. For instance, should foo/index.html.gz be served for foo/ if both the Index and Compressed options are enabled? Ideally we'd have an API that would allow the user to create a FileServer that can answer the question in either direction.

If you choose to proceed, I would first like to see a sketch of an API for 3 that would handle most or all of the presently available options. The "sketch" should show that each option for FileServer could be implemented, including the one in question, and how they compose. Ideally the implementation for each option would be rather straightforward and consist only of the bare-minimal logic necessary. For example, the implementation of the Index option should be something as close to as simple as:

path.dir().map(|dir| dir.join("index.html"))

The DotFiles option as simple as:

(allow_dotfiles || !path.starts_with('.')).then(path)

Then if each of these rewrites is nameable in some way, the default FileServer becomes a specific composition of these rewrites:

FileServer::new()
    .rewrite(Jail("/foo")) # rewrite /req/path to /foo/req/path.
    .rewrite(DotFiles(false)) # do not allow DotFiles
    .rewrite(IndexFile) # rewrite /foo or /foo/ to /foo/index.html

And new ones can be added:

FileServer::new()
    .rewrite(Jail("/foo"))
    .rewrite(DotFiles(true))
    .rewrite(IndexFile)
    .rewrite(PreCompressed)

Note how the previous ambiguity regarding ordering is removed and both options (check for compressed before or after index.html rewriting) are easily achievable.

I would be in favor of such a clean-up PR.

hcldan · 2024-02-07T16:16:04Z

Some comments about your proposals:

I think I can come up with something that doesn't change NamedFile, though I do think there will be some duplication of code, because the content-type for a gzipped file should be of the non-gzipped version. Easily doable, but slightly duplicated.

For 2, serving pre-gzipped files is not something new and the other implementations I've seen don't even require the non-gzipped file to exist. (side note: gzip default invocation leaves only a .gz file and no source file). I think I would prefer to avoid over-complicating this.

As you say, the file server should just be serving files. Adding a check for file modification time would be an interesting feature if the server were to handle gzipping non zipped content automatically and then re-generating it if the underlying file has been modified. That is an interesting capability, but one that is out of scope for my needs, and I would argue the needs of the majority. I would rather see additions like that as future pull requests once a viable base feature is delivered.

For 3, the api you list is interesting. In many ways, I wish this was already the case as it would have made this much easier to do. Alas, I do not have time for a rewrite of a crate I'm only just becoming familiar with. I would like to see this feature make a release soon and this sounds like it might require a major version bump because the api is changing and the options are no longer provided the same way.

In any case, I would like to focus on a minimum viable change here so it can be delivered and leveraged.
Later, either I can or someone else with more familiarity with the crate can tackle 3.

Also, 3 seems to dovetail nicely with the other issue opened for lastmodified/etag/arbitrary headers.
I am very reluctant to cross streams here, this could drastically expand the scope of what I had hoped to accomplish.

hcldan · 2024-02-08T16:22:55Z

@SergioBenitez I removed changes to NamedFile which should address (1)

See my response above for (2) and (3)

palant

I’m not the maintainer of this repository, so I’m merely nit-picking which you are free to ignore.

core/lib/src/fs/maybe_zipped_file.rs

palant · 2024-02-20T19:38:46Z

core/lib/src/fs/server.rs

-                let file = NamedFile::open(p).await;
-                file.respond_to(req).or_forward((data, Status::NotFound))
+                let gzip_accepted = req.headers().get("Accept-Encoding")
+                    .find(|value| value.contains("gzip"))


I’m not sure whether a substring match here is a good idea. See https://github.com/tower-rs/tower-http/pull/156/files#diff-83cd763343b90a262fdc4334e5289daea2e9cfb9b1cf76785ea7c42b66d3d5a0R85 for proper parsing code.

It depends on what you mean by "good idea" :)

From reading the spec for the header, this should cover what is needed for this change (gzip only). The header doesn't let you say "please not gzip", so if they mention it, then they support it (even if it is not preferred).

It is true that this is not ideal for future multiple compression types.
Especially if we wish to honor the browser's preference order.

Per RFC 7231, section 5.3.4. Accept-Encoding

If the representation has no content-coding, then it is
acceptable by default unless specifically excluded by the
Accept-Encoding field stating either "identity;q=0" or "*;q=0"
without a more specific entry for "identity".

If the representation's content-coding is one of the
content-codings listed in the Accept-Encoding field, then it is
acceptable unless it is accompanied by a qvalue of 0. (As
defined in Section 5.3.1, a qvalue of 0 means "not acceptable".)

So if it specifies Accept-Encoding: gzip;q=0 it's denying gzip

Right you are, I glossed right over that. I'll see if I can implement something that makes more sense than "contains".

Have a look here: https://github.com/rwf2/Rocket/blob/a866134212a13bfaa61a1cd1e68880461af1ab42/core/http/src/parse/accept.rs. That’s a very similar logic, and this directory should also be the appropriate place to add this parsing code.

This was considerably more work than I had hoped.

@palant I would gladly take any additional unit tests, my time I can spend on this project is running short.

@hcldan Unfortunately, it’s the same situation here. Unless I see some PRs being accepted in a meaningful form, I have no more time left to contribute to this project.

hcldan force-pushed the compression branch 4 times, most recently from 9b948ca to 83a2fc1 Compare January 30, 2024 21:36

hcldan force-pushed the compression branch from 1aae6cd to fe3e577 Compare January 31, 2024 19:06

ApprenticeofEnder reviewed Feb 2, 2024

View reviewed changes

core/lib/tests/file_server.rs Show resolved Hide resolved

ApprenticeofEnder reviewed Feb 2, 2024

View reviewed changes

core/lib/src/fs/server.rs Outdated Show resolved Hide resolved

hcldan force-pushed the compression branch 2 times, most recently from d1c02c1 to ebc5da3 Compare February 8, 2024 16:21

Allow for optionally serving pre-compressed files.

d4d9a46

hcldan force-pushed the compression branch from ebc5da3 to d4d9a46 Compare February 8, 2024 19:47

palant reviewed Feb 20, 2024

View reviewed changes

hcldan added 6 commits February 20, 2024 15:24

remove 'bare' path and simply use with_extension

b33b0c8

Use MaybeCompressedFile name instead to be more generic in the future.

9378c3a

Reduce duplication in branches

9b8ed35

Parse Accept-Encoding properly

840ad74

Send Last-Modified header for served files

84ece56

return NotModified if resource has not been modified.

d98c787

hcldan force-pushed the compression branch from 0cfaa21 to d98c787 Compare March 11, 2024 16:02

the10thWiz mentioned this pull request May 8, 2024

Implement rewrite API #2789

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow for optionally serving pre-compressed files. #2716

Allow for optionally serving pre-compressed files. #2716

hcldan commented Jan 30, 2024 •

edited

hcldan commented Jan 31, 2024 •

edited

hcldan commented Jan 31, 2024 •

edited

ApprenticeofEnder commented Feb 2, 2024

hcldan commented Feb 2, 2024

hcldan commented Feb 6, 2024

hcldan commented Feb 6, 2024

ApprenticeofEnder commented Feb 6, 2024

SergioBenitez commented Feb 6, 2024 •

edited

hcldan commented Feb 7, 2024 •

edited

hcldan commented Feb 8, 2024

palant left a comment

palant Feb 20, 2024

hcldan Feb 20, 2024 •

edited

osmano807 Feb 20, 2024

hcldan Feb 21, 2024

palant Feb 22, 2024 •

edited

hcldan Feb 22, 2024

hcldan Feb 22, 2024

palant Feb 24, 2024

Allow for optionally serving pre-compressed files. #2716

Are you sure you want to change the base?

Allow for optionally serving pre-compressed files. #2716

Conversation

hcldan commented Jan 30, 2024 • edited

hcldan commented Jan 31, 2024 • edited

hcldan commented Jan 31, 2024 • edited

ApprenticeofEnder commented Feb 2, 2024

hcldan commented Feb 2, 2024

hcldan commented Feb 6, 2024

hcldan commented Feb 6, 2024

ApprenticeofEnder commented Feb 6, 2024

SergioBenitez commented Feb 6, 2024 • edited

hcldan commented Feb 7, 2024 • edited

hcldan commented Feb 8, 2024

palant left a comment

Choose a reason for hiding this comment

palant Feb 20, 2024

Choose a reason for hiding this comment

hcldan Feb 20, 2024 • edited

Choose a reason for hiding this comment

osmano807 Feb 20, 2024

Choose a reason for hiding this comment

hcldan Feb 21, 2024

Choose a reason for hiding this comment

palant Feb 22, 2024 • edited

Choose a reason for hiding this comment

hcldan Feb 22, 2024

Choose a reason for hiding this comment

hcldan Feb 22, 2024

Choose a reason for hiding this comment

palant Feb 24, 2024

Choose a reason for hiding this comment

hcldan commented Jan 30, 2024 •

edited

hcldan commented Jan 31, 2024 •

edited

hcldan commented Jan 31, 2024 •

edited

SergioBenitez commented Feb 6, 2024 •

edited

hcldan commented Feb 7, 2024 •

edited

hcldan Feb 20, 2024 •

edited

palant Feb 22, 2024 •

edited