Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Cache middleware: Store e2e headers. #1807

Merged

Conversation

thylong
Copy link
Contributor

@thylong thylong commented Mar 5, 2022

Please provide enough information so that others can review your pull request:

As defined in RFC2616 - section-13.5.1, shared caches should store end-to-end headers from backend response and MUST be transmitted in any response formed from a cache entry.

This commit ensures a stronger consistency between responses served from the handlers & from the cache middleware. It also tries to not generate many more memory allocations and ops complexity.

Explain the details for making this change. What existing problem does the pull request solve?

The current cache middleware is minimalist and really fast but doesn't respect several aspects of HTTP/1.1 caching RFCs (RFC2616 & RFC7234).

Non-exhaustive list of unsupported features:

  • The Vary header
  • The Warning header
  • Cache entry containing additional headers

This causes response consistency issues such as the one described in the following scenario :

You have a simple endpoint with a cache middleware instantiated :

package main

import (
        "github.com/gofiber/fiber/v2"
        "github.com/gofiber/fiber/v2/middleware/cache"
)

func main() {
        app := fiber.New()
	app.Use(cache.New())

	app.Get("/", func(c *fiber.Ctx) error {
		c.Response().Header.Add("X-Foobar", "foobar")
		return c.SendString("hi")
	})

        # [...]
}

If you curl -D - http://localhost:8080/ this endpoint, your response will look like the following:

HTTP/1.1 200 OK
Date: Sat, 05 Mar 2022 15:12:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 6548
X-Cache: miss
X-Foobar: foobar

Now if you execute the curl command again, the second response will look like the following:

HTTP/1.1 200 OK
Date: Sat, 05 Mar 2022 15:12:27 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 6548
X-Cache: hit

⚠️ ☝️ The X-Foobar response header gets stripped as it's not store in the cache entry.

Note

@codemicro @ReneWerner87 @Fenny , do you think this should be optional as it changes the default behaviour of the cache middleware?
I'm hesitating to let this PR setting the new default cache middleware behaviour to make it RFC compliant but it could also catch off guard people already using the middleware. WDYT?

@codemicro
Copy link
Contributor

I seem to recall that when we were first merging this into Fiber core, we thought about implementing such a feature and decided against it for whatever reason (I don't remember what exactly).

While I'm not on the maintainers team anymore, I think this would be a good addition to the cache middleware.

do you think this should be optional as it changes the default behaviour of the cache middleware?

I think it's a good idea to put this behind a flag of some sorts, yes, since this is changing observable behaviour of the cache middleware. I'm struggling to find a situation where it'd be a problem if more headers were preserved from the original request, but I'm think there probably are some.

Were this not the case, and it wasn't placed behind a flag, I'd suggest removing explicit caching of the Content-Type and Content-Encoding headers in favour of having it handled by the logic in this PR.

@thylong
Copy link
Contributor Author

thylong commented Mar 5, 2022

Thanks for your prompt answer on this @codemicro 🙏

I think it's a good idea to put this behind a flag of some sorts, yes, since this is changing observable behaviour of the cache middleware. I'm struggling to find a situation where it'd be a problem if more headers were preserved from the original request, but I'm think there probably are some.

IMHO, the risks reside in client identity specific headers (such as Set-Cookie) and backend application maintainers' potential mistakes. This could lead to security issues where the cache serves the same private cookie content to multiple clients.

Were this not the case, and it wasn't placed behind a flag, I'd suggest removing explicit caching of the Content-Type and Content-Encoding headers in favour of having it handled by the logic in this PR.

I was thinking about it as well and preferred to assess if this PR generates interest from the maintainers and you before going further.

I'll create a E2EHeaders flag that defaults to false to make this new feature safe for everyone 👍

@ReneWerner87
Copy link
Member

Thanks for the work,
Maybe you can provide a method for filtering the headers, which by default removes the cookie and security related header
Otherwise I think we could make this the default behavior

Maybe we should do some more research on what were the reasons for not saving and delivering all headers

I will have a look at the pull-requests later or tomorrow

@thylong
Copy link
Contributor Author

thylong commented Mar 5, 2022

Thanks for answer @ReneWerner87 !

Maybe you can provide a method for filtering the headers, which by default removes the cookie and security related header
Otherwise I think we could make this the default behavior

I introduced the E2EHeaders flag to turn off straight the feature but if you prefer to make the default behavio, I can provide a simple mechanism to exclude certain headers in this PR instead 👍

Maybe we should do some more research on what were the reasons for not saving and delivering all headers
I will have a look at the pull-requests later or tomorrow

Sounds good to me, let me know what you decide and I'll re-adjust if necessary 😊

P.S: I would be very interested to participate in cache middleware improvement efforts as I'm currently building several projects on-top of Fiber (closed sources for the moment, but will open them at some point)

@ReneWerner87
Copy link
Member

Gladly any pull request and improvements both functional and in the area of speed or tests are welcome

Of course this also applies to all other parts of the software

@codemicro
Copy link
Contributor

codemicro commented Mar 5, 2022

the risks reside in client identity specific headers (such as Set-Cookie)

I agree, though if an endpoint is setting cookies, I'd argue that most of the time it's a bad idea to be caching any request to that endpoint.

Maybe you can provide a method for filtering the headers, which by default removes the cookie and security related header

Typically, I'd structure my application so that I'm specifically including endpoints to be cached as opposed to specifically excluding endpoints. I wonder if this is an approach should be encouraged, as opposed to implementing a feature to remove headers automatically like this?

Otherwise I think we could make this the default behavior

I think this still has the potential to break things in bizzare edge-cases or when this is used in an unusual manner.


Thanks for your prompt answer on this @codemicro 🙏

You're welcome!

@thylong
Copy link
Contributor Author

thylong commented Mar 5, 2022

In case you believe the flag doesn't offer enough flexibility for the end users, I might have 2 different suggestions that satisfies your two perspectives:

First (simple but whitelisting only)

The E2EHeaders can be a func(c *fiber.Ctx) map[string][]byte instead of just a bool (similar to Next() func).
This would handle all the headers we want to store in the cache.

Use cases

  • By default we would have a function returning an empty map and consequently it won't activate the feature
  • If a map contains headers, they will be stored in the cache
  • If the map contains a "*": []byte{} entry, we could allow all headers

Benefits

  • The default behaviour is safe for everyone as it doesn't change at all
  • We introduce only 1 flag instead of a boolean + a custom config to pass
  • It's possible to take a conservative approach through whitelist
  • ⚠️ It's not possible to take a liberal approach through blacklisting
  • BUT possible to be very liberal through "*"

Second (more complex but blacklisting and/or whitelisting)

The E2EHeaders can be a func(c *fiber.Ctx) [2]map[string][]byte instead of just a bool.
This would handle all the headers we want to store in the cache and:

  • The first slice map would be a whitelist (to store)
  • The second slice map would be a blacklist of headers (to not store)

Use cases

  • By default we would have a function returning an empty map and consequently it won't activate the feature
  • If the whitelist map contains headers, they will be stored in the cache
  • If the blacklist map contains headers, they won't be stored in the cache
  • ⚠️ If the map contains a "*": []byte{} entry, we could allow all headers would be confusing in this case
  • Blacklisting > whitelist in case of conflicts

Benefits

  • The default behaviour is safe for everyone as it doesn't change at all
  • We introduce only 1 flag instead of a boolean + a custom config to pass
  • It's possible to take a conservative approach through whitelist
  • It's possible to take a liberal approach through blacklisting

It feels like we would provide maximum flexibility to the end user without impacting current users.

@codemicro
Copy link
Contributor

codemicro commented Mar 6, 2022

Personally, I'm not sure there needs to be flexibility in relation to what headers are included/excluded. Both your proposals here seem a little overcomplicated for this middleware, whereas the current state of this PR with a simple boolean flag looks alright to me.

@ReneWerner87
Copy link
Member

ReneWerner87 commented Mar 6, 2022

I will have a look at the pull-requests later or tomorrow

Look at it tomorrow, today i have unfortunately no time

@thylong
Copy link
Contributor Author

thylong commented Mar 6, 2022

Personally, I'm not sure there needs to be flexibility in relation to what headers are included/excluded. Both your proposals here seem a little overcomplicated for this middleware, whereas the current state of this PR with a simple boolean flag looks alright to me.

I understand your standpoint and starting with this simple flag instead of increasing the complexity suddenly could be indeed a good option to start with and satisfy the current need 👍

Copy link
Member

@ReneWerner87 ReneWerner87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -133,6 +149,20 @@ func New(config ...Config) fiber.Handler {
e.status = c.Response().StatusCode()
e.ctype = utils.CopyBytes(c.Response().Header.ContentType())
e.cencoding = utils.CopyBytes(c.Response().Header.Peek(fiber.HeaderContentEncoding))
e.e2eHeaders = make(map[string][]byte)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why convert to string? that would take time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's impossible to determine both header key and value ahead of time, it feels like a map is needed to compare ignoreHeaders with response headers with a O(1) complexity.

If you have another approach to recommend, I take it 🙏

middleware/cache/cache_test.go Show resolved Hide resolved
middleware/cache/config.go Outdated Show resolved Hide resolved
As defined in RFC2616 - section-13.5.1, shared caches MUST
store end-to-end headers from backend response and MUST be
transmitted in any response formed from a cache entry.

This commit ensures a stronger consistency between responses
served from the handlers & from the cache middleware.
Set flag to prevent e2e headers caching to
be the default behavior of the cache middleware.
This would otherwise change quite a lot the
experience for cache middleware current users.
@thylong thylong force-pushed the feat/cache-middleware-e2e-headers branch from 751553a to 583802e Compare March 7, 2022 13:56
E2E is an acronym commonly associated with test.
While in the present case it refers to end-to-end
HTTP headers (by opposition to hop-by-hop), this
still remains confusing. This commits renames it
to a more generic name.
@ReneWerner87
Copy link
Member

@thylong i added 2 more comments for performance improvements stuff

This will prevent an extra memory allocation for users
not interested in this feature.
@thylong
Copy link
Contributor Author

thylong commented Mar 7, 2022

Thanks for your review @ReneWerner87 !

Feedbacks:

  • ✅ I renamed the option from E2EHeaders to StoreResponseHeaders and adjusted the rest of the codebase accordingly
  • ✅ A new benchmark test has been added as requested
  • ✅ You'll find the PR for the documentation here
  • ✅ Fixed the 2 allocations outside of if statements
  • ⚠️ Not quite sure how to solve this the best way

Requesting your review again 🙂

@ReneWerner87
Copy link
Member

I will do the last point.

- use set instead of add for the headers
- copy value from the headers -> prevent problems with mutable values
@ReneWerner87
Copy link
Member

the last point is not so easy

have added something anyway

@thylong
Copy link
Contributor Author

thylong commented Mar 8, 2022

the last point is not so easy

Indeed... Thanks for the the help though ! I'll keep that in mind for later contributions, in case they generate opportunities to make this better.

@ReneWerner87 ReneWerner87 merged commit 1cddc56 into gofiber:master Mar 8, 2022
@thylong thylong deleted the feat/cache-middleware-e2e-headers branch March 8, 2022 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants