Skip to content
This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

fix(cache): add some minimal cache to prevent 429s #158

Closed
wants to merge 3 commits into from

Conversation

trieloff
Copy link
Contributor

fixes #153

recommended for adobe/helix-content-proxy#144

@codecov
Copy link

codecov bot commented Aug 20, 2020

Codecov Report

Merging #158 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##            master      #158   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           11        11           
  Lines          292       292           
=========================================
  Hits           292       292           
Impacted Files Coverage Δ
src/matchers/excel.js 100.00% <ø> (ø)
src/matchers/google.js 100.00% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5531ed4...ceea154. Read the comment docs.

Copy link
Contributor

@tripodsan tripodsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see reasoning in adobe/helix-content-proxy#144 (review)

Reverse logic: if request cache-control header is set to max-age=xx overwrite the response cache-control header.

sidenote: 120s is way too long.... in order to prevent rate-limit problems, 10s is enough.

@trieloff
Copy link
Contributor Author

I can add replaying the request cache control (no-cache) header, as suggested in adobe/helix-content-proxy#144 (review) but replaying max-age is going against the spec and means we would also have to Vary on Cache-Control. Having a default cache and turning it off on demand is the cleaner solution.

Sidenote: the cache is per URL, 120 seconds is Excel's back-off time and I'd consider 60 seconds the minimum for a short-lived cache.

@tripodsan
Copy link
Contributor

I can add replaying the request cache control (no-cache) header, as suggested in adobe/helix-content-proxy#144 (review) but replaying max-age is going against the spec and means we would also have to Vary on Cache-Control. Having a default cache and turning it off on demand is the cleaner solution.

note, that no-cache and no-store are different. no-cache and means: cache, but revalidate.
no-store means: never store. for our use-case, where we can't actually re-validate without executing the action, no-store is better IMO.

but still - I think default should be no-store, private. and we need to find a mechanism to enable caching for certain requests.

Sidenote: the cache is per URL, 120 seconds is Excel's back-off time and I'd consider 60 seconds the minimum for a short-lived cache.

maybe the back-off time, but when we cache every 10 seconds, and have 1000 hits/seconds, we still only get 1 every 10 seconds (well, if the cache serves stale-while-invalidate). so this is enough to throttle. 1 minute is a long time to way for your page to get updated......

@trieloff
Copy link
Contributor Author

We do have an easy and standardized way of disabling caching on request: no-cache (request).

We do not have a way of enabling it on demand. Sure, we could try to build a kludge, but it would be ad-hoc and brittle.

The vast majority of requests we serve are old content served to visitors. If we have a way of satisfying these requests in a fast and scalable way (without hitting rate limits) then this should be the default. The exceptional case that an author wants an immediate preview is something that we can handle through a special case (no-cache request), so there isn't even a tradeoff that needs to be made.

Remember that we have four consumers of helix-data-embed:

  1. helix-content-proxy
  2. helix-pipeline
  3. helix-redirect
  4. ad hoc requests

And even for helix-content-proxy (where we could turn off the caching if we wanted to), the immediate updates needed for the blog index are more an edge case than the regular serving of largely unchanged spreadsheets.

For this use case, I'd recommend that we improve helix-query-index, so that it can use Excel (via helix-data-embed) and use the configurable cache settings there instead of going with the lowest common (and most dangerous) denominator here.

@tripodsan
Copy link
Contributor

more an edge case than the regular serving of largely unchanged spreadsheets.

I think @davidnuescheler would disagree....

anyways.... I still think that no cache should be the default. or at least, set the max-age to 10 seconds.

@github-actions
Copy link

This PR will trigger a patch release when merged.

@trieloff trieloff closed this Oct 2, 2020
@tripodsan tripodsan deleted the some-cache branch March 17, 2021 08:58
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache (just a little bit) when serving Excel
2 participants