
Cache should include the promise of matching active requests #1078

Closed
MrSwitch opened this issue Feb 19, 2020 · 18 comments

Comments

@MrSwitch

What problem are you trying to solve?

Performance improvement: multiple simultaneous requests with caching should wait for the first to resolve and then reuse its cached result.

Describe the feature

A contrived example, but let's assume got is called twice simultaneously with the same link, and a third time after both have resolved.

```js
const A = got(link, {..., cache: map});
const B = got(link, {..., cache: map});

const [a, b] = await Promise.all([A, B]);
console.log('isFromCacheA', a.isFromCache); // false
console.log('isFromCacheB', b.isFromCache); // false

// Call again and we get `true`
const c = await got(link, {..., cache: map});
console.log('isFromCacheC', c.isFromCache); // true
```

As the output shows, neither A nor B is served from the cache, whilst C is. What I'd like to see ultimately is a single request whose promise is stored in the cache map and then awaited to serve the cached response.

To work around this, one currently has to maintain a separate map of in-flight promises to await before making subsequent requests...

```js
const map = new Map();
const promiseMap = new Map();

async function gotCache(link) {
	// If a request for this link is already in flight, wait for it to
	// settle so its response has been written to the cache map...
	if (promiseMap.has(link)) {
		await promiseMap.get(link);
	}
	// ...then this request can be served from the cache.
	const prom = got(link, {cache: map});
	promiseMap.set(link, prom);
	return prom;
}
```
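Taking the workaround one step further, concurrent callers could share the in-flight promise directly rather than awaiting it and issuing a second request. A minimal sketch, assuming a hypothetical `fetchResource` function in place of got (not got's actual API):

```javascript
// Dedupe concurrent requests for the same link by sharing the
// in-flight promise. fetchResource is a placeholder for the real
// HTTP call and must return a promise.
const inFlight = new Map();

function dedupe(fetchResource) {
	return link => {
		// A request for this link is already running: share its promise.
		if (inFlight.has(link)) {
			return inFlight.get(link);
		}
		const prom = fetchResource(link).finally(() => {
			// Remove the entry once settled, so later calls go back
			// through the normal (cacheable) request path.
			inFlight.delete(link);
		});
		inFlight.set(link, prom);
		return prom;
	};
}
```

With this, the A/B example above would issue a single network request and both callers would resolve with the same response.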
@sithmel

sithmel commented Feb 19, 2020

I have some experience with this, as I have worked on a library that implements a similar feature. I built the dedupe function in async-deco to deal with it: https://github.com/sithmel/async-deco

For an in-memory cache it is straightforward. For a distributed cache (Redis) it is definitely more tricky: you need distributed locking (in async-deco I used Redlock), and this could add an overhead to a request that I haven't had the chance to measure yet.

@szmarczak
Collaborator

I'll close this issue since it will be possible to do this as part of the "should accept different caching logic" point. I have to say this is a good idea; it will save us some time (10ms+). I am even doing this in my http2-wrapper package.

@sithmel

sithmel commented Feb 19, 2020

Just adding some extra thoughts.
In the case of multiple instances with distributed locking, you will completely lose any advantage to the latency of acquiring/releasing the lock. So I think it is a non-starter.

The reason I mentioned dedupe (async-deco) is that in reliable-get (the HTTP GET library I mentioned above) we knew in advance whether we would cache a resource. In that case you can dedupe the entire request, which is a very effective optimisation.

In RFC-compliant caching such as Got's, you only know whether you should cache after getting a response, so you can't apply that optimisation.

If you use an in-memory cache there is no latency, so the optimisation is not effective.

If you use an external cache (Redis, for example) you probably have multiple Node processes, so the optimisation is only effective in the lucky event that requests for the same resource land on the same process.

So, in my opinion, it is not really worth it.

@szmarczak
Collaborator

Got will follow the RFC, but people will be able to change the behavior if they'd like to.

latency of acquiring/releasing the lock

This is not C++; Node.js is single-threaded. Node.js itself is blazing fast; it's the I/O that is slow.

@MrSwitch
Author

Thanks @sithmel and @szmarczak

Reading up on the topic and looking at your examples, I have learnt a new term: memoize. It seems simple enough to decorate the got function for my needs, where I don't have too many different requests. Of course, an improved approach would likely use a Keyv storage adapter, but I didn't really look into those too much, since the memoize decorator worked out best for me as I could use it higher up.

Thanks again

@rfgamaral

Reading up on the topic and looking at your examples, I have learnt a new term: memoize. It seems simple enough to decorate the got function for my needs, where I don't have too many different requests. Of course, an improved approach would likely use a Keyv storage adapter, but I didn't really look into those too much, since the memoize decorator worked out best for me as I could use it higher up.

If you could share a working example of that memoization, that would be great :)

@MrSwitch
Author

If you could share a working example of that memoization, that would be great :)

Here you go @rfgamaral https://github.com/5app/memoize

@rfgamaral

Thank you :)

@rfgamaral

rfgamaral commented Feb 29, 2020

Here you go @rfgamaral https://github.com/5app/memoize

@MrSwitch Just noticed that you could have used p-memoize or mem for this exact purpose. Any reason you chose not to? Just trying to understand my options here :)

@MrSwitch
Author

@rfgamaral I didn't know about those beforehand. They all do very much the same thing. In my app I needed to set the cache expiry differently for rejected and resolved promises, so I defined an option useCache(cachedItem){} to customise that per implementation. Sorry, it's not really well documented.

Those others you reference look far more established than mine.

@rfgamaral

rfgamaral commented Mar 1, 2020

@MrSwitch Thank you for that, and sorry to bombard you with so many questions, but "just one more" if you don't mind...

Are you using both memoization and Got's cache property? What exactly is your request/cache flow for both?

Or maybe you're just using memoization over Got's cache? It seems that Got will move in that direction with #875, right? If that's the case, what are the advantages of doing so?

@MrSwitch
Author

MrSwitch commented Mar 2, 2020

@rfgamaral That's the reason why I started this issue. Yes, I'm using both, but it's not ideal because memoizing effectively duplicates the cached response.

I would have liked to understand the cache adapters in Got to see if there was a way to use them. However, based on my limited knowledge of the subject, the cache adapter would be thwarted by:

  • Caching only occurs after the response, not for the pending request.
  • I'd have to mutate the cached response object to override the expiry headers, which sounds hacky.

The benefit of memoizing is that it's a good all-round tool which can be applied higher up the call stack.

I hope that helps
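A generic promise-friendly memoizing decorator of the kind described here might look like the following sketch. The `memoize` function and its `maxAge` option are illustrative only, not the @5app/memoize (or p-memoize/mem) API:

```javascript
// Memoize at a higher layer: cache the result of a whole function call
// (which may fan out into several HTTP requests) rather than deduping
// each request individually.
function memoize(fn, {maxAge = 1000} = {}) {
	const cache = new Map();
	return (...args) => {
		// Key on the serialized arguments; fine for simple JSON-able args.
		const key = JSON.stringify(args);
		const hit = cache.get(key);
		if (hit && Date.now() - hit.time < maxAge) {
			return hit.value; // may be a pending promise: callers share it
		}
		const value = fn(...args);
		cache.set(key, {value, time: Date.now()});
		return value;
	};
}
```

Because the cached value can itself be a pending promise, wrapping a got-based service function with this both dedupes concurrent calls and caches the settled result until maxAge expires.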

@MrSwitch
Author

MrSwitch commented Mar 2, 2020

#875 does sound like it would make this possible.

@rfgamaral

Personally, I'm trying to "replicate" RESTDataSource using Got with two features in particular:

  • Request deduplication to prevent double-invoking HTTP calls in rapid succession (memoization).
  • Resource caching of recent HTTP responses to avoid hitting internal APIs just to get the same data.

If I understand everything correctly, I believe I could rely on Got's built-in cache mechanism for resource caching and use mem for request deduplication (with a very short maxAge like 250ms or maybe a bit less).

If I implement both mechanisms correctly I can potentially get something like this:

  • Make a request with a memoized got;
  • If multiple requests were done in quick succession, return a memoized promise (if not expired);
  • If no memoized promise is available, check Got's internal cache for a response;
  • If a cached response is available, return that;
  • If no cached response is available, make a request to the origin endpoint and return the response (while adding the promise to the memoized cache and the response to Got's internal cache).

What do you think?
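For what it's worth, that two-layer flow can be wired up roughly as below. Everything here is an assumption for illustration: `cachedGet`, `callOrigin`, and the 250ms dedupe window stand in for the memoized got and its options, not Got's or mem's actual API:

```javascript
// Two layers: a short-lived promise memo for deduplication in front
// of a longer-lived response cache.
const promiseMemo = new Map();   // link -> {prom, time}
const responseCache = new Map(); // link -> response body
const DEDUPE_MAX_AGE = 250;      // ms, as suggested above

function cachedGet(link, callOrigin) {
	const memo = promiseMemo.get(link);
	if (memo && Date.now() - memo.time < DEDUPE_MAX_AGE) {
		return memo.prom; // deduplicated: share the recent in-flight promise
	}
	const prom = (async () => {
		if (responseCache.has(link)) {
			return responseCache.get(link); // served from the response cache
		}
		const response = await callOrigin(link); // hit the origin endpoint
		responseCache.set(link, response);
		return response;
	})();
	promiseMemo.set(link, {prom, time: Date.now()});
	return prom;
}
```

Rapid successive calls share one promise; after the memo expires, the response cache still answers without touching the origin.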

@MrSwitch
Author

MrSwitch commented Mar 2, 2020

That's pretty much exactly what I'm doing for the happy path.

I added some extra stuff to handle exceptions https://github.com/5app/memoize/blob/master/README.md#options-memoizehandler-options

@rfgamaral

What exactly do you mean with "handle exceptions"? What kind of exceptions?

@MrSwitch
Author

MrSwitch commented Mar 2, 2020

What exactly do you mean with "handle exceptions"? What kind of exceptions?

HTTP status codes: we wanted to cache 4xx and 5xx responses too, but for a few seconds rather than minutes.
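The idea of a shorter TTL for failures can be sketched as a memoizer that shortens an entry's lifetime when its promise rejects. The function name and TTL values are assumptions for illustration, not the @5app/memoize useCache API:

```javascript
// Memoize a promise-returning function with a shorter TTL for
// rejections (e.g. 4xx/5xx turned into errors) than for successes.
function memoizeWithErrorTtl(fn, {okTtl = 60000, errorTtl = 5000} = {}) {
	const cache = new Map(); // key -> {prom, expires}
	return key => {
		const hit = cache.get(key);
		if (hit && Date.now() < hit.expires) {
			return hit.prom;
		}
		const prom = fn(key);
		// Assume the success TTL up front; shorten it if the promise rejects.
		const entry = {prom, expires: Date.now() + okTtl};
		prom.catch(() => {
			entry.expires = Date.now() + errorTtl;
		});
		cache.set(key, entry);
		return prom;
	};
}
```

Failed responses are then retried after seconds, while successful ones stay cached for minutes.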

@rfgamaral

Alright, thanks for the clarification :)
