
DataLoader does not de-dupe keys when not configured to cache. #280

Open
tony-brookes opened this issue Aug 23, 2021 · 2 comments
tony-brookes commented Aug 23, 2021

Expected Behavior

If I pass in multiple identical keys, then the batch function should only be called with the unique set of keys, even if not caching (they are still identical requests.)

Current Behavior

The data loader does not de-duplicate the keys; they are passed along in full to the batch function.

Possible Solution

The data loader should still de-duplicate the keys. My understanding of "cache" is that it governs whether the results are kept in memory after the batch function is called, not whether or not requests are batched up. So I would expect the data loader to still map the unique set of results back to each request that asked for those keys.

Steps to Reproduce

1. Create a DataLoader with {cache: false} as the options.
2. Call load(1) twice.
3. Let the batch function get called.
4. Note that the value 1 will be present twice in the keys.

Context

We have designed our batch functions to expect each key only once, since that is what the data loader supplies in "normal" cases (normal = cache: true). However, we have recently written a thin wrapper around a data loader, which actually delegates the caching part to Redis so we can get caching across processes / machines. Note that Redis is the backing store for the cache, it is NOT the place the real batch function goes to get data.

As such we just wrap a DataLoader in a simple class that takes the options and the actual batch function and stores them, then creates a data loader inside the object with whatever the supplied data loader options were, but with cache: false. This inner data loader is given a batch function which is just a closure that looks things up in Redis and only calls the underlying real batch function with the set of keys which are not in the cache.

We can turn this on and off, since our local developers shouldn't be forced to spin up a Redis instance, so our code can configurably instantiate either a native DataLoader or a CachingDataLoader. We noticed some very odd behaviour only when using the caching data loader, and after investigation discovered that our caching batch function was receiving duplicate keys. We were merrily passing these on to our underlying real batch function on a cache miss, since we assumed they were unique. Our batch functions were all written before this wrapper existed, so they had never been tested with duplicate keys. Granted, perhaps we should have tested them, but at the time it seemed that case could never happen.
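The wrapper's batch function described above is roughly this shape (names are hypothetical, and a plain Map stands in for the Redis client; note that duplicate keys all count as misses, which is exactly how the duplicates reached the real batch function):

```javascript
// Wraps a real batch function with a look-aside cache. The returned
// function is what the inner DataLoader (cache: false) is given.
function makeCachingBatchFn(realBatchFn, cacheStore) {
  return async (keys) => {
    const results = new Array(keys.length);
    const missKeys = [];
    const missIndexes = [];

    keys.forEach((key, i) => {
      if (cacheStore.has(key)) {
        results[i] = cacheStore.get(key); // cache hit: serve from the store
      } else {
        missKeys.push(key);               // cache miss: defer to the real batch fn
        missIndexes.push(i);
      }
    });

    if (missKeys.length > 0) {
      const fetched = await realBatchFn(missKeys);
      fetched.forEach((value, j) => {
        cacheStore.set(missKeys[j], value);
        results[missIndexes[j]] = value;
      });
    }
    return results;
  };
}
```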

We have written some code in our caching data loader batch function which de-dupes the list, but semantically it felt wrong that the data loader would not de-dupe the list when it was not caching the results in memory. The logic of taking in multiple requests across a set of (sometimes repeating) keys and farming the batch function's responses back out to those requests feels orthogonal to whether or not those responses are held in memory afterwards. Hence the "bug." Perhaps this is a feature request; I'm not entirely sure, but it felt disconnected enough from expected behaviour that I've opted to file it as a bug for now. :)
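The workaround mentioned above can be sketched as a generic wrapper: the inner batch function only ever sees unique keys, and the results are fanned back out to every original position. (This sketch relies on Set identity, so non-primitive keys would need a normalising step, much like DataLoader's own cacheKeyFn option.)

```javascript
// Wraps a batch function so duplicate keys are collapsed before the
// call and the de-duplicated results are re-expanded afterwards.
function dedupeBatchFn(batchFn) {
  return async (keys) => {
    const uniqueKeys = [...new Set(keys)];
    const values = await batchFn(uniqueKeys);
    const byKey = new Map(uniqueKeys.map((k, i) => [k, values[i]]));
    return keys.map((k) => byKey.get(k));
  };
}
```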


jacobparis commented Oct 29, 2021

This is working as expected. The solution for you is to enable the cache.

The function of DataLoader's cache is to return the same response for all identical keys in a request. One could imagine a non-idempotent data source for which multiple requests to the same key result in distinct objects with differing identities or metadata.

The cache is not meant to persist beyond the lifetime of the request.

https://github.com/graphql/dataloader#caching-per-request

DataLoader caching does not replace Redis, Memcache, or any other shared application-level cache. DataLoader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data in the context of a single request to your Application. To do this, it maintains a simple in-memory memoization cache (more accurately: .load() is a memoized function).

Each DataLoader instance contains a unique memoized cache. Use caution when used in long-lived applications or those which serve many users with different access permissions and consider creating a new instance per web request.


hinogi commented Sep 19, 2022

You could maybe swap the cacheMap from new Map() to new Set(); a Set cannot have duplicate keys ;)
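For reference, DataLoader's cacheMap option expects a Map-like object implementing get, set, delete, and clear (keys map to cached promises), so any replacement cache would need at least this shape — a minimal sketch:

```javascript
// Minimal sketch of the Map-like contract DataLoader's cacheMap
// option expects: get, set, delete, and clear.
class SimpleCacheMap {
  constructor() {
    this.store = new Map();
  }
  get(key) { return this.store.get(key); }
  set(key, value) { this.store.set(key, value); return this; }
  delete(key) { return this.store.delete(key); }
  clear() { this.store.clear(); }
}
```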
