
DataLoader does not de-dupe keys when not configured to cache. #280

Open
tony-brookes opened this issue Aug 23, 2021 · 2 comments
tony-brookes commented Aug 23, 2021

Expected Behavior

If I pass in multiple identical keys, then the batch function should only be called with the unique set of keys, even if not caching (they are still identical requests.)

Current Behavior

The data loader does not de-duplicate the keys; they are passed along in full to the batch function.

Possible Solution

The data loader should still de-duplicate the keys. My understanding of "cache" is that it governs whether the results are kept in memory after the batch function is called, not whether or not requests are batched up. So I would expect the data loader to still map the unique set of results back to each request that asked for those keys.

Steps to Reproduce

1. Create a DataLoader with {cache: false} as the options.
2. Call load(1) twice.
3. Let the batch function get called.
4. Note that the value 1 will be present twice in the keys.

Context

We have designed our batch functions to expect each key only once, since that is what the data loader supplies in "normal" cases (normal = cache: true). However, we have recently written a thin wrapper around a data loader, which actually delegates the caching part to Redis so we can get caching across processes / machines. Note that Redis is the backing store for the cache, it is NOT the place the real batch function goes to get data.

As such we just wrap a DataLoader in a simple class that takes the options and the actual batch function and stores them, then creates a data loader inside the object with whatever the supplied data loader options were, but with cache: false. This inner data loader is given a batch function which is just a closure that looks things up in Redis and only calls the underlying real batch function with the set of keys which are not in the cache.

We can turn this on and off, since our local developers shouldn't be forced to spin up a Redis instance, so our code can configurably instantiate either a native DataLoader or a CachingDataLoader. We noticed some very odd behaviour only when using the caching data loader, and after investigation discovered that our caching batch function was receiving duplicate keys. We were merrily passing these on to our underlying real batch function on a cache miss, since we assumed they were unique. Our batch functions were all written before this wrapper existed, so they had never been tested with duplicate keys. Granted, perhaps we should have tested them, but at the time it seemed that case could never happen.
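The wrapper's batch function described above is roughly this shape (names are hypothetical, and a plain Map stands in for the Redis client; note that duplicate keys all count as misses, which is exactly how the duplicates reached the real batch function):

```javascript
// Wraps a real batch function with a look-aside cache. The returned
// function is what the inner DataLoader (cache: false) is given.
function makeCachingBatchFn(realBatchFn, cacheStore) {
  return async (keys) => {
    const results = new Array(keys.length);
    const missKeys = [];
    const missIndexes = [];

    keys.forEach((key, i) => {
      if (cacheStore.has(key)) {
        results[i] = cacheStore.get(key); // cache hit: serve from the store
      } else {
        missKeys.push(key);               // cache miss: defer to the real batch fn
        missIndexes.push(i);
      }
    });

    if (missKeys.length > 0) {
      const fetched = await realBatchFn(missKeys);
      fetched.forEach((value, j) => {
        cacheStore.set(missKeys[j], value);
        results[missIndexes[j]] = value;
      });
    }
    return results;
  };
}
```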

We have written some code in our caching data loader batch function which de-dupes the list, but semantically it felt wrong that the data loader would not de-dupe the list when it was not caching the results in memory. The logic of taking in multiple requests across a set of (sometimes repeating) keys and farming the batch function's responses back out to those requests feels orthogonal to whether or not those responses are held in memory afterwards. Hence the "bug." Perhaps this is a feature request; I'm not entirely sure, but it felt disconnected enough from expected behaviour that I've opted to file it as a bug for now. :)
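The workaround mentioned above can be sketched as a generic wrapper: the inner batch function only ever sees unique keys, and the results are fanned back out to every original position. (This sketch relies on Set identity, so non-primitive keys would need a normalising step, much like DataLoader's own cacheKeyFn option.)

```javascript
// Wraps a batch function so duplicate keys are collapsed before the
// call and the de-duplicated results are re-expanded afterwards.
function dedupeBatchFn(batchFn) {
  return async (keys) => {
    const uniqueKeys = [...new Set(keys)];
    const values = await batchFn(uniqueKeys);
    const byKey = new Map(uniqueKeys.map((k, i) => [k, values[i]]));
    return keys.map((k) => byKey.get(k));
  };
}
```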


jacobparis commented Oct 29, 2021

This is working as expected. The solution for you is to enable the cache.

The function of DataLoader's cache is to return the same response for all identical keys in a request. One could imagine a non-idempotent data source for which multiple requests to the same key result in distinct objects with differing identities or metadata.

The cache is not meant to persist beyond the lifetime of the request.

https://github.com/graphql/dataloader#caching-per-request

DataLoader caching does not replace Redis, Memcache, or any other shared application-level cache. DataLoader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data in the context of a single request to your Application. To do this, it maintains a simple in-memory memoization cache (more accurately: .load() is a memoized function).

Each DataLoader instance contains a unique memoized cache. Use caution when used in long-lived applications or those which serve many users with different access permissions and consider creating a new instance per web request.


hinogi commented Sep 19, 2022

You could maybe swap the cacheMap from new Map() to new Set(); a Set cannot have duplicate keys ;)
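For reference, DataLoader's cacheMap option expects a Map-like object implementing get, set, delete, and clear (keys map to cached promises), so any replacement cache would need at least this shape — a minimal sketch:

```javascript
// Minimal sketch of the Map-like contract DataLoader's cacheMap
// option expects: get, set, delete, and clear.
class SimpleCacheMap {
  constructor() {
    this.store = new Map();
  }
  get(key) { return this.store.get(key); }
  set(key, value) { this.store.set(key, value); return this; }
  delete(key) { return this.store.delete(key); }
  clear() { this.store.clear(); }
}
```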
