
DataSources: InMemory LRU Cache has Infinite maxSize #2252

Closed
nglgzz opened this issue Jan 31, 2019 · 9 comments

nglgzz commented Jan 31, 2019

From the DataSources docs:

By default, resource caching will use an in-memory LRU cache.

This is misleading, considering that the default cache has a maxSize of Infinity. You can have any eviction policy you like, but if the cache size is infinite, you'll never evict any keys.

My suggestion is to either:

a) Update the docs to say that by default the cache has no memory limit.
b) Set a reasonable maxSize and update the function that calculates the size of items, since it currently doesn't really make sense (length of the item if it's a string or array, length of the stringified object, or 1 in other cases).

I think the second option is the cleaner solution, and if you're okay with it, I'll make a PR for it.

abernix (Member) commented Jan 31, 2019

You're right that it's currently not evicting anything.

In terms of the second half of your b) option, some of that's already been done via changes in the upcoming 2.4.0 release (#2215; specifically https://github.com/apollographql/apollo-server/pull/2215/files#diff-3c78ac01fa2bb0ac00758ed1436212a6, and even more specifically via be71620). That will land pretty soon (I think Monday).

I'm curious what your thoughts are on a reasonable maxSize for a RESTDataSource. It seems very likely that it should be configured per data source, particularly since the shape of their responses can vary so much.

nglgzz (Author) commented Jan 31, 2019

I am not sure I understand the default size calculation.

function defaultLengthCalculation(item: any) {
  if (Array.isArray(item) || typeof item === 'string') {
    return item.length;
  }

  // Go with the lru-cache default "naive" size, in lieu of anything better:
  //   https://github.com/isaacs/node-lru-cache/blob/a71be6cd/index.js#L17
  return 1;
}

According to this function, 'foo' and ['foo', 'bar', 'baz'] occupy the same size in cache (both report a length of 3). Is this expected behavior? Wouldn't the following be more accurate?

function defaultLengthCalculation(item: any) {
  // Guard against null, since typeof null is also 'object'.
  if (item !== null && typeof item === 'object') {
    return Buffer.byteLength(JSON.stringify(item));
  }

  if (typeof item === 'string') {
    return Buffer.byteLength(item);
  }

  return 1;
}

For maxSize I was thinking of something not too big (some machines don't have much RAM) but still able to hold a decent amount of data. For instance, the default memory limit for memcached is 64MB; I would use something around that value.

I don't think a max size should be set for each DataSource. Isn't it just one cache shared between different DataSources?
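To make the difference concrete, here is a standalone sketch of the byte-based calculation proposed above (the function name is illustrative, not Apollo's API; it uses Node's Buffer):

```typescript
// Standalone sketch of a byte-based size calculation.
// The function name is illustrative, not part of Apollo's API.
function byteSizeCalculator(item: unknown): number {
  if (typeof item === 'string') {
    return Buffer.byteLength(item, 'utf8');
  }
  // Guard against null, since typeof null is also 'object'.
  if (item !== null && typeof item === 'object') {
    return Buffer.byteLength(JSON.stringify(item), 'utf8');
  }
  // Fall back to the naive unit size for numbers, booleans, etc.
  return 1;
}

// The default length calculation scores both of these as 3;
// a byte-based one tells them apart:
console.log(byteSizeCalculator('foo'));                 // 3
console.log(byteSizeCalculator(['foo', 'bar', 'baz'])); // 19
```

Paired with a maxSize expressed in bytes (for example 64 * 1024 * 1024, roughly the memcached default mentioned above), eviction would then track actual payload size rather than item count.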

abernix (Member) commented Jan 31, 2019

According to this function foo and ['foo', 'bar', 'baz'] occupy the same size in cache. Is this expected behavior?

I think this is the expected behavior for a generic LRU that is meant to be used in many places, not just RESTDataSource. The basic implementation should not take any liberties, including trying to approximate object size; it should keep using a naive default and rely on individual implementors to provide more exact rules. This is exactly how the lru-cache it's based on behaves. And lastly, the .length calculation makes the naive assumption that you're storing like-typed objects (not, say, a mixture of Arrays and Strings) in the store, which we are.

To be clear: The InMemoryLRUCache did not previously allow this configuration but that's precisely what the changes in be71620 bring. We've utilized the newly available sizeCalculator in this location.

I don't think a max size should be set for each DataSource. Isn't it just one cache shared between different DataSources?

There are multiple caches, one for each data source; see:

initialize(config: DataSourceConfig<TContext>): void {
  this.context = config.context;
  this.httpCache = new HTTPCache(config.cache);
}

I think that's fair, since each underlying endpoint may have varying dynamics that make it more or less important for its responses to be in the cache. For example, a local REST endpoint doesn't necessarily benefit as much from the LRU cache as a remote REST endpoint with poor performance dynamics.

This is all to say, I think we should consider passing maxSize with a sensible default to InMemoryLRUCache via its construction in HTTPCache. (And actually, the default length behavior happens to be fine in this case, since we're storing strings with String.prototype.length properties.) I suspect this should be opened up as a DataSourceConfig option, but the naming could be confusing with the existing cache option on there.

I'm curious what @martijnwalraven thinks.

@ThiruvenkatamR

Hi,

I would like to add one more thing related to this. I am using apollo-datasource-rest to fetch data from a REST endpoint. I faced the same issue of requests being cached indefinitely, and I'd also point out that even if the headers change, we still get the same response.

For example, suppose we fetch a list of data from an endpoint and it gets cached, and that request had the authorization header set to ABCD. Now, when we request the same endpoint with authorization header WXYZ, we still get the same response. The cache doesn't detect the change at all; it only checks whether the endpoint is the same. We use authorization headers for access-related details, but we're not able to enforce them here: whatever access the first caller had is cached and served to everyone else.

Please let me know how to handle this.

Thanks

@nharlow89

@ThiruvenkatamR It looks like the caching mechanism for RESTDataSource is pretty basic and does not take different authorization headers into account.

// By default, we use the full request URL as the cache key.
// You can override this to remove query parameters or compute a cache key
// in any way that makes sense. For example, you could use this to take
// Vary header fields into account. Although we do validate header fields
// and don't serve responses from cache when they don't match, new
// responses overwrite old ones with different vary header fields.
protected cacheKeyFor(request: Request): string {
  return request.url;
}

I am not convinced that the following statement is accurate

we do validate header fields and don't serve responses from cache when they don't match

You may be able to solve your problem by implementing your own cacheKeyFor method and appending your authorization header somewhere in there, for example:

protected cacheKeyFor(request: Request): string {
  if (request.headers.get('authorization')) {
    return `${request.headers.get('authorization')}|${request.url}`;
  }
  return request.url;
}

@ThiruvenkatamR

Thanks @nharlow89, will check it out. Much appreciated.

borekb (Contributor) commented Feb 25, 2020

Just to make sure: to create an LRU cache for 100 items, I have to do this, right?

new ApolloServer({
  cache: new InMemoryLRUCache({
    maxSize: 100,
    sizeCalculator: () => 1,
  }),
});

It's a bit unintuitive that I have to provide the size calculator, but without it the default length calculation counts string length, so a single 100-character string could fill the whole cache, right?
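A toy model of the semantics (this is not Apollo's implementation; it only illustrates how maxSize bounds the sum of sizeCalculator results, so () => 1 turns it into an item-count limit):

```typescript
// Toy LRU to illustrate the semantics (not Apollo's implementation).
// `maxSize` bounds the sum of sizeCalculator(value) over all entries,
// so `sizeCalculator: () => 1` makes it an item-count limit.
class TinyLRU<V> {
  private entries = new Map<string, V>();
  private sizes = new Map<string, number>();
  private total = 0;

  constructor(
    private maxSize: number,
    private sizeCalculator: (value: V) => number,
  ) {}

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.delete(key);
    const size = this.sizeCalculator(value);
    this.entries.set(key, value);
    this.sizes.set(key, size);
    this.total += size;
    // Evict least-recently-used entries (Map preserves insertion order).
    while (this.total > this.maxSize) {
      const oldest = this.entries.keys().next().value as string;
      this.delete(oldest);
    }
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Refresh recency by re-inserting at the back.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  delete(key: string): void {
    this.total -= this.sizes.get(key) ?? 0;
    this.entries.delete(key);
    this.sizes.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new TinyLRU<string>(2, () => 1); // maxSize 2 == "at most 2 items"
cache.set('a', 'x'.repeat(10_000)); // a huge string still counts as 1
cache.set('b', 'y');
cache.set('c', 'z'); // evicts 'a', the least recently used
console.log(cache.get('a')); // undefined
console.log(cache.size);     // 2
```

With the default length calculation instead of () => 1, that 10,000-character string alone would exceed a maxSize of 100, which is exactly the concern raised above.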

@patrickwall57

Would a default of 0MB, coupled with an explanation in the docs on how to enable the cache, be more reasonable than an unbounded cache? cries in out-of-memory exception

glasser (Member) commented Oct 11, 2022

The class in question has moved from apollo-server-caching to @apollo/utils.keyvaluecache, and as part of that migration, the new version of the class is bounded by default.

@glasser glasser closed this as completed Oct 11, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2023