
Feature request: Add a softEntries() build method that will remove both the cached value and its key when memory is almost full #674

Closed
maurice85 opened this issue Feb 27, 2022 · 5 comments

Comments

@maurice85

maurice85 commented Feb 27, 2022

I see that Caffeine.newBuilder() has the build methods weakKeys(), weakValues() and softValues(). I'm wondering why it doesn't have softKeys() as a method as well.

I want the garbage collector to start removing cached values when memory becomes critically full. That's why I call softValues() when creating a new cache manager. However, it would be nice if the associated key could also be garbage collected when the value is. It doesn't make sense to let a key reference a null value in this case.
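
For reference, the current setup is roughly the following (a minimal sketch, not the exact configuration):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class SoftValuesExample {
    public static void main(String[] args) {
        // Values are wrapped in SoftReferences, so the GC may discard them
        // under memory pressure; the keys remain strongly referenced.
        Cache<String, Object> cache = Caffeine.newBuilder()
                .softValues()
                .build();

        cache.put("report", new byte[10_000]);
        System.out.println(cache.getIfPresent("report") != null);
    }
}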

So, if possible, I would like to have a method that wraps a Map.Entry instance in a SoftReference, so that when the GC starts collecting, both the key AND its associated cached value are simultaneously removed from the cache.

I understand that a Map.Entry wrapped in a SoftReference cannot be added to a regular Map instance. So in this case the cache manager should not use the Map it uses by default, but instead something like Apache's SoftHashMap. The description of this class is the following:

/**
 * A <code><em>Soft</em>HashMap</code> is a memory-constrained map that stores its <em>values</em> in
 * {@link SoftReference SoftReference}s.  (Contrast this with the JDK's
 * {@link WeakHashMap WeakHashMap}, which uses weak references for its <em>keys</em>, which is of little value if you
 * want the cache to auto-resize itself based on memory constraints).
 * <p/>
 * **Having the values wrapped by soft references allows the cache to automatically reduce its size based on memory
 * limitations and garbage collection.**  This ensures that the cache will not cause memory leaks by holding strong
 * references to all of its values.
 * <p/>
 * This class is a generics-enabled Map based on initial ideas from Heinz Kabutz's and Sydney Redelinghuys's
 * <a href="http://www.javaspecialists.eu/archive/Issue015.html">publicly posted version (with their approval)</a>, with
 * continued modifications.
 * <p/>
 *  **This implementation is thread-safe and usable in concurrent environments.**
 *
 * @since 1.0
 */

I've highlighted some parts of the description to indicate how useful this class could be when implemented as a dependency in the Caffeine library.

Could you please consider adding a builder method that allows a cache manager to make use of the SoftHashMap from Apache?
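
Roughly, the idea would be something like the sketch below (SoftEntryMap is a made-up name just to illustrate wrapping the whole entry rather than only the value):

import java.lang.ref.SoftReference;
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the requested behavior: the key and value live inside
// the same SoftReference, so the GC would clear them together.
class SoftEntryMap<K, V> {
    private final ConcurrentHashMap<K, SoftReference<Map.Entry<K, V>>> backing =
            new ConcurrentHashMap<>();

    void put(K key, V value) {
        backing.put(key, new SoftReference<>(new AbstractMap.SimpleImmutableEntry<>(key, value)));
    }

    V get(K key) {
        SoftReference<Map.Entry<K, V>> ref = backing.get(key);
        Map.Entry<K, V> entry = (ref == null) ? null : ref.get();
        if (entry == null) {
            backing.remove(key); // the entry was collected, so drop the stale mapping
            return null;
        }
        return entry.getValue();
    }
}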

Thank you

maurice85 changed the title from "Feature request: Add a softEntries() method that will remove both the cached value and its key when memory is almost full" to "Feature request: Add a softEntries() build method that will remove both the cached value and its key when memory is almost full" on Feb 27, 2022
@ben-manes
Owner

When reference caching is used, the cache compares keys using identity (==) rather than equivalence (equals(o)). For an entry lookup this means that the caller to a weakKeys() cache must supply the exact key instance. There is a request in #344 to generalize that and allow customizing equality. Guava has that generalized internally but not publicly, as the only use case we found was interning instances (Interners.newWeakInterner()) and it was misused otherwise (see the rationale).
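
For example, a reference-caching lookup behaves roughly like this (a minimal sketch, not from the thread):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class IdentityLookupExample {
    public static void main(String[] args) {
        // weakKeys() switches key comparison to identity (==)
        Cache<String, String> cache = Caffeine.newBuilder()
                .weakKeys()
                .build();

        String key = new String("config");
        cache.put(key, "value");

        System.out.println(cache.getIfPresent(key));                  // "value": same instance
        System.out.println(cache.getIfPresent(new String("config"))); // null: equal but not identical
    }
}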

Since a strong reference to the key is needed for lookup, softKeys() behaves identically to weakKeys() from the user's perspective. If the key has no strong references then it could only be retrieved by a Map iterator, which would not be helpful. This realization is why Guava's CacheBuilder initially offered softKeys() and later removed that functionality, as it only delayed freeing up memory.

The cache key is usually very light and inexpensive whereas the value may be heavier. In that case it likely wouldn't make sense to make the key evictable while the value is strongly reachable, as the JVM won't free up much memory. The case of weakKeys() differs as it can simplify session cleanup, e.g. a request-scoped cache entry for an HTTP request, where once there are no more usages the cache can prune the entry after garbage collection.
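
A hedged sketch of that request-scoped pattern (RequestContext and SessionData are made-up types for illustration):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class RequestScopedCacheExample {
    record RequestContext(String id) {}
    record SessionData(String user) {}

    // Entries keyed by the request object disappear automatically once nothing
    // else references that request, i.e. after the request completes.
    static final Cache<RequestContext, SessionData> perRequest = Caffeine.newBuilder()
            .weakKeys()
            .build();
}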

“It doesn't make sense to let a key reference a null value in this case.”

This appears to be the root issue and is simply a misunderstanding. When the value is garbage collected, the entire entry becomes eligible for removal. It may reside temporarily in the cache, just like an entry whose time-to-live has expired, but it will be detected and discarded during a maintenance cycle. The cache operations will treat it externally as if the entry were absent.
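
A small sketch of how that looks from the outside (the removal listener here is just for illustration; it reports RemovalCause.COLLECTED once a cleared value is discarded):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

class CollectedValueExample {
    public static void main(String[] args) {
        Cache<String, byte[]> cache = Caffeine.newBuilder()
                .softValues()
                // invoked with RemovalCause.COLLECTED after the GC clears a value
                .removalListener((String key, byte[] value, RemovalCause cause) ->
                        System.out.println(key + " removed: " + cause))
                .build();

        cache.put("image", new byte[1_000_000]);

        // If the GC has cleared the value, getIfPresent returns null as though
        // the entry were absent, even before maintenance removes the mapping.
        System.out.println(cache.getIfPresent("image") != null ? "still cached" : "collected");
    }
}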

In practice soft references are more problematic than expected, so we prefer an explicit maximum size/weight for memory management instead. The two main reasons are:

  • They tend to work poorly with GC algorithms by promoting entries into the old generation and requiring an expensive full GC to remove. As the soft entries accumulate, the GC cycle needs to happen more frequently, which causes longer stop-the-world pause times. That can cause death spirals as the GC tries to minimize individual pause times by freeing only as much memory as needed, but as the heap is quickly exhausted again it leaves little time for the application. Modern garbage collectors avoid this by performing per-region collections (2048 regions in G1), which means that soft references have a much shorter lifetime.
  • For the application the hit rate is unpredictable, as the JVM doesn't know how expensive an entry is and evicts globally, so a cheap entry might be retained while expensive ones are removed. For example, Apache PDFBox softly caches image fragments when rendering a PDF, which is inexpensive work (cost in nanos) but produces a lot of one-off large entries. As soft references are reclaimed in global LRU order, this could cause the removal of an application's expensive entries that come from a remote resource like the database (cost in millis to seconds).

It is easier to defer decisions to the JVM, but those decisions will be much poorer than your own. By explicitly setting a maximum and exposing statistics, you'll have more predictable performance and fewer surprises. It is a bit more work, but it is the right choice for production workloads.
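
For example, a size-bounded, instrumented configuration might look like the sketch below (the weight limit and weigher are placeholders):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;

class BoundedCacheExample {
    public static void main(String[] args) {
        Cache<String, byte[]> cache = Caffeine.newBuilder()
                .maximumWeight(64L * 1024 * 1024)                     // cap at roughly 64 MB of values
                .weigher((String key, byte[] value) -> value.length)  // weight = value size in bytes
                .recordStats()                                        // expose hit rate, evictions, etc.
                .build();

        cache.put("a", new byte[1024]);
        cache.getIfPresent("a"); // hit
        cache.getIfPresent("b"); // miss

        CacheStats stats = cache.stats();
        System.out.printf("hitRate=%.2f evictions=%d%n", stats.hitRate(), stats.evictionCount());
    }
}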

@maurice85
Author

Thank you for your swift reply. You say that “as the soft entries accumulate the GC cycle needs to happen more frequently, which causes longer stop-the-world pause times.” Why would soft entries accumulate? Once a SoftReference value expires, it will be removed from the cache and then be eligible for GC, right? So what do you mean by 'accumulate' exactly?

Thanks

@ben-manes
Owner

An object held by a strong or weak reference is collected by the garbage collector as soon as it is no longer strongly reachable. A cache holding a soft/weak entry won't evict it until the GC clears the reference and adds it to a ReferenceQueue, which notifies the cache so that it can remove the map entry.

A softly referenced object that is not strongly reachable has its collection delayed so that the application can resurrect it as strongly reachable. This encourages using all of the free memory available to caches in order to maximize the hit ratio. That causes the heap to fill up, so that new allocations trigger a garbage collection that shrinks the cache only as much as needed. This cached data goes against the generational hypothesis: it is quickly promoted out of the young generation. When activity requires freeing up memory, the GC is forced to perform a major collection, which traditionally was a stop-the-world event (CMS, ParallelGC). That changes the heap's behavior from a sawtooth pattern to one that mimics a memory leak, as it maximizes heap utilization by freeing only the minimal amount of memory needed.

You can combine soft references with another policy, such as size or expiration. That will typically cause the cache to evict the entry early enough that the heap isn't filled, while also giving a safety valve to cope with an unexpected spike in memory usage. That might sound attractive, but it doesn't typically help, has its own overhead, and production traffic tends to prefer a fail-fast approach to meet performance goals and minimize the blast radius of one customer affecting another.
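
A sketch of that combined configuration (the size and duration are placeholders):

import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class CombinedPolicyExample {
    public static void main(String[] args) {
        // Size and expiration bounds evict early; softValues() acts only as a
        // last-resort safety valve if memory is unexpectedly exhausted.
        Cache<String, byte[]> cache = Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterAccess(Duration.ofMinutes(5))
                .softValues()
                .build();

        cache.put("page", new byte[4096]);
        System.out.println(cache.getIfPresent("page") != null);
    }
}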

@maurice85
Author

Thanks for the explanation Ben-manes.

@ben-manes
Owner

Closing. I hope that clarifies why softKeys() wouldn't make sense and wouldn't be helpful. While the use of softValues() is not encouraged, it fits your original intent.
