
Refreshing values with a RemovalListener on a BoundedLocalCache with direct executor causes values to be refreshed twice #872

Closed
Hagmar opened this issue Feb 20, 2023 · 6 comments


@Hagmar commented Feb 20, 2023

Thank you for the great library!

I have a setup where I use a cache to store the result of a network request that can possibly be very slow. Certain user interactions are blocking on having the value, so the goal is to avoid cache misses at all costs. To achieve this, I use a removal listener that checks whether the removed entry was evicted or explicitly removed, and if so calls refresh to reload the entry. The actual size of the cache is very small, so there are no memory concerns with preventing entries from being removed.

I'm having an issue after a recent bump from 3.1.2 to 3.1.3, which I believe was introduced by this change: 68fbff8. After that change, the removeNode call causes values to be evicted and refreshed, and the subsequent remove call then removes and refreshes the values a second time.

Here's an example test that works on version 3.1.2 but not on 3.1.3:

import static org.assertj.core.api.Assertions.assertThat;

import com.github.benmanes.caffeine.cache.CacheLoader;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.TimeUnit;
import org.checkerframework.checker.nullness.qual.Nullable;
import org.junit.jupiter.api.Test;

public class CacheTest {
    private final LoadingCache<Integer, Integer> cache = Caffeine.newBuilder()
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .removalListener(this::listenRemoval)
            .executor(MoreExecutors.directExecutor())
            .recordStats()
            .build(new CacheLoader<>() {
                @Override
                public @Nullable Integer load(Integer _key) {
                    return 1;
                }
            });

    private void listenRemoval(Integer key, Integer _value, RemovalCause cause) {
        // We don't want to reload if the value was just replaced
        if (cause.wasEvicted() || cause == RemovalCause.EXPLICIT) {
            cache.refresh(key);
        }
    }

    @Test
    public void testCacheEvictionRefresh() {
        cache.get(1);
        cache.invalidateAll();
        assertThat(cache.stats().loadCount()).isEqualTo(2);
    }
}

Note that this is only an issue when using a direct executor or very fast async refresh, where the value is refreshed and re-added to the cache between removeNode and remove.

@ben-manes (Owner)

Thank you for the unit test, for identifying the problematic commit, and for the explanation. You're right that this causes a double removal, and conditionalizing the fallback makes your test pass, so it should be an easy fix.

I had naively thought that since clearing the cache is not linearizable, performing that fallback logic unconditionally would do no harm, as it would merely pick up new additions. I hadn't considered your case, where those additions are reinserts of a removed key, so the deletion happens twice, which is certainly unexpected behavior. Something like the diff below might work, where we track the overflow keys and remove them one-by-one afterwards. I may tweak it further to strengthen the deduplication, e.g. against map resizes during traversal.

diff --git a/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java b/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
index 708cac17..a9913a5d 100644
--- a/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
+++ b/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
@@ -46,6 +46,7 @@ import java.lang.ref.WeakReference;
 import java.time.Duration;
 import java.util.AbstractCollection;
 import java.util.AbstractSet;
+import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Comparator;
@@ -53,6 +54,7 @@ import java.util.HashMap;
 import java.util.IdentityHashMap;
 import java.util.Iterator;
 import java.util.LinkedHashMap;
+import java.util.List;
 import java.util.Map;
 import java.util.NoSuchElementException;
 import java.util.Objects;
@@ -2033,6 +2035,7 @@ abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef
 
   @Override
   public void clear() {
+    List<K> fallbackKeys = List.of();
     evictionLock.lock();
     try {
       // Discard all pending reads
@@ -2053,10 +2056,16 @@ abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef
       // Discard all entries
       int threshold = (WRITE_BUFFER_MAX / 2);
       long now = expirationTicker().read();
+
       for (var node : data.values()) {
-        if (writeBuffer.size() >= threshold) {
+        var key = node.getKey();
+        if ((key != null) && (writeBuffer.size() >= threshold)) {
           // Fallback to one-by-one to avoid excessive lock hold times
-          break;
+          if (fallbackKeys.isEmpty()) {
+            fallbackKeys = new ArrayList<>(data.size());
+          }
+          fallbackKeys.add(key);
+        } else {
+          removeNode(node, now);
         }
-        removeNode(node, now);
       }
@@ -2065,7 +2074,7 @@ abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef
     }
 
     // Remove any stragglers, such as if released early to more aggressively flush incoming writes
-    for (Object key : keySet()) {
+    for (K key : fallbackKeys) {
       remove(key);
     }
   }
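
Assembled outside the diff, the change amounts to a two-phase clear. Below is a self-contained sketch of that pattern against a plain ConcurrentHashMap (the threshold value and names like fallbackKeys are illustrative stand-ins, not Caffeine's actual internals):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class TwoPhaseClearSketch {
  public static void main(String[] args) {
    var map = new ConcurrentHashMap<Integer, Integer>();
    for (int i = 0; i < 1_000; i++) {
      map.put(i, i);
    }

    // Phase one: bulk removal, bounded by a threshold that stands in for
    // Caffeine's (WRITE_BUFFER_MAX / 2) check on the write buffer.
    int threshold = 100;
    int removed = 0;
    List<Integer> fallbackKeys = List.of();
    for (var key : map.keySet()) {
      if (removed >= threshold) {
        // Defer the key rather than re-scanning the live map afterwards,
        // so a key that is removed and concurrently reinserted is not
        // deleted a second time.
        if (fallbackKeys.isEmpty()) {
          fallbackKeys = new ArrayList<>(map.size());
        }
        fallbackKeys.add(key);
      } else {
        map.remove(key);
        removed++;
      }
    }

    // Phase two: remove only the deferred keys, one pass, once per key.
    for (var key : fallbackKeys) {
      map.remove(key);
    }
  }
}

The key design point is that the second pass iterates a list captured during the first pass rather than the live key set, which is what previously made the fallback revisit reinserted keys.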

@ben-manes (Owner) commented Feb 21, 2023

While writing tests for this, it turns out that, regardless of the release version, this can get stuck in an infinite loop. That requires weak keys and a populated cache, which causes the ConcurrentHashMap iteration to return the reinserted entry. Offhand I don't know why weak keys differ, but a fix should try to handle this too.

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CacheTest {
  private final LoadingCache<Integer, Integer> cache = Caffeine.newBuilder()
      .weakKeys()
      .executor(Runnable::run)
      .removalListener(this::listenRemoval)
      .build(key -> 1);

  private void listenRemoval(Integer key, Integer _value, RemovalCause cause) {
    System.out.printf("Removing %s%n", key);
    cache.refresh(key);
  }

  @Test @Timeout(5)
  public void testCacheEvictionRefresh2() {
    for (int i = 0; i < 1000; i++) {
      cache.get(i);
    }
    cache.invalidateAll();
  }
}

@ben-manes (Owner)

A minimal reproducer. The hash codes have to vary enough (e.g. random) to trigger this case.

import java.util.concurrent.ThreadLocalRandom;

import org.junit.jupiter.api.Test;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CacheTest {
  private final LoadingCache<Object, Object> cache = Caffeine.newBuilder()
      .removalListener(this::listenRemoval)
      .executor(Runnable::run)
      .build(key -> key);

  private void listenRemoval(Object key, Object value, RemovalCause cause) {
    System.out.printf("Removing %s%n", key);
    cache.refresh(key);
  }

  @Test
  public void testCacheEvictionRefresh() {
    for (int i = 0; i < 1000; i++) {
      cache.get(ThreadLocalRandom.current().nextInt());
    }
    cache.invalidateAll();
  }
}

@ben-manes (Owner)

This case is effectively the same as using a ConcurrentHashMap directly, e.g. the snippet below loops forever:

var map = new ConcurrentHashMap<Integer, Integer>();
for (int i = 0; i < 1000; i++) {
  map.put(ThreadLocalRandom.current().nextInt(), 1);
}
// The key-set iterator is weakly consistent: a key that is removed and
// reinserted during traversal may be encountered again, so with enough
// distinct hash codes this loop never terminates.
for (var key : map.keySet()) {
  map.remove(key);
  map.put(key, key);
  System.out.println(key);
}

The solution is to loop over a snapshot, e.g. List.copyOf(map.keySet()), so that the traversal performs a single pass. The changes for the cache are a little more involved, but they pass the tests now. I'll wrap this up and cut a release, hopefully soon, otherwise over the weekend.
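
Concretely, the snapshot version of the snippet above makes a single pass and terminates (a sketch of the suggestion, not the released cache fix):

var map = new ConcurrentHashMap<Integer, Integer>();
for (int i = 0; i < 1000; i++) {
  map.put(ThreadLocalRandom.current().nextInt(), 1);
}
// List.copyOf fixes the key set up front, so entries reinserted during
// the loop are never revisited and the traversal is a single pass.
for (var key : List.copyOf(map.keySet())) {
  map.remove(key);
  map.put(key, key);
  System.out.println(key);
}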

@ben-manes (Owner)

Released in 3.1.5

@Hagmar (Author) commented Mar 6, 2023

Thank you for the super fast response and fix! 🎉
