putIfAbsent() regression? #506
Comments
Can you write a simple JMH benchmark to debug against? I want to make sure there is no confusion about how it's being used. Offhand I'm surprised as well.
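For reference, a minimal JMH harness along these lines might look like the sketch below. It assumes the bounded, 1-day-expiry cache described in this issue; the class name, key choices, and thread count are illustrative rather than taken from the reporter's code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

@State(Scope.Benchmark)
public class PutIfAbsentBenchmark {
  private Cache<Integer, Integer> cache;

  @Setup
  public void setup() {
    // Mirrors the configuration described in the issue: bounded to 1M entries,
    // expiring one day after write.
    cache = Caffeine.newBuilder()
        .maximumSize(1_000_000)
        .expireAfterWrite(1, TimeUnit.DAYS)
        .build();
  }

  @Benchmark @Threads(16)
  public Integer putIfAbsent_sameKey() {
    // Worst case: every thread races on the same key, so the losers take the
    // contended path being discussed in this issue.
    return cache.asMap().putIfAbsent(1, 1);
  }

  @Benchmark @Threads(16)
  public Integer putIfAbsent_spreadKeys() {
    // Typical case: contention is spread across many keys.
    int key = ThreadLocalRandom.current().nextInt(100_000);
    return cache.asMap().putIfAbsent(key, key);
  }
}
```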
Will take a look at that.
Taking a fresh look and nothing jumps out at me. The looping around … The lock may be held for a long time if there is some long-running user computation occurring, either via a compute / load or a writer. If that is not returning promptly then it blocks further updates to that entry. A mistake can be if an atomic operation on the entry performs a write back into the map, which can cause livelocks in some JDK releases. If not that, then maybe adding timeout safeguards. Searching the other stack traces might hint at the cause. Of course, if there is a regression and you can reproduce it, I'll jump right onto it. Does downgrading to 2.7 help?
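As an illustration of the write-back mistake mentioned above (a hypothetical snippet, not code from this issue): writing into the same map from inside an atomic per-entry operation re-enters the hash table while the entry's lock is held, which ConcurrentHashMap's contract forbids and which can block or livelock depending on the JDK.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class WriteBackPitfall {
  void demo() {
    Cache<String, String> cache = Caffeine.newBuilder().maximumSize(1_000).build();

    cache.asMap().compute("a", (key, oldValue) -> {
      // BAD: writing back into the same map while the entry's lock is held.
      // ConcurrentHashMap documents that a mapping function must not update
      // other mappings; doing so can block or livelock on some JDK releases.
      cache.put("b", "derived-from-a");
      return "computed";
    });
  }
}
```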
All line numbers are against 2.8.8. Trying to understand how it can contend on the node at line 2030 (we are also looking at whether 2.7 truly helps with this). If multiple threads hit line 1987 with prior being null, and putIfAbsent() on line 2005 returns a non-null value, is there a reason it cannot also check onlyIfAbsent at line 2009 as an optimization, instead of having to synchronize at line 2030 (or am I not reading the code properly)?
Yeah, that does seem to have been overlooked. I think you're right that the optimization could be added there too.
Per our own analysis, it seems like we are calling putIfAbsent() more in our release that also took on 2.8.8. Perhaps that is what we are seeing: contention when prior != null because multiple threads call putIfAbsent for the same key at once, which leads to the synchronization on prior.
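A hypothetical caller-side sketch of that pattern: many threads issuing putIfAbsent for the same key at once, where one thread wins and every other thread sees a non-null prior entry. The key, value, and thread count here are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class HotKeyPutIfAbsent {
  public static void main(String[] args) {
    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(1_000_000)
        .build();

    ExecutorService pool = Executors.newFixedThreadPool(32);
    for (int i = 0; i < 32; i++) {
      pool.execute(() -> {
        // One thread wins the race and inserts; every other thread gets a
        // non-null prior entry and, in 2.8.8, synchronizes on that node even
        // though for putIfAbsent it could simply return the existing value.
        cache.asMap().putIfAbsent("hot-key", "value");
      });
    }
    pool.shutdown();
  }
}
```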
Let me know how you want to proceed to confirm this: either downgrade to 2.7 to see if it's an issue (as we suspect) or upgrade with a patch. If you are amenable to it, I'd prefer to use a pre-release build to confirm with, in case we have to continue digging. I can get a patch in tonight (PST). What do you want to do?
Thanks for the fast response. We are currently working on creating the JMH benchmark as well as trying to reproduce the issue consistently on our side. Once we can reproduce it consistently, it would be great if we could move forward with a patch build.
I pushed the patch to … This is available on jitpack (via commit hash) or Sonatype's snapshot repository (as 2.9.1-SNAPSHOT).
Thanks, giving it a spin.
I forgot to copy into the …
Just to share here as well, but it seems like 2.8.8 is already faster (there is no regression): https://github.com/dbaghdasarya/jmh-benchmark
Oh, good. Thanks for doing that. Does the 2.9.1 snapshot help in your case?
We have been testing it at this point; so far it seems to be helping. Thanks!
Great! By the way, your benchmark has a shared …
Please let me know when you have a conclusion on whether the patch is enough or if you are still experiencing performance problems. I'll cut a release for you once it's clear that we've resolved the issues.
I am going to hold off on a release, given that this optimization covers a very narrow race, your benchmark didn't find a regression, and you observed the slowdown was likely due to usage changes. Since it doesn't sound critical anymore, I'll wait until enough other changes pile up, unless otherwise pressed.
Sorry, we're still going through some internal testing, but we are releasing with the snapshot build for now. We will circle back once we have confirmation that it helps.
@panghy @ben-manes
This should only occur when the write rate greatly exceeds how quickly the cache can evict. There is a write buffer to capture work to replay against the eviction policy, so usually a caller is only penalized by the cost to perform the hash table update. When the write buffer is full (it can grow up to ~128 * cpus), the cache blocks on the eviction lock to assist in draining. This deprioritizes the calling threads and lets the one holding the lock get more cpu time, which drains a batch of work. Typically that results in higher throughput, and we are protecting against runaway growth. Are you doing something very expensive that keeps the cache eviction from running, or have you otherwise starved the thread? By default the FJP#commonPool() is used as the executor. You might try a same-thread (Runnable::run) executor.
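A sketch of that suggestion, assuming the cache shape described elsewhere in this issue: supplying a same-thread executor so maintenance runs on the calling thread instead of ForkJoinPool.commonPool().

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class SameThreadExecutorConfig {
  Cache<String, String> cache = Caffeine.newBuilder()
      .maximumSize(1_000_000)
      .expireAfterWrite(Duration.ofDays(1))
      // Run maintenance (eviction, removal listeners) on the calling thread
      // rather than ForkJoinPool.commonPool(), to rule out a starved executor.
      .executor(Runnable::run)
      .build();
}
```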
Hm, when I looked at the stack trace that was captured, it seemed like all threads were blocked waiting for that lock and no progress was being made at all (all threads were parked). That seemed odd and pointed not to a contention issue but to a lock-release issue. That, or we are doing something else wrong that somehow causes the lock to be held and not released.
That is odd. Are you on an old Linux kernel? There is a nasty missing-wakeup bug that Linus tried to hide, which effectively broke the world.
Doesn't seem like it; we are on 5.4.0 with Java 14. It's just a bit concerning that it started happening with the 2.9.1-SNAPSHOT build.
Also still trying to look into whether the latest patch is helping us, seeing:
while other threads are blocked at:
The putIfAbsent call still seems to be able to trigger contention.
Still trying to piece together the possibilities as to why this could happen (expiration could be it, but this is a cache with a 1-day expire-after-write and a size limit of 1M).
We could switch to a LoadingCache to make things a bit simpler for this particular code path, but it's odd how we can contend on …
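If switching, a LoadingCache for this shape of cache might look roughly like the following; loadValue is a stand-in for whatever the application does today to produce the value.

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

class LoadingCacheSketch {
  LoadingCache<String, String> cache = Caffeine.newBuilder()
      .maximumSize(1_000_000)
      .expireAfterWrite(Duration.ofDays(1))
      // The loader runs at most once per key per miss, under the entry's lock,
      // so callers no longer race with an explicit putIfAbsent.
      .build(this::loadValue);

  private String loadValue(String key) {
    // Hypothetical stand-in for the application's value computation.
    return "value-for-" + key;
  }
}
```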
L2070 is afterRead, so I am assuming you are cutting the stack trace heavily. Can you show the rest of the calls? It is appending the read to a ring buffer and should tryLock to schedule the drain.
Hm, those are the only frames we have from jstack. We are on the …
This is confusing; I don't see why it's causing so much trouble. Are you sure there are no other places where that lock is being held? We do so in a few places, such as within a compute. Can you try with an executor(Runnable::run)? That's for the prior stack trace where it seemed like the FJP was stalled while holding the lock.
For the first question, it does seem like it's all blocking on computeIfAbsent:
For the other issue where the FJP was stalled, we now have a better idea of what happened: we really did swarm the system with requests, which ultimately saturated it.
Did you mean "does not"? I only see the putIfAbsent calls there.
Sorry, I meant to say that it does seem like it's blocking on the putIfAbsent calls.
Strange. As you can see in that code, it's doing very tiny manipulations. The worst case is a retry loop, but that should go to a fast-path exit. The isAlive check should only fail if the read entry was removed from the hash table (the retired or dead transitions occur afterwards), so it should not loop indefinitely. If expired, it should resurrect the entry with a new value. Nothing seems obviously wrong. Can you try a computeIfAbsent? This is a head scratcher.
My best guess is that there is a thread you are not seeing which holds onto the lock. That might be the FJP trying to evict and not progressing; the direct executor would alleviate that somewhat. Otherwise this most closely resembles the lost-wakeup bug, which is a kernel problem and not user code. I agree you should be beyond that by now.
Will try computeIfAbsent(), thanks for the help!
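For an ad-hoc call site, the computeIfAbsent suggestion can go through either Cache.get(key, mappingFunction) or the map view; a minimal sketch, with createValue as a hypothetical stand-in for the application's value computation:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class ComputeIfAbsentSketch {
  private final Cache<String, String> cache = Caffeine.newBuilder()
      .maximumSize(1_000_000)
      .build();

  String getOrCreate(String key) {
    // Cache.get(key, fn) computes the value atomically on a miss, so only one
    // thread runs the function and the others wait for its result.
    return cache.get(key, k -> createValue(k));
  }

  String getOrCreateViaMapView(String key) {
    // Equivalent form on the ConcurrentMap view.
    return cache.asMap().computeIfAbsent(key, k -> createValue(k));
  }

  private String createValue(String key) {
    // Hypothetical value computation for illustration.
    return "value-for-" + key;
  }
}
```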
Update:
It's a bit weird that computeIfAbsent fixes the issue. Is there something else that we can add to further debug this? Metrics, etc.
That at least is very good news. I don't know either why it would make such a big difference.
I am not sure how to reproduce it in order to debug, or what metrics would help...
We recently upgraded from 2.7.0 to 2.8.8 and noticed that our putIfAbsent code-paths are contending more heavily than before (at least that's our observation):
We are aware of the commit 614fe60 that should have improved putIfAbsent but we are somehow seeing the opposite. This is a bounded (1M) synchronous cache with a 1-day expiration.