WrongMethodTypeException after upgrade to 3.1.7 #1111

carterkozak · 2023-08-07T13:48:22Z

I don't have a reproducer for this yet, but I've seen it pop up in a variety of contexts when building a caffeine cache. This has occurred on a few configurations using both build and buildAsync. #905 seems like the most likely culprit.

java.lang.invoke.WrongMethodTypeException: expected ()NodeFactory but found ()NodeFactory
	at java.base/java.lang.invoke.Invokers.newWrongMethodTypeException(Invokers.java:523)
	at java.base/java.lang.invoke.Invokers.checkExactType(Invokers.java:532)
	at com.github.benmanes.caffeine.cache.NodeFactory.newFactory(NodeFactory.java:152)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
	at com.github.benmanes.caffeine.cache.NodeFactory.loadFactory(NodeFactory.java:141)
	at com.github.benmanes.caffeine.cache.NodeFactory.newFactory(NodeFactory.java:89)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.<init>(BoundedLocalCache.java:269)
	at com.github.benmanes.caffeine.cache.SS.<init>(Unknown Source)
	at com.github.benmanes.caffeine.cache.SSL.<init>(Unknown Source)
	at com.github.benmanes.caffeine.cache.SSLMW.<init>(Unknown Source)
	at com.github.benmanes.caffeine.cache.SSLMWA.<init>(Unknown Source)
	at com.github.benmanes.caffeine.cache.LocalCacheFactory.newBoundedLocalCache(LocalCacheFactory.java:47)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.<init>(BoundedLocalCache.java:3952)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.<init>(BoundedLocalCache.java:3948)
	at com.github.benmanes.caffeine.cache.Caffeine.build(Caffeine.java:1048)

The text was updated successfully, but these errors were encountered:

carterkozak · 2023-08-07T13:50:12Z

I don't have as much information as I would like quite yet, posted early in case others have observed this issue. When I uncover additional details or manage a minimal reproducer, I will update this issue.

ben-manes · 2023-08-07T16:17:34Z

I think any details you have, like jdk version, would help. We primarily test on jdk11 with a few configurations evaluated on later jdks (e.g. benchmarks) that are hotspot/graal based.
@yuzawa-san @uschindler

carterkozak · 2023-08-07T17:22:45Z

So far I've seen this on jdk-17 using Amazon Corretto 17.0.8.7.1 on linux hosts. I'm not confident that it's necessarily specific to jdk-17, because the vast majority of the services we run use jdk-17. The failure is not consistently reproducible in the environments where it has occurred, so I suspect there may be a cache interaction ordering component (or the cache key doesn't take into account some piece of class-loading/lookup state), as I wasn't able to force the failure by executing the same interactions with caffeine in isolation.

ben-manes · 2023-08-07T17:35:42Z

do you use multiple classloaders where threads might cross that boundary? For example, iirc Tomcat hosts each application in its own classloader but ForkJoinPool threads cut across these, so app.ClassA != app.ClassA due to different classloaders and there can be contamination by the shared threads. Those can be tricky bugs and we did not try to protect against that, so I could imagine something quirky there.

carterkozak · 2023-08-07T18:07:31Z

That was my first thought as well, but these applications are roughly dropwizard-shaped microservices which use a single standard classloader. The rate of reproduction is very low, I've only seen ~10 occurrences across thousands of jvm launches.

ben-manes · 2023-08-07T18:14:12Z

It's weird too because, if I understand correctly, Lookup.findClass uses the caller's classloader (NodeFactory.getClassLoader()). Since the static map is held by that class, it should always use the same classloader during the invoke and not become contaminated. There have been bugs in prior JDKs with MethodHandles so it may be worth digging into the JDK issue tracker (e.g. a memory leak meant we reverted our usage pre-jdk11).

uschindler · 2023-08-07T18:15:45Z

To me this looks like a Hotspot bug...

I haven't seen this anywhere. In addition the error message makes no sense. The signature matches exactly: expected ()NodeFactory but found ()NodeFactory

uschindler · 2023-08-07T18:23:28Z

The crazy error message comes from the fact that the method ypes are compared with identity not equals: https://github.com/openjdk/jdk17/blob/4afbcaf55383ec2f5da53282a1547bac3d099e9d/src/java.base/share/classes/java/lang/invoke/Invokers.java#L531

No idea why this is done like that. There are many places where invokeExact is used, so this bug is really strange.

As workaround you can go back to invoke, but this will slowdown calls because exact type information is known before.

carterkozak · 2023-08-07T18:24:42Z

It does seem as though MethodType.makeImpl perhaps isn't canonicalizing returned MethodType as we would expect -- I haven't had a chance to review the ConcurrentWeakInternSet in depth.

ben-manes · 2023-08-07T18:26:47Z

I imagine our workaround will be a fallback path once we settle on the cause, e.g. use invokeExact but fallback to invoke if this exception is encountered.

uschindler · 2023-08-07T18:27:01Z

Maybe it is a multithreading problem and some bug in openjdk does not correctly intern/canonicalize the method type instance. We have to figure out if theres some caching involved. If there's really an issue, we may need to open openjdk bug report.

uschindler · 2023-08-07T18:32:50Z

I imagine our workaround will be a fallback path once we settle on the cause, e.g. use invokeExact but fallback to invoke if this exception is encountered.

Haha, add a try...catch around and call invoke on exception 😜

I have never seen this in our code. Invoke exact always works as expected. I really think this is a muktithreading bug in the cache of jdk. 😕

ben-manes · 2023-08-07T21:08:40Z

I wonder if it is expunging a temporary WeakEntry when it fails the race, which causes it to remove a live entry and create duplicates. Following the jdk sources.

When it optimistically creates a WeakEntry<>(elem, stale), that instance is registered on the reference queue for tracking. When a race causes another entry to be cached, then the loser's is discarded. That makes both the entry and its elem eligible for collection. One might expect elem to live longer than the entry since it was a method parameter, but since its unused that might be elided by escape analysis and inlining. So possibly some reordering and lifetimes might cause the GC to think that the WeakReference is still alive and should be enqueued. Since the map.remove(e) is based on object equality, it could discard a different instance than the collected one as we no longer can assume a strict state ordering. A simple fix would be to use map.computeIfPresent(reference, (k, v) -> (reference == v) ? null : v).

I am kind of reaching, but it seems just plausible enough to happen, not be obvious to the author, and pretty hard to orchestrate that race. does this seem believable?

ben-manes · 2023-08-08T00:47:00Z

The more that I look at the code, the more that I convince myself their concurrency is correct (assuming no gc/hotspot visibility race). Searching around all could find was FasterXML/jackson-modules-base#142, which was closed as not reproducible. Since you did not have this problem in the prior release, which used invoke, that seems like a safe fallback.

The major difference with the previous release is that we now cache and reuse the instantiated NodeFactory, which is a stateless factory. That means we should use the method handle only perbuild() for a unique policy configuration, rather than on every cache construction. The actual cost between invoke versus invokeExact is small, so the benefit was doing it only once rather than many times.

Ideally you could dig in far enough to open a hotspot bug, or if the issue goes away we could close as not reproducible like the Jackson team did and hope the cosmic rays stay in our favor. Let me know what you'd like to do and a release timeline.

uschindler · 2023-08-08T06:48:48Z

Hi,
I checked the actual code: using ibvokeExa t makes no sense here:

caffeine/caffeine/src/main/java/com/github/benmanes/caffeine/cache/NodeFactory.java

Line 151 in b1462dd

return (NodeFactory<Object, Object>) constructor

Actually this creates a new MethodType on each call so it does the work of invoke().

I would remove the method type adaption and just call invoke. In short! Lookup constructor and call it with invoke (like before).

When we implemented this first time, the code was different (we cached the constructor handles and called them. But here we do the whole reflective lookup and call it directly a single time after adapting the type. That is too complex and brings nothing.

You should also check the other one that uses invoke exact. If pattern is same, remove adaption, too.

uschindler · 2023-08-08T06:50:54Z

I checked the other ones: https://github.com/search?q=repo%3Aben-manes%2Fcaffeine%20invokeExact&type=code

The one here is the only one that adapts on each call. The other 2 instances use adaption of method type a single time and cache result.

ben-manes · 2023-08-08T06:53:04Z

I think the other one does it to, it just does it earlier than the search snippet shows

caffeine/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalCacheFactory.java

Lines 117 to 129 in b1462dd

    
           final class MethodHandleBasedFactory implements LocalCacheFactory { 
        
             final MethodHandle methodHandle; 
        
             MethodHandleBasedFactory(Class<?> clazz) throws NoSuchMethodException, IllegalAccessException { 
        
               var constructor = LOOKUP.findConstructor(clazz, FACTORY); 
        
               this.methodHandle = constructor.asType( 
        
                   constructor.type().changeReturnType(BoundedLocalCache.class)); 
        
             } 
        
             @SuppressWarnings("unchecked") 
        
             @Override public <K, V> BoundedLocalCache<K, V> newInstance(Caffeine<K, V> builder, 
        
                 @Nullable AsyncCacheLoader<? super K, V> cacheLoader, boolean async) throws Throwable { 
        
               return (BoundedLocalCache<K, V>) methodHandle.invokeExact(builder, cacheLoader, async); 
        
             }

ben-manes · 2023-08-08T06:54:22Z

If you'd like to send a PR or provide a snippet then I'd happily merge that. I think I know what you mean but you're much more of an expert here 🙂

uschindler · 2023-08-08T06:58:10Z

The alternative is to define the method type once as static final. Adapting and creating a new method type on each call looks wrong. 😜
But I would not do that here. Just use invoke.

uschindler · 2023-08-08T07:01:09Z

Will do.
For the other ones I will use a static final method type for safety.

ben-manes · 2023-08-08T07:06:07Z

Oh okay, so we should revert back to invoke like before? That seemed simple and effective. The main benefit from our prior discussion ended up being that we cached the factories so the method handle is used only once to create the factory instance. Then the factory is cached in a map, where using a factory uses direct calls. So ideally we only use the method handle once per unique policy combination (size, size + expire, etc). Then even old school reflection would have been equivalent performance-wise as the cost is paid once, rather than each time a new cache instance is built.

uschindler · 2023-08-08T07:27:11Z

Then even old school reflection would have been equivalent performance-wise as the cost is paid once, rather than each time a new cache instance is built.

For the case here, it is not made for performance, reflection would be fine. The main difference is: Because we use a lookup, we can handle package private classes without setAccessible().
In Lucene, to support Java 21 vector API, we also do it in the same way in Lucene: We just use MethodHandles to call package private methods (only visible and without setAccessible): https://github.com/apache/lucene/blob/a65cf8960a1057d98126256d1610292ad5c8f1b3/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L97-L104

The LocalCacheFactory is a bit different: Here we should use invokeExact (as it looks to be called more often), but here we are caching the MethodHandles. When creating them, we should not do it on each lookup, instead define it once as static final.

I have a PR almost ready, running tests.

ben-manes · 2023-08-08T07:30:14Z

I have a PR almost ready, running tests.

haha, we have an exhaustive suite so you might prefer sending a draft pr and letting the CI burn it’s cpu cycles instead of yours.

…n (to work around a possible concurrency bug in JDK)

uschindler · 2023-08-08T07:33:06Z

Here you are: #1114

uschindler · 2023-08-08T07:38:51Z

I have not changed the benchmark, but I think you know wthat I did - maybe adapt the benchmark classes, too. Actually the LocalCacheFactory is the way how one should normally use MethodType instances: Do not create them over an over and keep them as static finals. If you look at the code inside the JDK how they use it for bootstrapping.

I am quite sure that there is some strange problem inside the JDK that sometimes creates non-interned MethodTypes under high load.

ben-manes · 2023-08-08T07:41:43Z

beautiful, I’ll merge this in after the CI completes. I’ll try to cut a release in a day or two when I find the time.

uschindler · 2023-08-08T07:47:34Z

The LambdaMetaFactory benchmark looks fine (...and its not productive code). I would remove that as this is no longer used (but that's up to you).

…k around a possible concurrency bug in JDK) (#1114)

carterkozak · 2023-08-08T15:38:16Z

Thank you both!

uschindler · 2023-08-08T16:53:10Z

Thank you both!

Of course it would be great if you could start up your services with your jdk using a preview build of the library and report if you see any problems.

ben-manes · 2023-08-09T06:16:34Z

Released in 3.1.8

Thanks again for all the help fixing this @uschindler, @carterkozak

uschindler added a commit to uschindler/caffeine that referenced this issue Aug 8, 2023

Fix issue ben-manes#1111 by not creating methodType on each invocatio…

28368b1

…n (to work around a possible concurrency bug in JDK)

uschindler mentioned this issue Aug 8, 2023

Fix issue #1111 by not creating MethodType on each invocation #1114

Merged

ben-manes closed this as completed in #1114 Aug 8, 2023

ben-manes pushed a commit that referenced this issue Aug 8, 2023

Fix issue #1111 by not creating methodType on each invocation (to wor…

82cf9a3

…k around a possible concurrency bug in JDK) (#1114)

ergo14 mentioned this issue Aug 10, 2023

Bump caffeine palantir/tritium#1773

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WrongMethodTypeException after upgrade to 3.1.7 #1111

WrongMethodTypeException after upgrade to 3.1.7 #1111

carterkozak commented Aug 7, 2023

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023 •

edited

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023

uschindler commented Aug 7, 2023 •

edited

uschindler commented Aug 7, 2023

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023

uschindler commented Aug 7, 2023 •

edited

uschindler commented Aug 7, 2023

ben-manes commented Aug 7, 2023 •

edited

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

carterkozak commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 9, 2023

WrongMethodTypeException after upgrade to 3.1.7 #1111

WrongMethodTypeException after upgrade to 3.1.7 #1111

Comments

carterkozak commented Aug 7, 2023

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023 • edited

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023

uschindler commented Aug 7, 2023 • edited

uschindler commented Aug 7, 2023

carterkozak commented Aug 7, 2023

ben-manes commented Aug 7, 2023

uschindler commented Aug 7, 2023 • edited

uschindler commented Aug 7, 2023

ben-manes commented Aug 7, 2023 • edited

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 8, 2023

uschindler commented Aug 8, 2023

carterkozak commented Aug 8, 2023

uschindler commented Aug 8, 2023

ben-manes commented Aug 9, 2023

ben-manes commented Aug 7, 2023 •

edited

uschindler commented Aug 7, 2023 •

edited

uschindler commented Aug 7, 2023 •

edited

ben-manes commented Aug 7, 2023 •

edited