Connection hangs forever using Lettuce and timeouts are not working #2863

Closed
KP-EV opened this issue Mar 5, 2024 · 5 comments
Labels
for: external-project For an external project and not something we can fix

Comments


KP-EV commented Mar 5, 2024

Hi,

We use spring-boot-starter-data-redis:3.2.2 with a single Redis instance and have configured these timeouts:

spring:
  data:
    redis:
      host: "${REDIS_HOST}" #ip address
      port: "6379"
      timeout: 150ms
      connect-timeout: 200ms
      lettuce:
        pool:
          enabled: true
          max-active: 3
          max-wait: 150ms

We still notice that sometimes, after the application has run for about a day, the Redis connection dies completely and any call through the @Cacheable annotation hangs forever.

We are looking into making the connection more reliable via this thread: redis/lettuce#1428

I am also wondering why these three timeouts do not take effect and the connection still hangs forever. Every other call works fine, and after a restart the application works again.
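
For reference, the same timeouts can be configured programmatically. What follows is a minimal sketch of an equivalent connection factory, not our actual code: the host, port, and bean wiring are illustrative, the explicit TimeoutOptions line is an addition beyond what the YAML above sets, and with pooling enabled Boot would build a LettucePoolingClientConfiguration instead.

import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.SocketOptions;
import io.lettuce.core.TimeoutOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
class RedisTimeoutConfig {

    @Bean
    LettuceConnectionFactory redisConnectionFactory() {
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                // command timeout, maps to spring.data.redis.timeout
                .commandTimeout(Duration.ofMillis(150))
                .clientOptions(ClientOptions.builder()
                        // connect timeout, maps to spring.data.redis.connect-timeout
                        .socketOptions(SocketOptions.builder()
                                .connectTimeout(Duration.ofMillis(200))
                                .build())
                        // extra knob (not set by the YAML above): let Lettuce itself
                        // cancel commands that exceed the timeout
                        .timeoutOptions(TimeoutOptions.enabled(Duration.ofMillis(150)))
                        .build())
                .build();
        // "redis-host" is a placeholder for the real host
        return new LettuceConnectionFactory(
                new RedisStandaloneConfiguration("redis-host", 6379), clientConfig);
    }
}

TimeoutOptions.enabled(...) makes Lettuce expire timed-out commands on its own, rather than relying only on the calling thread's await.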


mp911de commented Mar 5, 2024

Please attach a thread dump for further diagnosis.
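(For example, with the standard JDK tools: jcmd <pid> Thread.print > dump.txt, or jstack <pid> > dump.txt.)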


KP-EV commented Mar 5, 2024

dump2.txt

Thanks for the quick response! :)


mp911de commented Mar 5, 2024

The offending thread has this trace:

"lettuce-nioEventLoop-4-2" #80 [146] daemon prio=5 os_prio=0 cpu=17071.84ms elapsed=63349.49s tid=0x000078f8b8051170 nid=146 waiting on condition  [0x000078f9b41fd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@21.0.2/Native Method)
	- parking to wait for  <0x00000000d09834f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(java.base@21.0.2/LockSupport.java:221)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@21.0.2/AbstractQueuedSynchronizer.java:754)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@21.0.2/AbstractQueuedSynchronizer.java:990)
	at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@21.0.2/ReentrantLock.java:153)
	at java.util.concurrent.locks.ReentrantLock.lock(java.base@21.0.2/ReentrantLock.java:322)
	at java.lang.ref.ReferenceQueue.poll(java.base@21.0.2/ReferenceQueue.java:182)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
	at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:295)
	at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
	at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:277)
	at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask(ExecutorAdviceHelper.java:51)
	at java.util.concurrent.ForkJoinPool.execute(java.base@21.0.2/ForkJoinPool.java:2847)
	at java.lang.VirtualThread.submitRunContinuation(java.base@21.0.2/VirtualThread.java:240)
	at java.lang.VirtualThread.unpark(java.base@21.0.2/VirtualThread.java:734)
	at java.lang.System$2.unparkVirtualThread(java.base@21.0.2/System.java:2660)
	at jdk.internal.misc.VirtualThreads.unpark(java.base@21.0.2/VirtualThreads.java:93)
	at java.util.concurrent.locks.LockSupport.unpark(java.base@21.0.2/LockSupport.java:179)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.signalNext(java.base@21.0.2/AbstractQueuedSynchronizer.java:645)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(java.base@21.0.2/AbstractQueuedSynchronizer.java:1060)
	at java.util.concurrent.locks.ReentrantLock.unlock(java.base@21.0.2/ReentrantLock.java:494)
	at java.lang.ref.ReferenceQueue.poll(java.base@21.0.2/ReferenceQueue.java:186)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
	at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:295)
	at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
	at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:277)
	at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask(ExecutorAdviceHelper.java:51)
	at java.util.concurrent.ForkJoinPool.execute(java.base@21.0.2/ForkJoinPool.java:2847)
	at java.lang.VirtualThread.submitRunContinuation(java.base@21.0.2/VirtualThread.java:240)
	at java.lang.VirtualThread.unpark(java.base@21.0.2/VirtualThread.java:734)
	at java.lang.System$2.unparkVirtualThread(java.base@21.0.2/System.java:2660)
	at jdk.internal.misc.VirtualThreads.unpark(java.base@21.0.2/VirtualThreads.java:93)
	at java.util.concurrent.locks.LockSupport.unpark(java.base@21.0.2/LockSupport.java:179)
	at java.util.concurrent.CompletableFuture$Signaller.tryFire(java.base@21.0.2/CompletableFuture.java:1848)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@21.0.2/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete(java.base@21.0.2/CompletableFuture.java:2179)
	at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:122)
	at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111)
	at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63)
	at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63)
	at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:745)
	at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:680)
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:597)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.runWith(java.base@21.0.2/Thread.java:1596)
	at java.lang.Thread.run(java.base@21.0.2/Thread.java:1583)

Some OpenTelemetry instrumentation blocks the event loop thread: while Lettuce's CommandHandler is completing a command, the agent's shaded WeakConcurrentMap cache parks on a ReentrantLock inside ReferenceQueue.poll(...), so the netty event loop can never make progress. That is out of scope for the driver and out of scope for Spring Data.

Generally, there is proper tracing support via Micrometer Tracing that does not require instrumentation by a Java agent.
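
A sketch of that route, assuming Lettuce 6.3+ (which ships io.lettuce.core.tracing.MicrometerTracing) and Spring Boot's ClientResourcesBuilderCustomizer; the "redis" service name is arbitrary:

import io.lettuce.core.tracing.MicrometerTracing;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.boot.autoconfigure.data.redis.ClientResourcesBuilderCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class LettuceTracingConfig {

    @Bean
    ClientResourcesBuilderCustomizer lettuceTracing(ObservationRegistry registry) {
        // hand Lettuce a Micrometer-backed Tracing implementation so Redis
        // command spans come from the Observation API instead of being
        // injected by agent bytecode instrumentation
        return builder -> builder.tracing(new MicrometerTracing(registry, "redis"));
    }
}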

mp911de closed this as not planned Mar 5, 2024

KP-EV commented Mar 5, 2024

Thanks for your help, I'll take a look.


KP-EV commented Mar 21, 2024

FYI: disabling the OTel agent's Lettuce instrumentation via OTEL_INSTRUMENTATION_LETTUCE_ENABLED: 'false' worked fine. We haven't had this issue anymore. Thank you for your help :)
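
In case it helps others: the same switch should also be available as a JVM system property on the agent, -Dotel.instrumentation.lettuce.enabled=false, since the agent maps OTEL_INSTRUMENTATION_* environment variables to the corresponding otel.instrumentation.* properties.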
