io.grpc.xds.SharedCallCounterMap.cleanQueue() NullPointerException #8397

Closed
sky91 opened this issue Aug 9, 2021 · 4 comments · Fixed by #8466

@sky91

sky91 commented Aug 9, 2021

Version: v1.38.1
Env: Linux, JDK 8

It happened in a gRPC client with xDS enabled:

io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
        ......
Caused by: java.lang.NullPointerException: null
        at io.grpc.xds.SharedCallCounterMap.cleanQueue(SharedCallCounterMap.java:76)
        at io.grpc.xds.SharedCallCounterMap.getOrCreate(SharedCallCounterMap.java:67)
        at io.grpc.xds.ClusterImplLoadBalancer.handleResolvedAddresses(ClusterImplLoadBalancer.java:124)
        at io.grpc.util.ForwardingLoadBalancer.handleResolvedAddresses(ForwardingLoadBalancer.java:46)
        at io.grpc.xds.PriorityLoadBalancer$ChildLbState.updateResolvedAddresses(PriorityLoadBalancer.java:269)
        at io.grpc.xds.PriorityLoadBalancer.tryNextPriority(PriorityLoadBalancer.java:136)
        at io.grpc.xds.PriorityLoadBalancer.handleResolvedAddresses(PriorityLoadBalancer.java:102)
        at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState.handleEndpointResourceUpdate(ClusterResolverLoadBalancer.java:246)
        at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState.access$1600(ClusterResolverLoadBalancer.java:157)
        at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState$EdsClusterState$1EndpointsUpdated.run(ClusterResolverLoadBalancer.java:427)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
        at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState$EdsClusterState.onChanged(ClusterResolverLoadBalancer.java:431)
        at io.grpc.xds.ClientXdsClient$ResourceSubscriber.notifyWatcher(ClientXdsClient.java:1562)
        at io.grpc.xds.ClientXdsClient$ResourceSubscriber.onData(ClientXdsClient.java:1515)
        at io.grpc.xds.ClientXdsClient.handleResourcesAccepted(ClientXdsClient.java:1379)
        at io.grpc.xds.ClientXdsClient.handleEdsResponse(ClientXdsClient.java:982)
        at io.grpc.xds.AbstractXdsClient$AbstractAdsStream.handleRpcResponse(AbstractXdsClient.java:504)
        at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1$1.run(AbstractXdsClient.java:663)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
        at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1.onNext(AbstractXdsClient.java:655)
        at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1.onNext(AbstractXdsClient.java:652)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
        at io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:447)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:652)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:637)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 common frames omitted

I couldn't find exact steps to reproduce this exception. But it almost always happens in my long-running server application.

@dapengzhang0
Member

It's likely a bug. Because SharedCallCounterMap uses WeakReference, it's probably hard to reproduce: whether it triggers depends on when the garbage collector clears the references. Thanks for reporting; I'll have a look.
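A minimal sketch of why GC timing matters here. It assumes, from the method name `cleanQueue()`, that the map drains a `ReferenceQueue`; that is an assumption, not something stated in the thread. A `WeakReference` registered with a queue is delivered to it only after its referent is cleared, and with a real collector that moment is outside the program's control. The class and method names (`WeakRefQueueDemo`, `enqueueAfterClear`) are illustrative. The sketch performs the clear and enqueue steps by hand with `Reference.clear()` and `Reference.enqueue()` so the sequence is deterministic, which is also a practical way to exercise GC-dependent code paths in a test.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicLong;

public class WeakRefQueueDemo {
  /**
   * Registers a WeakReference with a ReferenceQueue, then performs by hand the
   * two steps the garbage collector would take once the referent becomes
   * unreachable (clear, then enqueue). Returns true if every step behaved as
   * expected.
   */
  static boolean enqueueAfterClear() {
    ReferenceQueue<AtomicLong> queue = new ReferenceQueue<>();
    AtomicLong counter = new AtomicLong();
    WeakReference<AtomicLong> ref = new WeakReference<>(counter, queue);

    // While 'counter' is strongly reachable, nothing is delivered to the queue.
    boolean emptyWhileReachable = queue.poll() == null;

    // With a real GC these two steps happen at an unpredictable time.
    // Doing them by hand makes the sequence deterministic.
    ref.clear();
    ref.enqueue();

    // The reference is delivered exactly once.
    Reference<? extends AtomicLong> polled = queue.poll();
    boolean deliveredOnce = polled == ref && queue.poll() == null;

    // Reading 'counter' here keeps it strongly reachable until this point,
    // so the real GC cannot clear or enqueue 'ref' behind our back.
    return emptyWhileReachable && deliveredOnce && ref.get() == null
        && counter.get() == 0;
  }

  public static void main(String[] args) {
    System.out.println(enqueueAfterClear()); // prints "true"
  }
}
```

Because the queue only sees a reference after collection, any bug in the draining code stays dormant until the collector happens to clear the right counter at the right moment. This is consistent with the report that the exception shows up in a long-running application.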

@ejona86 ejona86 added the bug label Aug 9, 2021
@ejona86
Member

ejona86 commented Aug 9, 2021

I expect clusterCounter == null during cleanQueue(). cleanQueue() isn't synchronized, so it seems it could remove an entry from counters during getOrCreate(), between the counters.get() and the clusterCounters.put().
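The interleaving suggested above can be replayed deterministically on one thread. This is a simplified sketch, not the grpc-java source. It assumes a two-level layout (cluster, then edsServiceName, then counter reference) inferred from the names `counters` and `clusterCounters`. It uses plain `HashMap`s with strings standing in for the weak references, and its names (`InterleavingReplay`, `"cluster-a"`, `"eds-1"`) are hypothetical. It shows how a cleanup that slips in between the `get()` and the `put()` would later surface as a `NullPointerException` inside the cleanup code.

```java
import java.util.HashMap;
import java.util.Map;

public class InterleavingReplay {
  /**
   * Replays on a single thread a cleanup running between getOrCreate()'s
   * counters.get() and clusterCounters.put(). Returns true if the final
   * cleanup step throws NullPointerException.
   */
  static boolean replayThrowsNpe() {
    // Hypothetical layout: cluster -> (edsServiceName -> counter reference).
    // Strings stand in for the weak references.
    Map<String, Map<String, String>> counters = new HashMap<>();
    Map<String, String> initial = new HashMap<>();
    initial.put("eds-1", "old-ref");
    counters.put("cluster-a", initial);

    // getOrCreate(), step 1: fetch the per-cluster map. Its "eds-1" counter
    // has been collected, so a new one is about to be created.
    Map<String, String> clusterCounters = counters.get("cluster-a");

    // Cleanup, interleaved: drains "old-ref", which empties the per-cluster
    // map, so the whole cluster entry is dropped from 'counters'.
    counters.get("cluster-a").remove("eds-1");
    if (counters.get("cluster-a").isEmpty()) {
      counters.remove("cluster-a");
    }

    // getOrCreate(), step 2: the new reference goes into the map fetched in
    // step 1, which is no longer reachable from 'counters'.
    clusterCounters.put("eds-1", "new-ref");

    // Later, "new-ref" is collected and drained. Its cluster has no entry in
    // 'counters', so the per-cluster lookup returns null.
    try {
      counters.get("cluster-a").remove("eds-1");
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(replayThrowsNpe()); // prints "true"
  }
}
```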

@dapengzhang0 dapengzhang0 self-assigned this Aug 10, 2021
@ejona86 ejona86 added this to the 1.41 milestone Aug 10, 2021
@dapengzhang0 dapengzhang0 modified the milestones: 1.41, Next Sep 1, 2021
dapengzhang0 added a commit that referenced this issue Sep 2, 2021
Fixes #8397.
#8397 is caused by mistakenly clearing a map entry right after the entry is recreated following GC. Reproduced in a regression test.
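A minimal sketch of the failure mode the commit message describes. It assumes a two-level weak-value counter map that drains its queue at the end of `getOrCreate()`, matching the stack trace. `WeakCounterMap`, `CounterRef`, `checkIdentity`, `slot`, and `simulateCollection` are all hypothetical names, not grpc-java's API. `simulateCollection` clears and enqueues a reference by hand, so the "collected, then recreated" sequence is deterministic. With `checkIdentity` false, draining the stale reference removes whatever occupies its slot, which is the freshly recreated entry. With `checkIdentity` true, the slot is removed only if it still holds that exact reference (`Map.remove(key, value)`). This is one plausible shape of a fix, not necessarily the change merged in #8466.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical weak-value counter map: cluster -> edsServiceName -> counter. */
public class WeakCounterMap {
  /** Weak reference that remembers which slot it was stored in. */
  static final class CounterRef extends WeakReference<AtomicLong> {
    final String cluster;
    final String edsServiceName;

    CounterRef(AtomicLong counter, ReferenceQueue<AtomicLong> queue,
        String cluster, String edsServiceName) {
      super(counter, queue);
      this.cluster = cluster;
      this.edsServiceName = edsServiceName;
    }
  }

  private final ReferenceQueue<AtomicLong> queue = new ReferenceQueue<>();
  private final Map<String, Map<String, CounterRef>> counters = new HashMap<>();
  private final boolean checkIdentity;

  WeakCounterMap(boolean checkIdentity) {
    this.checkIdentity = checkIdentity;
  }

  synchronized AtomicLong getOrCreate(String cluster, String edsServiceName) {
    Map<String, CounterRef> clusterCounters =
        counters.computeIfAbsent(cluster, k -> new HashMap<>());
    CounterRef ref = clusterCounters.get(edsServiceName);
    AtomicLong counter = ref == null ? null : ref.get();
    if (counter == null) {
      // Slot is empty or its counter was collected: recreate it.
      counter = new AtomicLong();
      clusterCounters.put(edsServiceName,
          new CounterRef(counter, queue, cluster, edsServiceName));
    }
    cleanQueue();
    return counter;
  }

  /** Drains collected references; runs under the same lock as getOrCreate(). */
  private synchronized void cleanQueue() {
    CounterRef ref;
    while ((ref = (CounterRef) queue.poll()) != null) {
      Map<String, CounterRef> clusterCounters = counters.get(ref.cluster);
      if (checkIdentity) {
        // Remove the slot only if it still holds this exact reference.
        if (clusterCounters == null
            || !clusterCounters.remove(ref.edsServiceName, ref)) {
          continue; // slot was already reused by a newer reference
        }
      } else {
        // Unchecked: removes whatever occupies the slot, even a recreated
        // entry, and throws NullPointerException if the cluster map is gone.
        clusterCounters.remove(ref.edsServiceName);
      }
      if (clusterCounters.isEmpty()) {
        counters.remove(ref.cluster);
      }
    }
  }

  synchronized CounterRef slot(String cluster, String edsServiceName) {
    Map<String, CounterRef> clusterCounters = counters.get(cluster);
    return clusterCounters == null ? null : clusterCounters.get(edsServiceName);
  }

  /** Test hook: does by hand what the GC does once a counter is unreachable. */
  static void simulateCollection(CounterRef ref) {
    ref.clear();
    ref.enqueue();
  }

  /** Collects a counter, recreates it, and checks the new entry survived. */
  static boolean recreatedEntrySurvives(boolean checkIdentity) {
    WeakCounterMap map = new WeakCounterMap(checkIdentity);
    map.getOrCreate("cluster-a", "eds-1");
    simulateCollection(map.slot("cluster-a", "eds-1"));
    AtomicLong recreated = map.getOrCreate("cluster-a", "eds-1");
    // If the entry survived, asking again returns the same counter.
    return map.getOrCreate("cluster-a", "eds-1") == recreated;
  }

  public static void main(String[] args) {
    System.out.println(recreatedEntrySurvives(false)); // prints "false"
    System.out.println(recreatedEntrySurvives(true));  // prints "true"
  }
}
```

In the unchecked variant, the recreated `CounterRef` ends up orphaned in a per-cluster map that `counters` no longer holds. When its counter is later collected and drained, one of two things happens. If a new per-cluster map exists at that moment, it can wrongly remove a live entry again. If not, `counters.get(ref.cluster)` returns `null` and the `remove` call throws, which is consistent with the `NullPointerException` in `cleanQueue()` reported above. Draining the queue before the lookup instead of after would not be enough by itself. The `WeakReference` contract lets the collector enqueue a cleared reference "at the same time or at some later time," so a stale reference can still arrive after its slot has been refilled.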
@dapengzhang0
Member

> But it almost always happens in my long-running server application.

@sky91 The fix will be available in the v1.41 release. Let us know if you need the fix backported to earlier versions.

@sky91
Author

sky91 commented Sep 6, 2021

> The fix will be available in the v1.41 release. Let us know if you need the fix backported to earlier versions.

@dapengzhang0 I will upgrade to v1.41 then. Thanks.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2021