xds: Fix WeakReference bug in SharedCallCounterMap #8466
@@ -73,6 +73,9 @@ void cleanQueue() {
     CounterReference ref;
     while ((ref = (CounterReference) refQueue.poll()) != null) {
       Map<String, CounterReference> clusterCounter = counters.get(ref.cluster);
I'm guessing `clusterCounter` shouldn't be null, because refs should be enqueued in the same order in which the garbage collector nullifies their referents. But I did not see the javadoc say that explicitly.
Is there any risk of an NPE in an extreme race case like the following?
ref1.referent nullified by GC => ref2 created and put in the counters map => ref2.referent nullified by GC => ref2 enqueued => ref1 enqueued.
With C1 it doesn't seem too far-fetched, especially if enqueuing is a separate stage of the process from clearing. I doubt it would actually happen, but it seems fair to consider.
A simple solution for that is to call `ref.enqueue()` if `ref.get() == null`, before replacing the reference.
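The suggestion above can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the real class, not the code in this PR: `CounterMapSketch`, `KeyedRef`, and the single-level `key` are assumptions made for illustration (the real map is keyed by cluster and EDS service name). The stale reference is enqueued before it is replaced, and cleanup removes an entry only if it still maps to the polled reference.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for SharedCallCounterMap (one key level only). */
final class CounterMapSketch {
  /** Weak reference that remembers which map key it was stored under. */
  static final class KeyedRef extends WeakReference<AtomicLong> {
    final String key;

    KeyedRef(AtomicLong referent, ReferenceQueue<AtomicLong> queue, String key) {
      super(referent, queue);
      this.key = key;
    }
  }

  final ReferenceQueue<AtomicLong> refQueue = new ReferenceQueue<>();
  final Map<String, KeyedRef> counters = new HashMap<>();

  synchronized AtomicLong getOrCreate(String key) {
    KeyedRef ref = counters.get(key);
    AtomicLong counter = (ref == null) ? null : ref.get();
    if (counter == null) {
      if (ref != null) {
        // Referent already cleared: enqueue the stale reference now, so it
        // cannot reach the queue after its replacement has been installed.
        // enqueue() returns false (and does nothing) if it is already queued.
        ref.enqueue();
      }
      counter = new AtomicLong();
      counters.put(key, new KeyedRef(counter, refQueue, key));
    }
    cleanQueue();
    return counter;
  }

  synchronized void cleanQueue() {
    KeyedRef ref;
    while ((ref = (KeyedRef) refQueue.poll()) != null) {
      // Remove the entry only if it still maps to this exact reference.
      counters.remove(ref.key, ref);
    }
  }
}
```

With both measures in place, a stale reference that is polled after the entry was recreated leaves the new entry untouched.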
Done. TIL thanks.
   public void gcAndRecreate() {
     @SuppressWarnings("UnusedVariable") // assign to null for GC only
     AtomicLong counter = map.getOrCreate(CLUSTER, EDS_SERVICE_NAME);
     final CounterReference ref = counters.get(CLUSTER).get(EDS_SERVICE_NAME);
FYI, you can call `ref.clear()` and `ref.enqueue()` manually instead of relying on GC.
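A minimal sketch of that idea, using only `java.lang.ref` and not the test in this PR: `ManualEnqueueDemo` and `simulateGc` are hypothetical names. Calling `clear()` explicitly keeps the sketch independent of whether `enqueue()` also clears the referent on the JDK in use.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicLong;

final class ManualEnqueueDemo {
  /** Simulates collection of the referent without invoking the garbage collector. */
  static boolean simulateGc(WeakReference<?> ref) {
    ref.clear();          // get() now returns null, as after a real collection
    return ref.enqueue(); // true if the reference was newly added to its queue
  }

  public static void main(String[] args) {
    ReferenceQueue<AtomicLong> queue = new ReferenceQueue<>();
    AtomicLong counter = new AtomicLong();
    WeakReference<AtomicLong> ref = new WeakReference<>(counter, queue);

    boolean enqueued = simulateGc(ref);
    System.out.println("enqueued: " + enqueued);
    System.out.println("referent cleared: " + (ref.get() == null));
    System.out.println("polled same ref: " + (queue.poll() == ref));
  }
}
```

Because both steps run synchronously on the calling thread, a test built this way does not depend on when, or whether, `System.gc()` actually collects anything.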
Fixes #8397.
#8397 is caused by mistakenly removing a map entry right after that entry has been recreated following a GC. Reproduced in a regression test.
(`SharedCallCounterMap` is hard to read and can easily be a source of bugs. It's impossible to sufficiently test the class with unit tests because GC can happen at any time, concurrently with the method being tested. I'm not 100% confident about the correctness of the fix. If possible, I would avoid using WeakReference in the first place.)
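For readers unfamiliar with the failure mode in #8397, the sequence can be replayed deterministically in isolation. The sketch below is an illustration under assumptions, not the regression test from this PR: `StaleCleanupDemo`, `KeyedRef`, and the `guarded` flag are hypothetical, and the map is flattened to a single key.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class StaleCleanupDemo {
  /** Hypothetical weak reference that records its own map key. */
  static final class KeyedRef extends WeakReference<Object> {
    final String key;

    KeyedRef(Object referent, ReferenceQueue<Object> queue, String key) {
      super(referent, queue);
      this.key = key;
    }
  }

  /** Drains the queue; when guarded, removes only entries still mapped to the polled ref. */
  static void drain(ReferenceQueue<Object> queue, Map<String, KeyedRef> map, boolean guarded) {
    KeyedRef ref;
    while ((ref = (KeyedRef) queue.poll()) != null) {
      if (guarded) {
        map.remove(ref.key, ref); // identity check against the current entry
      } else {
        map.remove(ref.key);      // unguarded: may delete a recreated entry
      }
    }
  }

  /** Replays: old ref cleared, entry recreated, old ref enqueued, cleanup runs. */
  static boolean liveEntrySurvives(boolean guarded) {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();
    Map<String, KeyedRef> map = new HashMap<>();
    Object first = new Object();
    KeyedRef stale = new KeyedRef(first, queue, "cluster");
    map.put("cluster", stale);

    stale.clear();                       // referent "collected"
    Object second = new Object();
    KeyedRef live = new KeyedRef(second, queue, "cluster");
    map.put("cluster", live);            // entry recreated under the same key
    stale.enqueue();                     // stale ref reaches the queue late
    drain(queue, map, guarded);
    return map.get("cluster") == live && live.get() == second;
  }
}
```

In this replay, `liveEntrySurvives(false)` loses the recreated entry while `liveEntrySurvives(true)` keeps it, which is the difference an identity check in the cleanup loop makes.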