xds: Fix WeakReference bug in SharedCallCounterMap #8466

Merged
merged 6 commits into from Sep 2, 2021
Changes from 5 commits
3 changes: 3 additions & 0 deletions xds/src/main/java/io/grpc/xds/SharedCallCounterMap.java
@@ -73,6 +73,9 @@ void cleanQueue() {
    CounterReference ref;
    while ((ref = (CounterReference) refQueue.poll()) != null) {
      Map<String, CounterReference> clusterCounter = counters.get(ref.cluster);
Member Author

I'm guessing clusterCounter shouldn't be null, because refs should be enqueued in the same order in which the garbage collector clears their referents. But I did not see the Javadoc say that explicitly.

Is there any risk of an NPE in an extreme race like the following?
ref1.referent cleared by GC => ref2 created and put in the counters map => ref2.referent cleared by GC => ref2 enqueued => ref1 enqueued.

Member

With C1 it doesn't seem too far-fetched, especially if enqueuing is a separate stage of the process from clearing. I doubt it would actually happen, but it seems fair to consider.

A simple solution for that is to call ref.enqueue() if ref.get() == null, before replacing the reference.
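As a rough illustration of this suggestion, the sketch below enqueues a stale reference before replacing it, so the stale reference always reaches the queue before its replacement can. It is not the actual SharedCallCounterMap code: the class `CounterMapSketch`, the nested `CounterRef`, and the single-level `key` (standing in for the cluster and EDS service name pair) are hypothetical simplifications.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a weak-reference counter map.
public class CounterMapSketch {
  // Weak reference that remembers the key it was stored under.
  static final class CounterRef extends WeakReference<AtomicLong> {
    final String key;

    CounterRef(AtomicLong referent, ReferenceQueue<AtomicLong> queue, String key) {
      super(referent, queue);
      this.key = key;
    }
  }

  final ReferenceQueue<AtomicLong> refQueue = new ReferenceQueue<>();
  final Map<String, CounterRef> counters = new HashMap<>();

  synchronized AtomicLong getOrCreate(String key) {
    CounterRef ref = counters.get(key);
    AtomicLong counter = (ref == null) ? null : ref.get();
    if (counter == null) {
      if (ref != null) {
        // The referent is gone but the GC may not have enqueued the ref yet.
        // Enqueue it now; this is a no-op if it is already enqueued.
        ref.enqueue();
      }
      counter = new AtomicLong();
      counters.put(key, new CounterRef(counter, refQueue, key));
    }
    return counter;
  }

  synchronized void cleanQueue() {
    CounterRef ref;
    while ((ref = (CounterRef) refQueue.poll()) != null) {
      // Remove the entry only if it still holds this exact reference,
      // so a stale ref never evicts its replacement.
      if (counters.get(ref.key) == ref) {
        counters.remove(ref.key);
      }
    }
  }

  public static void main(String[] args) {
    CounterMapSketch map = new CounterMapSketch();
    AtomicLong first = map.getOrCreate("cluster");
    map.counters.get("cluster").clear(); // simulate the GC clearing the referent
    AtomicLong second = map.getOrCreate("cluster");
    map.cleanQueue();
    System.out.println(first != second);
    System.out.println(map.counters.get("cluster").get() == second);
  }
}
```

Because the stale reference is enqueued while the map lock is held and before the new reference exists, the queue can never deliver the replacement ahead of the reference it replaced.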

Member Author

Done. TIL thanks.

      if (clusterCounter.get(ref.edsServiceName) != ref) {
        continue;
      }
      clusterCounter.remove(ref.edsServiceName);
      if (clusterCounter.isEmpty()) {
        counters.remove(ref.cluster);
18 changes: 18 additions & 0 deletions xds/src/test/java/io/grpc/xds/SharedCallCounterMapTest.java
@@ -62,4 +62,22 @@ public boolean isDone() {
    map.cleanQueue();
    assertThat(counters).isEmpty();
  }

  @Test
  public void gcAndRecreate() {
    @SuppressWarnings("UnusedVariable") // assign to null for GC only
    AtomicLong counter = map.getOrCreate(CLUSTER, EDS_SERVICE_NAME);
    final CounterReference ref = counters.get(CLUSTER).get(EDS_SERVICE_NAME);
Member

FYI, you can call ref.clear() and ref.enqueue() manually instead of relying on GC.
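A minimal sketch of that approach, using only `java.lang.ref` and no GC helper: clearing and enqueuing the reference by hand puts it in the same observable state a collection would. The class `ManualEnqueueSketch` and the helper `simulateCollection` are hypothetical names for illustration, not part of the test in this PR.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo of simulating garbage collection of a weak referent.
public class ManualEnqueueSketch {
  // Clears the referent and enqueues the reference, as the GC eventually would.
  // Returns true if the reference was newly added to its queue.
  static boolean simulateCollection(WeakReference<?> ref) {
    ref.clear();
    return ref.enqueue();
  }

  public static void main(String[] args) {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();
    Object referent = new Object();
    WeakReference<Object> ref = new WeakReference<>(referent, queue);

    boolean enqueued = simulateCollection(ref);

    System.out.println(enqueued);            // newly enqueued
    System.out.println(ref.get() == null);   // referent is no longer reachable via ref
    System.out.println(queue.poll() == ref); // the queue hands back the same reference
  }
}
```

This keeps a test deterministic, since it no longer depends on when, or whether, the collector runs.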

    assertThat(counter.get()).isEqualTo(0);
    counter = null;
    GcFinalization.awaitDone(new FinalizationPredicate() {
      @Override
      public boolean isDone() {
        return ref.isEnqueued();
      }
    });
    map.getOrCreate(CLUSTER, EDS_SERVICE_NAME);
    assertThat(counters.get(CLUSTER)).isNotNull();
    assertThat(counters.get(CLUSTER).get(EDS_SERVICE_NAME)).isNotNull();
  }
}