Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xDS ADS reconnection should not spam logs #8886

Open
ejona86 opened this issue Feb 3, 2022 · 2 comments
Open

xDS ADS reconnection should not spam logs #8886

ejona86 opened this issue Feb 3, 2022 · 2 comments
Labels
Milestone

Comments

@ejona86
Copy link
Member

ejona86 commented Feb 3, 2022

As recently seen:

WARNING: [Channel<11>: (xds:///<service>)] Failed to resolve name. status=Status{​​code=UNAVAILABLE, description=Closed by server, cause=null}

This is being logged from ManagedChannelImpl:

private void handleErrorInSyncContext(Status error) {
logger.log(Level.WARNING, "[{0}] Failed to resolve name. status={1}",
new Object[] {getLogId(), error});

I'm surprised that log statement exists as a WARNING. Seems it was introduced in the large ManagedChannelImpl2 rewrite (#2530).

There's two cases this WARNING could happen:

  1. Before resolution has ever succeeded. There's no need for logging because the error will be communicated to RPCs. This is unlikely the case here just because "Closed by server" is probably the cycling of long-lived RPCs for load balancing
  2. After initial resolution. This might deserve logs at some points because RPCs probably won't fail (gRPC will continue using previous data). In this case, xds client should probably squelch this "error" as it is quite normal. There could be some debate here as to whether the watchers would be notified, or whether it matters if they are notified, but overall this is quite ordinary. Any error here should be logged if re-creating the ADS stream fails (which is probably dependent on whether the stream received responses).

So that means we should make two changes (one to ManagedChannelImpl, one to xds).

I'd hope we'd never log WARNINGs like this, but it is a bit hard of a situation and I don't want us to let perfect be the enemy of good. At least for xDS, we should be able to avoid logging except when ADS stream creation fails.

I've seen #8773, but it looks incomplete, as it'd need to handle the "never received a response" case.

CC @erikjoh, @dapengzhang0, @YifeiZhuang, @sergiitk

@ejona86 ejona86 added the bug label Feb 3, 2022
@ejona86 ejona86 added this to the v1.45 milestone Feb 3, 2022
@ejona86
Copy link
Member Author

ejona86 commented Feb 23, 2022

Google-internal copy of issue: b/207626836

@ejona86
Copy link
Member Author

ejona86 commented Feb 23, 2022

Comment from dapengzhang0 on b/207626836:

On ADS stream closed by the control plane, regardless of Status.isOk(), for each resource type, if a valid resource was ever received previously, don't do anything for the ResourceWatcher of that resource type; if a valid resource was never received, call ResourceWatcher.onError() for that resource type.

ejona86 added a commit to ejona86/grpc-java that referenced this issue Feb 23, 2022
Workaround for grpc#8886, as we wait on a real fix. The regular load
balancing disconnections are confusing users and will train users to
start ignoring gRPC warnings. At present, it is better to have no log
than excessively log.
ejona86 added a commit that referenced this issue Feb 23, 2022
Workaround for #8886, as we wait on a real fix. The regular load
balancing disconnections are confusing users and will train users to
start ignoring gRPC warnings. At present, it is better to have no log
than excessively log.
ejona86 added a commit to ejona86/grpc-java that referenced this issue Feb 23, 2022
Workaround for grpc#8886, as we wait on a real fix. The regular load
balancing disconnections are confusing users and will train users to
start ignoring gRPC warnings. At present, it is better to have no log
than excessively log.
ejona86 added a commit that referenced this issue Feb 24, 2022
Workaround for #8886, as we wait on a real fix. The regular load
balancing disconnections are confusing users and will train users to
start ignoring gRPC warnings. At present, it is better to have no log
than excessively log.
@ejona86 ejona86 modified the milestones: v1.45, v1.46 Mar 7, 2022
@ejona86 ejona86 modified the milestones: v1.46, v1.47 Apr 11, 2022
@ejona86 ejona86 modified the milestones: v1.47, Next May 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant