Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: change ring_hash LB aggregation rule to handles transient_failures #9084

Merged
merged 7 commits into from Apr 18, 2022

Conversation

YifeiZhuang
Copy link
Contributor

part of ring-hash part of the change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.

if (!connectionAttemptIterator.hasNext()) {
connectionAttemptIterator = subchannels.values().iterator();
}
if (connectionAttemptIterator.hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be an else clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no we do this regardless of resetting the iterator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have this condition at all? The only way it would produce results is if subchannels is empty. But it seems things will be pretty broken if that's the case. random.nextInt(0) throws, for example. Are we doing this because we aren't confident the state of things when it is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not relying on the assumption that it is protected by syncContext is one reason.
The other is kinda best practice to always call hasNext() before next. I guess it does not hurt unless that it might hide some unexpected bugs.

@ejona86
Copy link
Member

ejona86 commented Apr 14, 2022

Want to have a test that verifies (unsure it exists, but want to verify it exists or add it) sticky-TF and requestConnection() cooperate:

  1. Pass multiple addresses to policy (creates multiple subchannels with IDLE)
  2. Force one subchannel to TF
    a. Assert the policy is now CONNECTING
  3. Force that same subchannel to IDLE
    a. Assert this caused a request connection on any one of the subchannels

xds/src/main/java/io/grpc/xds/RingHashLoadBalancer.java Outdated Show resolved Hide resolved
xds/src/main/java/io/grpc/xds/RingHashLoadBalancer.java Outdated Show resolved Hide resolved
xds/src/main/java/io/grpc/xds/RingHashLoadBalancer.java Outdated Show resolved Hide resolved
if (!connectionAttemptIterator.hasNext()) {
connectionAttemptIterator = subchannels.values().iterator();
}
if (connectionAttemptIterator.hasNext()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have this condition at all? The only way it would produce results is if subchannels is empty. But it seems things will be pretty broken if that's the case. random.nextInt(0) throws, for example. Are we doing this because we aren't confident the state of things when it is called?

@YifeiZhuang YifeiZhuang merged commit a0da558 into grpc:master Apr 18, 2022
@YifeiZhuang YifeiZhuang deleted the ring-hash branch April 18, 2022 03:45
@ejona86 ejona86 added the TODO:backport PR needs to be backported. Removed after backport complete label Apr 18, 2022
YifeiZhuang added a commit to YifeiZhuang/grpc-java that referenced this pull request Apr 19, 2022
YifeiZhuang added a commit that referenced this pull request Apr 19, 2022
…es (#9084) (#9094)

per gRFC change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
@YifeiZhuang YifeiZhuang removed the TODO:backport PR needs to be backported. Removed after backport complete label Apr 20, 2022
YifeiZhuang added a commit to YifeiZhuang/grpc-java that referenced this pull request Apr 25, 2022
YifeiZhuang added a commit to YifeiZhuang/grpc-java that referenced this pull request Apr 25, 2022
YifeiZhuang added a commit that referenced this pull request Apr 25, 2022
…es (#9084) (#9107)

part of ring-hash part of the change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
YifeiZhuang added a commit that referenced this pull request Apr 25, 2022
…es (#9084) (#9108)

part of ring-hash part of the change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants