Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

priority and ring_hash LBs: fix interactions when using ring_hash under priority #29332

Merged
merged 29 commits into from Apr 15, 2022

Conversation

markdroth
Copy link
Member

@markdroth markdroth commented Apr 6, 2022

Fixes interactions between the priority and ring_hash LB policies, as per the changes in grpc/proposal#296. Also fixes some unrelated bugs found while looking at the code.

Specific changes in priority:

  • Use the same idempotent logic to choose a priority in all cases (e.g., when processing a child connectivity state update and when processing an update from the parent). This avoids problems such as failing to stop using the current priority when it transitions from READY to CONNECTING, but then seeing a duplicate TRANSIENT_FAILURE update from a higher priority that causes us to deselect the current priority because it is in state CONNECTING but does not have a failover timer pending.
  • Keep track of whether a child has seen TRANSIENT_FAILURE more recently than IDLE or READY, and use this to decide whether to restart the failover timer when a child reports CONNECTING. This ensures that we properly start the failover timer when the ring_hash child policy transitions from IDLE to CONNECTING at startup.
  • Do not cancel failover timer when deactivating a child. This avoids problems where a priority is deactivated while connecting and then reactivated before it finishes connecting.

Specific changes in ring_hash:

  • Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
  • Properly start a proactive subchannel connection attempt at subchannel list creation if the subchannels are already in a state that makes that appropriate. Also update the connectivity state visible to the picker at subchannel list creation.
  • Use a ring_hash picker regardless of what connectivity state we report.
  • Treat subchannels that go from READY to TRANSIENT_FAILURE as being in state IDLE, as per the gRFC.
  • Use logical connectivity state (after applying special cases for sticky-TF and IDLE) for both aggregation rules and for the state visible to the picker.

@markdroth markdroth added the release notes: no Indicates if PR should not be in release notes label Apr 6, 2022
@markdroth markdroth requested a review from apolcyn April 6, 2022 17:06
Copy link
Contributor

@apolcyn apolcyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f80fcac looks mostly good

A couple of comments on the test

test/cpp/end2end/xds/xds_end2end_test.cc Show resolved Hide resolved
test/cpp/end2end/xds/xds_end2end_test.cc Show resolved Hide resolved
Copy link
Member Author

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I need to merge #29320 before this, so please review that as well. Thanks!

test/cpp/end2end/xds/xds_end2end_test.cc Show resolved Hide resolved
test/cpp/end2end/xds/xds_end2end_test.cc Show resolved Hide resolved
@grpc-checks grpc-checks bot added bloat/none and removed bloat/low labels Apr 6, 2022
@markdroth
Copy link
Member Author

Looks like the test needs a little more work here. Since merging #29321, the test is no longer failing without the code change here. I'll work on changing it so that it does so.

@markdroth
Copy link
Member Author

I significantly rewrote the test to cover a simpler case but to do it in a way that exerts more control over the timing of the individual subchannel connection attempts. I also simplified the logic in the priority policy so that we always use the same code to choose a priority whenever any child reports a connectivity state change, which should systemically avoid the class of bugs where the logic for choosing a child does not match the logic for reacting to the current child's state changes.

PTAL.

@markdroth
Copy link
Member Author

Eric pointed out that the approach of always starting the failover timer in CONNECTING is not quite right, since that will cause us to select a higher-than-current priority that transitions back to CONNECTING when we have already been using a lower priority that is in state READY. We're still discussing the right way to fix this.

@markdroth markdroth changed the title priority LB: start failover timer when child reports CONNECTING, and don't cancel when deactivating priority LB: don't cancel failover timer when deactivating, and simplify logic Apr 8, 2022
priority:
- go back to starting failover timer upon CONNECTING, but only if seen
  READY or IDLE more recently than TRANSIENT_FAILURE

ring_hash:
- don't flap back and forth between IDLE and CONNECTING; once we go
  CONNECTING, we stay there until either TF or READY
- after the first subchannel goes TF, we proactively start another
  subchannel connecting, just like we do after a second subchannel
  reports TF, to ensure that we don't stay in CONNECTING indefinitely if
  we aren't getting any new picks
- always return ring hash's picker, regardless of connectivity state
- update the subchannel connectivity state seen by the picker upon
  subchannel list creation
- start proactive subchannel connection attempt upon subchannel list
  creation if needed
@markdroth markdroth merged commit 6273832 into grpc:master Apr 15, 2022
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Apr 15, 2022
YifeiZhuang added a commit to grpc/grpc-java that referenced this pull request Apr 19, 2022
…es (#9084) (#9094)

per gRFC change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
YifeiZhuang added a commit to grpc/grpc-java that referenced this pull request Apr 25, 2022
…es (#9084) (#9107)

part of ring-hash part of the change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
YifeiZhuang added a commit to grpc/grpc-java that referenced this pull request Apr 25, 2022
…es (#9084) (#9108)

part of ring-hash part of the change grpc/grpc#29332:
Apply new aggregation rule: If there is at least one subchannel in state TRANSIENT_FAILURE and there are more than one subchannel, report state CONNECTING. If we hit this rule, proactively start a subchannel connection attempt.
YifeiZhuang added a commit to grpc/grpc-java that referenced this pull request Apr 25, 2022
ringHashPicker changes per gRFC: grpc/grpc#29332:
previously it appears the picker logic is wrong, e.g. not request connecting on the any subchannel if it is in TRANSIENT_FAILURE
Refactored the logic to mirror the pseudo-code more so easier to understand.
YifeiZhuang added a commit to grpc/grpc-java that referenced this pull request Apr 25, 2022
ringHashPicker changes per gRFC: grpc/grpc#29332:
previously it appears the picker logic is wrong, e.g. not request connecting on the any subchannel if it is in TRANSIENT_FAILURE
Refactored the logic to mirror the pseudo-code more so easier to understand.
yashykt pushed a commit to yashykt/grpc that referenced this pull request Jul 8, 2022
…er priority (grpc#29332)

* refactor connection delay injection from client_lb_end2end_test

* fix build

* fix build on older compilers

* clang-format

* buildifier

* a bit of code cleanup

* start failover time whenever the child reports CONNECTING, and don't cancel when deactivating

* clang-format

* rewrite test

* simplify logic in priority policy

* clang-format

* switch to using a bit to indicate child healthiness

* fix reversed comment

* more changes in priority and ring_hash.

priority:
- go back to starting failover timer upon CONNECTING, but only if seen
  READY or IDLE more recently than TRANSIENT_FAILURE

ring_hash:
- don't flap back and forth between IDLE and CONNECTING; once we go
  CONNECTING, we stay there until either TF or READY
- after the first subchannel goes TF, we proactively start another
  subchannel connecting, just like we do after a second subchannel
  reports TF, to ensure that we don't stay in CONNECTING indefinitely if
  we aren't getting any new picks
- always return ring hash's picker, regardless of connectivity state
- update the subchannel connectivity state seen by the picker upon
  subchannel list creation
- start proactive subchannel connection attempt upon subchannel list
  creation if needed

* ring_hash: fix connectivity state seen by aggregation and picker

* fix obiwan error

* swap the order of ring_hash aggregation rules 3 and 4

* restore original test

* refactor connection injector QueuedAttempt code

* add test showing that ring_hash will continue connecting without picks

* clang-format

* don't actually need seen_failure_since_ready_ anymore

* fix TSAN problem

* address code review comments
yashykt pushed a commit to yashykt/grpc that referenced this pull request Jul 9, 2022
…er priority (grpc#29332)

* refactor connection delay injection from client_lb_end2end_test

* fix build

* fix build on older compilers

* clang-format

* buildifier

* a bit of code cleanup

* start failover time whenever the child reports CONNECTING, and don't cancel when deactivating

* clang-format

* rewrite test

* simplify logic in priority policy

* clang-format

* switch to using a bit to indicate child healthiness

* fix reversed comment

* more changes in priority and ring_hash.

priority:
- go back to starting failover timer upon CONNECTING, but only if seen
  READY or IDLE more recently than TRANSIENT_FAILURE

ring_hash:
- don't flap back and forth between IDLE and CONNECTING; once we go
  CONNECTING, we stay there until either TF or READY
- after the first subchannel goes TF, we proactively start another
  subchannel connecting, just like we do after a second subchannel
  reports TF, to ensure that we don't stay in CONNECTING indefinitely if
  we aren't getting any new picks
- always return ring hash's picker, regardless of connectivity state
- update the subchannel connectivity state seen by the picker upon
  subchannel list creation
- start proactive subchannel connection attempt upon subchannel list
  creation if needed

* ring_hash: fix connectivity state seen by aggregation and picker

* fix obiwan error

* swap the order of ring_hash aggregation rules 3 and 4

* restore original test

* refactor connection injector QueuedAttempt code

* add test showing that ring_hash will continue connecting without picks

* clang-format

* don't actually need seen_failure_since_ready_ anymore

* fix TSAN problem

* address code review comments
yashykt added a commit that referenced this pull request Jul 11, 2022
…er priority (#29332) (#30253)

* refactor connection delay injection from client_lb_end2end_test

* fix build

* fix build on older compilers

* clang-format

* buildifier

* a bit of code cleanup

* start failover time whenever the child reports CONNECTING, and don't cancel when deactivating

* clang-format

* rewrite test

* simplify logic in priority policy

* clang-format

* switch to using a bit to indicate child healthiness

* fix reversed comment

* more changes in priority and ring_hash.

priority:
- go back to starting failover timer upon CONNECTING, but only if seen
  READY or IDLE more recently than TRANSIENT_FAILURE

ring_hash:
- don't flap back and forth between IDLE and CONNECTING; once we go
  CONNECTING, we stay there until either TF or READY
- after the first subchannel goes TF, we proactively start another
  subchannel connecting, just like we do after a second subchannel
  reports TF, to ensure that we don't stay in CONNECTING indefinitely if
  we aren't getting any new picks
- always return ring hash's picker, regardless of connectivity state
- update the subchannel connectivity state seen by the picker upon
  subchannel list creation
- start proactive subchannel connection attempt upon subchannel list
  creation if needed

* ring_hash: fix connectivity state seen by aggregation and picker

* fix obiwan error

* swap the order of ring_hash aggregation rules 3 and 4

* restore original test

* refactor connection injector QueuedAttempt code

* add test showing that ring_hash will continue connecting without picks

* clang-format

* don't actually need seen_failure_since_ready_ anymore

* fix TSAN problem

* address code review comments

Co-authored-by: Mark D. Roth <roth@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bloat/none imported Specifies if the PR has been imported to the internal repository lang/c++ lang/core per-call-memory/neutral release notes: no Indicates if PR should not be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants