A42 update: fix ring_hash connectivity state aggregation rules #296
Merged
Commits (7):
- 8ea1977 markdroth: A42 update: fix ring_hash connectivity state aggregation rules
- 52895a0 markdroth: clarify that ring_hash usually starts in IDLE
- d2efa4a markdroth: clarify behavior change in priority policy
- 9e8ee0d markdroth: further clarify the priority change
- 7bc65fa markdroth: clarify wording
- 183057b markdroth: swap the order of aggregation rules 3 and 4
- 8265ea3 markdroth: clarify wording
Is this accurate, that there can be addresses which ring_hash will never attempt?
I might be wrong, but I thought that if every connection attempt failed, ring_hash would eventually circle around and try every address?
I'm asking because, if ring_hash will eventually attempt every address, then considering the new priority changes to properly handle the failover timeout with ring_hash, I'm wondering whether this heuristic of entering TRANSIENT_FAILURE after encountering two connection failures is still worth having -- i.e., what problem does it solve that isn't already solved by priority failover?
Yes, you're right, it will eventually try all of the subchannels. However, this change is still needed for two reasons:

1. The `priority` policy uses an idempotent algorithm for choosing a priority that is triggered every time a child's state changes, and that algorithm treats `IDLE` the same as `READY` -- i.e., the state of the failover timer matters only when the child is reporting `CONNECTING`. So if `ring_hash` continues to report `IDLE` after the failover timer fires, we'll wind up re-selecting it as soon as the newly created next priority reports `CONNECTING`, which is not what we want. (To say this another way, the priority policy assumes that a policy in `IDLE` will transition to `CONNECTING` as soon as it gets a pick.)
2. The `ring_hash` policy is designed to not be xDS-specific; eventually, we'd like to be able to use it as a top-level policy even without xDS. At that point, it would still be necessary to have the same heuristic here so that the channel will properly go into `TRANSIENT_FAILURE` when things are not working, rather than staying in `IDLE` indefinitely. If things are not working, we want non-wait_for_ready RPCs to fail quickly.

I've updated the wording here to try to clarify this a bit.
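The two-failure heuristic and the fallback ordering discussed in this thread can be sketched roughly as follows. This is a hedged illustration, not gRPC's actual implementation: the function name, the `consecutive_failures` counter, and the exact ordering of the fallback rules are assumptions based on the behavior described above, not on the A42 text itself.

```python
from collections import Counter

READY = "READY"
CONNECTING = "CONNECTING"
IDLE = "IDLE"
TRANSIENT_FAILURE = "TRANSIENT_FAILURE"

def aggregate_ring_hash_state(child_states, consecutive_failures):
    """Hypothetical sketch of ring_hash connectivity state aggregation.

    child_states: connectivity states of the policy's subchannels.
    consecutive_failures: count of consecutive failed connection attempts.
    """
    # Heuristic from the thread: after two consecutive connection
    # failures, report TRANSIENT_FAILURE so that non-wait_for_ready
    # RPCs fail quickly instead of the channel sitting in IDLE or
    # CONNECTING indefinitely while the ring is walked.
    if consecutive_failures >= 2:
        return TRANSIENT_FAILURE
    counts = Counter(child_states)
    if counts[READY]:
        return READY
    if counts[CONNECTING]:
        return CONNECTING
    if counts[IDLE]:
        return IDLE
    return TRANSIENT_FAILURE
```

Reporting `TRANSIENT_FAILURE` eagerly here is what lets the `priority` parent (or a bare channel, in the future top-level case) react, rather than treating a lingering `IDLE` report as healthy.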
Copying over context from chat FTR:

When `ring_hash` is under `priority`, it could stay in CONNECTING for however long it took until reaching TD, and priority would do the right thing. However, when `ring_hash` is a top-level policy, we need to balance the goals of failing fast in some cases and using all failover addresses in others.

My only thought about waiting until every subchannel enters TF is that it may be easier for users to understand: the time taken for ring_hash to reach TF would be proportional to its address list size (i.e., we remove the need to understand this heuristic). But OTOH, taking forever to reach TF introduces its own problems.
Yeah, we really can't wait until every single subchannel has been attempted before going TF, since that would basically turn every RPC into a wait_for_ready RPC, which is definitely not what we want.
This is a heuristic, and like any heuristic, it's an imperfect attempt to balance between competing objectives. No matter what value we choose, someone can argue that a different value would be better in their case, but we can't choose the value they want without making it worse in some other case.
At the end of the day, I think two is the right number here. We definitely don't want to use just one, because having one individual backend unreachable is probably not uncommon, and we don't want to fail all RPCs in that case. But it's fairly unlikely that the first two backends we happen to try are both independently down at the same time without there being some broader reachability issue that will also affect any other backend we might try. Increasing the number to three might give us slightly more confidence that there is a broader reachability issue, but that slight boost in confidence doesn't seem like enough to justify the additional delay in failing non-wait_for_ready RPCs. And the trade-off gets dramatically worse as you go up from three: the increased confidence becomes smaller and smaller, and the delay before failing non-wait_for_ready RPCs becomes larger and larger. So I think two is the right sweet spot here.
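The trade-off argument above can be made concrete with some back-of-the-envelope arithmetic. This is purely illustrative: the failure probability `p` is an assumed number, and the independence assumption is exactly what the comment says is unlikely to hold during a broader outage.

```python
# Illustrative arithmetic only: assumes each backend is independently
# unreachable with probability p, with no broader reachability issue.
p = 0.01  # assumed per-backend failure probability (made-up value)

# Probability that the first k backends tried are all down at once,
# i.e., the chance of a "false" TRANSIENT_FAILURE under independence.
false_tf = {k: p ** k for k in (1, 2, 3)}

# Going from k=1 to k=2 cuts spurious TRANSIENT_FAILURE reports by a
# factor of 1/p; going from k=2 to k=3 buys the same factor again, but
# at the cost of one more full connection-attempt delay before
# non-wait_for_ready RPCs can fail.
```

Under these assumptions the marginal confidence gained by a third attempt is tiny in absolute terms, while the added latency before failing RPCs is not, which matches the conclusion that two is the sweet spot.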