You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
kgo: do not cancel FindCoordinator if the parent context cancels
Some load testing in Redpanda showed a failure where consuming quit
unexpectedly and unrecoverably.
The sequence of events is:
* if OffsetCommit is issued just before Heartbeat
* and the group needs to be loaded so FindCoordinator is triggered,
* and OffsetCommit happens again, canceling the prior commit's context
Then,
* FindCoordinator would cancel
* Heartbeat, which is waiting on the same load, would fail with
context.Canceled
* This error is seen as a group leave error
* The group management logic would quit entirely.
Now, the context used for FindCoordinator is the client context, which
is only closed on client close. This is also better anyway -- if two
requests are waiting for the same coordinator load, we don't want the
first request canceling to error the second request. If all requests
cancel and we have a stray FindCoordinator in flight, that's ok too,
because well, worst case we'll just eventually have a little bit of
extra data cached that is likely needed in the future anyway.
Closesredpanda-data/redpanda#15131
0 commit comments