kgo: do not cancel FindCoordinator if the parent context cancels #650

Merged
merged 1 commit into from Dec 21, 2023

Commits on Dec 21, 2023

  1. kgo: do not cancel FindCoordinator if the parent context cancels

    Some load testing in Redpanda showed a failure where consuming quit
    unexpectedly and unrecoverably.
    
    The sequence of events is:
    * OffsetCommit is issued just before Heartbeat,
    * the group's coordinator needs to be loaded, so FindCoordinator is
      triggered,
    * and OffsetCommit is issued again, canceling the prior commit's context.
    Then,
    * FindCoordinator is canceled,
    * Heartbeat, which is waiting on the same load, fails with
      context.Canceled,
    * this error is treated as a group leave error,
    * and the group management logic quits entirely.
    
    Now, the context used for FindCoordinator is the client context, which
    is only canceled when the client closes. This is better anyway: if two
    requests are waiting on the same coordinator load, we don't want the
    first request's cancellation to fail the second request. If all requests
    cancel and a stray FindCoordinator is still in flight, that's ok too:
    worst case, we eventually cache a little extra data that is likely
    needed in the future anyway.
    
    Closes redpanda-data/redpanda#15131
    twmb committed Dec 21, 2023
    7d050fc