More aggressively retry Redis commands #193

Open
casperisfine opened this issue Oct 31, 2022 · 0 comments


Acceptance Criteria

  • We need to go over all the commands we emit and make sure they are idempotent; otherwise retrying could result in corrupted state, lost tests, etc.
  • The redis gem already has the necessary pieces for this; it's mostly a matter of configuration.

Context

Sometimes the Redis server that handles the ci-queue workload experiences a failover or some other availability issue.

When this happens, it breaks builds even though the server recovers pretty fast.

Examples

Error connecting to Redis on redacted.svc.cluster.local.:6379 (SocketError) (Redis::CannotConnectError)
./tmp/bundle/ruby/3.1.0/gems/redis-4.8.0/lib/redis/client.rb:162:in `call': MASTERDOWN Link with MASTER is down and replica-serve-stale-data is set to 'no'. (Redis::CommandError)

(The latter one needs to be better categorized by the redis gem, though.)

Solution

Ideally we'd be resilient to these small transient errors. This means retrying all or most commands, and possibly waiting a bit before each retry. The redis gem already has the necessary pieces for this; it's mostly a matter of configuration.
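To make "retry and wait a bit" concrete, here is a minimal sketch of a retry wrapper with capped exponential backoff. It is not the redis gem's API: `TransientRedisError`, `with_redis_retries`, and all parameter names are hypothetical, and the error class stands in for whatever the gem raises (e.g. `Redis::CannotConnectError`) so the snippet runs without the gem installed.

```ruby
# Hypothetical stand-in for transient errors such as Redis::CannotConnectError.
# Defined locally so this sketch has no dependency on the redis gem.
class TransientRedisError < StandardError; end

# Run the block, retrying on transient errors with capped exponential backoff.
# `sleeper` is injectable so the wait can be stubbed out in tests.
def with_redis_retries(attempts: 3, base_delay: 0.1, max_delay: 2.0,
                       sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield
  rescue TransientRedisError
    # Give up once the attempt budget is spent and surface the last error.
    raise if tries >= attempts

    # Delay doubles on each failure: base, 2*base, 4*base, ... capped at max_delay.
    delay = [base_delay * (2**(tries - 1)), max_delay].min
    sleeper.call(delay)
    retry
  end
end
```

In real use, the block would wrap a single Redis command, and the rescued class would be the gem's own error. As far as I recall, redis gem 4.x also documents `reconnect_attempts`, `reconnect_delay` and `reconnect_delay_max` options on `Redis.new`, which may cover the connection-error case without a custom wrapper; whether they also cover `MASTERDOWN` replies would need checking.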

However, we need to go over all the commands we emit and make sure they are idempotent; otherwise retrying could result in corrupted state, lost tests, etc.
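The idempotency concern can be illustrated with a small in-memory simulation. Everything here is hypothetical (`FakeServer`, `retry_once`, the `test_a` value) and is not taken from ci-queue's actual commands. It models the case where the server applies a command but the reply is lost, so the client retries: a list push is applied twice, while a set add is not.

```ruby
require "set"

# Hypothetical in-memory stand-in for a Redis server. It applies each command
# first and can then "lose" the reply, which is the case where retries are risky.
class FakeServer
  attr_reader :list, :set

  def initialize
    @list = []
    @set = Set.new
    @drop_next_reply = false
  end

  # Make the next command succeed server-side but fail from the client's view.
  def drop_next_reply!
    @drop_next_reply = true
  end

  # Analogous to RPUSH: not idempotent, every call appends.
  def rpush(value)
    @list << value
    reply
  end

  # Analogous to SADD: idempotent, adding the same member twice is a no-op.
  def sadd(value)
    @set << value
    reply
  end

  private

  def reply
    if @drop_next_reply
      @drop_next_reply = false
      raise IOError, "connection lost before reply"
    end
    :ok
  end
end

# Naive client-side retry: run the block again once if the reply was lost.
def retry_once
  yield
rescue IOError
  yield
end

server = FakeServer.new

server.drop_next_reply!
retry_once { server.rpush("test_a") }

server.drop_next_reply!
retry_once { server.sadd("test_a") }

puts server.list.inspect     # ["test_a", "test_a"]  (duplicated by the retry)
puts server.set.to_a.inspect # ["test_a"]            (retry was harmless)
```

Commands that behave like the list push are the ones that would need auditing before retries are enabled broadly.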

cc @ChrisBr
