New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default reconnect strategy uses exponential backoff and jitter #2736
base: v5
Are you sure you want to change the base?
Conversation
Both are recommended parts of client reconnect strategies to prevent thundering herd problems when many clients lose their connection at once (for example, during a Redis upgrade).
Hi @qsymmachus, thanks for contributing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, just a few small comments and questions. You can ignore the nits and I'll do them myself.
docs/client-configuration.md
Outdated
@@ -13,7 +13,7 @@ | |||
| socket.keepAlive | `true` | Toggle [`keep-alive`](https://nodejs.org/api/net.html#socketsetkeepaliveenable-initialdelay) functionality | | |||
| socket.keepAliveInitialDelay | `5000` | If set to a positive number, it sets the initial delay before the first keepalive probe is sent on an idle socket | | |||
| socket.tls | | See explanation and examples [below](#TLS) | | |||
| socket.reconnectStrategy | `retries => Math.min(retries * 50, 500)` | A function containing the [Reconnect Strategy](#reconnect-strategy) logic | | |||
| socket.reconnectStrategy | `((retries^2) * 50 ms) + 0-200 ms` | A function containing the [Reconnect Strategy](#reconnect-strategy) logic | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ^
operator is "Bitwise XOR" in JavaScript, not power.. we need to find a better way to document this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with a plain English explanation of the strategy in 38eb603, rather than try to figure out how to pseudo-code this in a legible way.
reconnectStrategy: retries => Math.min(retries * 50, 1000) | ||
reconnectStrategy: retries => { | ||
// Generate a random jitter between 0 – 200 ms: | ||
const jitter = Math.floor(Math.random() * 200); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an option to pass { jitter: { min: number; max: number; }; delay: number; }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me that sounds like optimizing how clients can define their own custom strategy. But I see value in providing boilerplate to allow people to quickly define custom backoff + jitter strategies. Could we scope this PR to just the default strategy update, and then introduce this idea in a separate one?
packages/client/lib/client/socket.ts
Outdated
@@ -82,6 +82,15 @@ export default class RedisSocket extends EventEmitter { | |||
} | |||
|
|||
#createReconnectStrategy(options?: RedisSocketOptions): ReconnectStrategyFunction { | |||
const defaultStrategy = (retries: number) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be moved to a #defaultReconnectStrategy
function and used from lines 109 and 114
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in e7304ad 👍
@leibale I responded to your latest feedback; sorry for the delay! |
Description
The current default reconnect strategy increases delay linearly with no jitter. This can lead to a thundering herd problem if many clients lose their connection at once, which is common during events like a Redis upgrade.
A reconnect strategy that uses exponential backoff plus a random jitter helps mitigate this risk and is recommended in many best practices documents for Redis clients, so I propose making this the default.
The default strategy I implemented will do the following:
(n^2) * 50 ms
wheren
is the current retry count, with a maximum value of 2000 ms.The millisecond values I choose are somewhat arbitrary, and I'm happy to change them if we think they're not right.
Checklist
npm test
pass with this change (including linting)?reconnectStrategy
still pass. We have no tests for the default strategy, nor am I sure if and how we'd want to test it. Any thoughts on this?