Proposal for async out-of-band callback support #157

mgravell · 2024-03-27T11:25:02Z

This is beyond features that Redis offers; I have discussed similar with Redis, but it hasn't gone anywhere; maybe Garnet can trailblaze here?

The FIFO nature of Redis request/response means that two scenarios cause severe head-of-line blocking problems:

explicitly blocking operations (BRPOP etc)
anything that stalls the data - checkpoints, slow disk read, etc

Wouldn't it be nice if async completion was possible, WITHIN the existing RESP protocol?

Proposal:

Add a feature that is consistent with the pattern used by the client side caching Redis feature, composed of four parts:

ASYNC {ON|OFF} [REDIRECT conn] [OPTIN] - enables the async feature on a connection, optionally specifying a redirect pub/sub connection (required on RESP2, disallowed on RESP3), and the optional OPTIN that indicates whether this is on/off by default on a command-by-command basis; response: +OK
ASYNC {YES|NO} - (requires ASYNC ON first) optional; explicitly enable/disable async for the next operation (typically used as ASYNC YES to turn async on for an individual call, when OPTIN is enabled) - response: +OK
-ASYNC token new error response from the server when a specific operation has gone out-of-band, where token is the correlation identifier to match this result when available (only used when ASYNC ON is in operation, and the command is electing to use async)
"push" message (on the redirect connection for RESP2, on the current connection for RESP3) that is an array with 3 parts: the category identifier "async", the correlation identifier, and the RESP fragment that is the response for that message; this includes all possible RESP responses including errors, etc

When is a command eligible for async?

ASYNC ON must have been issued, and no subsequent ASYNC OFF
if OPTIN was specified, there must be an ASYNC YES immediately† before the message
there must not be an ASYNC NO immediately† before the message

†=allowing for stacking with other "applies to the next command" commands, for example CLIENT CACHING

Notes:

async responses are optional; if the response is available synchronously, the result is returned instead of -ASYNC token
the correlation id is issued server-side and opaque; there is no planned API that allows a client to do anything with it
the server must guarantee uniqueness of the correlation id in the scope of a single connection (conflicting identifiers on unrelated connections is explicitly permitted)
any clients not prepared for async see this as a redis error that is clear - it won't work, but it also won't smash clients to pieces; this is directly comparable to -MOVED for cluster
clients that are async aware can silently ignore -ERR from ASYNC commands on servers that do not implement it; the next command will simply return synchronously which is exactly what we expect - no change to sync path
tokens do not need to be issued or stored/matched unless we go async: zero impact to sync path
tokens must be compatible with "simple error" strings; opaque integer token makes a lot of sense (concretely: "simple error" and "simple string" values must use printable ASCII excluding \r and \n)
choosing to use ASYNC is inherently also saying "I'm ok with this going out of FIFO"
async does not operate for commands inside MULTI (responses will -QUEUED, including for ASYNC messages) or for WATCH / DISCARD (response will always be synchronous); the server may (but is not required to) allow async completion for EXEC
async does not operate within Lua (responses will be synchronous)
async does not add timeout behaviour; commands that do already have a timeout (brpop etc) would be expected to work as such, though
the server is permitted to apply quotas or other reasons to not use an async response; for example, the server may wish to allow only N outstanding async commands per connection; it could choose to switch to sync only at that point, or could issue -ERR too many async operations
outstanding async operations are considered "doomed" if a connection ends for any reason, and should be discarded by client and server (obviously doing any necessary book-keeping/cleanup)
the ASYNC command will never itself give a -ASYNC response
connection state modifier commands like ASKING, READONLY, READWRITE, SELECT are always synchronous
there is no "order" in async; the ordered responses -ASYNC 123 and -ASYNC 456 may receive async responses in the order 456 then 123

Typical usage:

C: ASYNC ON REDIRECT 12344
S: +OK
C: GET foo
S: "adskjhsdfjkh"
C: GET bar
S: -ASYNC 3434534
C: GET blap
S: "asldkfslkfdgh"

.... (at some point, on subscription connection 12344)
S: ["async" "3434534" "askdjhasdkjsah"]

On RESP3 the REDIRECT 12344 would be omitted, and the server would respond (at some point) via "push" on the same connection.

Alternatively:

C: ASYNC ON REDIRECT 12344
S: +OK
C: ASYNC YES  # next command is allowed (not required) to give async response
S: +OK
C: GET foo
S: "adskjhsdfjkh"
C: ASYNC YES    # next command is allowed (not required) to give async response
S: +OK
C: GET bar
S: -ASYNC 3434534
C: GET blap     # this will never go ASYNC, because no ASYNC YES
S: "asldkfslkfdgh"
C: ASYNC NO
S: +OK
C: GET blap     # this will never go ASYNC, because ASYNC NO
S: "asldkfslkfdgh"

.... (at some point, on subscription connection 12344)
S: ["async" "3434534" "askdjhasdkjsah"]

I am happy to offer the time to get such an idea working with SE.Redis

The text was updated successfully, but these errors were encountered:

badrishc · 2024-03-27T14:14:34Z

Looks like a good idea, particularly for GET from slow disks or cloud storage backends. It can avoid the FIFO head of line blocking. Checkpoints do not block operations in Garnet so that's not a reason for us. We don't yet have BRPOP, but it will help when we do.

I prefer the RESP3 version. Making a second connection per connection is prohibitively inefficient for most scenarios I've seen that might use this. Is there no way to reuse the same connection in RESP2? If not then Garnet could offer the RESP3 semantics just for this.

kevin-montrose · 2024-03-27T14:53:10Z

Piping up to say, this would solve some issues my team is having as well.

When embedding Garnet it is desirable to conditionally go async, rather than always either blocking or forcing operations onto the threadpool. Today we just block, since most operations complete synchronously and the extra trip through the threadpool is murder on performance.

I prefer the RESP3 version. Making a second connection per connection is prohibitively inefficient for most scenarios I've seen that might use this. Is there no way to reuse the same connection in RESP2?

RESP3 introduces the Push type, the equivalent under RESP2 is to use a separate connection. This inefficiency in RESP2 is why the Push type was added 😊.

mgravell · 2024-03-27T15:05:47Z

Technically (and I'll explain why I don't advocate this) we could break specification and issue a RESP3 push response out-of-band on the single connection in RESP2; a suitably educated client would know to expect and handle >3\r\n$5\r\nasync\r\n:12345\r\n{response}. But: my concern isn't about suitably educated clients; it is about "what would most clients expect?". And RESP2 clients will not expect such a message; they only expect out-of-band data on pub/sub connections, and will likely get desynchronized if you did this. RESP2+RESP3 clients might also be a little confused to see "push" data if we haven't completed HELLO 3

However: here's the thing - implementing RESP3 doesn't need to be hard! You do not necessarily need to change every available response to issue RESP3 counterparts; RESP3-aware clients need to handle both RESP2 and RESP3 (because they still want to talk to RESP2 servers), so: even if the only thing you did was implement the HELLO 3 handshake and allow "push" on that connection: that might be enough. Once the connection is RESP3, the server is free to issue "push" messages. You could simply not implement the REDIRECT feature, in that case.

Note that if SE.Redis is in RESP3 mode, it will want to use regular pub/sub on the RESP3 connection, so: any other pub/sub support would need to work on RESP3.

badrishc · 2024-03-29T22:59:54Z

I have one suggestion for the async API proposed above. It should have the notion of a barrier, that will complete (or await) all the earlier-issued asynchronous operations. This is useful for clients to maintain session consistency. For example, they may do a series of reads followed by a write, but might want the pending reads to complete first before issuing the write on that session. This is already available, called CompletePending(...) in FASTER and Tsavorite (Garnet's fork of FASTER).

Example API:

C: GET foo1
S: -ASYNC 3434534
C: GET foo2
S: "asldkfslkfdgh"
C: GET foo3
S: -ASYNC 3434535
C: BARRIER
S: ["async" "3434535" "askdjhasdkjsah_foo3"]
S: ["async" "3434534" "askdjhasdkjsah_foo1"]
S: OK # when we get this OK response to BARRIER, we know all async ops are over

mgravell · 2024-03-29T23:18:46Z

Makes sense, but I would suggest "ASYNC BARRIER" - keeping everything as subcommands minimizes the top-level command explosion.

…

On Fri, 29 Mar 2024, 23:00 Badrish Chandramouli, ***@***.***> wrote: I have one suggestion for the async API proposed above. It should have the notion of a *barrier*, that will *complete* (or *await*) all the earlier-issued asynchronous operations. This is useful for clients to maintain session consistency. For example, they may do a series of reads followed by a write, but might want the pending reads to complete first before issuing the write on that session. This is already available, called CompletePending(...) in FASTER and Tsavorite (Garnet's fork of FASTER). Example API: C: GET foo1 S: -ASYNC 3434534 C: GET foo2 S: "asldkfslkfdgh" C: GET foo3 S: -ASYNC 3434535 C: BARRIER S: ["async" "3434535" "askdjhasdkjsah_foo3"] S: ["async" "3434534" "askdjhasdkjsah_foo1"] S: OK # when we get this OK response to BARRIER, we know all async ops are over — Reply to this email directly, view it on GitHub <#157 (comment)> or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAEHMGFFDHSCNMTTBT67ATY2XW75BFKMF2HI4TJMJ2XIZLTSSBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLAVFOZQWY5LFVI2TQNZQG43TQNZUGGSG4YLNMWUWQYLTL5WGCYTFNSWHG5LCNJSWG5C7OR4XAZNMJFZXG5LFINXW23LFNZ2KM5DPOBUWG44TQKSHI6LQMWVHEZLQN5ZWS5DPOJ42K5TBNR2WLKJWHAYTGNZSHA3TDAVEOR4XAZNFNFZXG5LFUV3GC3DVMWVDEMRRGA2TINJVGQ4IFJDUPFYGLJLMMFRGK3FFOZQWY5LFVI2TQNZQG43TQNZUGGTXI4TJM5TWK4VGMNZGKYLUMU> . You are receiving this email because you authored the thread. Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub> .

darrenge added the enhancement New feature or request label Mar 27, 2024

TalZaccai assigned badrishc Mar 29, 2024

badrishc linked a pull request May 15, 2024 that will close this issue

Support async RESP responses to pending GET calls (due to disk read) #387

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for async out-of-band callback support #157

Proposal for async out-of-band callback support #157

mgravell commented Mar 27, 2024 •

edited

badrishc commented Mar 27, 2024

kevin-montrose commented Mar 27, 2024

mgravell commented Mar 27, 2024 •

edited

badrishc commented Mar 29, 2024

mgravell commented Mar 29, 2024 via email

Proposal for async out-of-band callback support #157

Proposal for async out-of-band callback support #157

Comments

mgravell commented Mar 27, 2024 • edited

badrishc commented Mar 27, 2024

kevin-montrose commented Mar 27, 2024

mgravell commented Mar 27, 2024 • edited

badrishc commented Mar 29, 2024

mgravell commented Mar 29, 2024 via email

mgravell commented Mar 27, 2024 •

edited

mgravell commented Mar 27, 2024 •

edited