Sync cache between multiple instances #130

Open
amintalebi opened this issue Sep 21, 2020 · 3 comments
Comments

@amintalebi

Hey
I have an application that runs as multiple instances in a cloud environment. Since it stores only a small amount of data and the data doesn't need to be persistent, I don't find it necessary to run a separate database for it. I wanted to use go-cache as a built-in datastore for my application, but it doesn't support synchronization between multiple instances.

@danicml

danicml commented Dec 17, 2020

It seems to me that you are looking for a distributed cache. go-cache is a local cache; they are different use cases. To solve your problem I recommend looking at memcached or Redis.
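If you go that route, here is a minimal sketch of the distributed approach using the go-redis client. The Redis address and key names are just placeholders; the point is that every instance talks to the same Redis server, so a write from one instance is immediately visible to all the others.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// All app instances point at the same Redis server, which is what makes
	// the cache shared. The address here is just a placeholder.
	rdb := redis.NewClient(&redis.Options{Addr: "redis:6379"})

	// Set from one instance...
	if err := rdb.Set(ctx, "token:abc123", "account-42", 10*time.Minute).Err(); err != nil {
		panic(err)
	}

	// ...and any other instance can read it back.
	val, err := rdb.Get(ctx, "token:abc123").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(val) // account-42
}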

@sebstyle

sebstyle commented Mar 28, 2021

Something to think about..
If your application serves an HTTP API behind an nginx proxy, you could set up a location in nginx to manipulate keys on all your instances.
To keep the example short I left out the usual proxy_set_header directives like X-Forwarded-For.
Read the caveats at the end, because, you know, the warnings come after the spells :)

# Define upstreams
# You could include the instance running locally marked as backup so it only gets used when no other upstream is available
upstream instance_pool {
    server 127.0.0.1:8080 backup;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}
upstream instance_local {
    server 127.0.0.1:8080;
}
upstream instance1 {
    server 10.0.0.1:8080;
}
upstream instance2 {
    server 10.0.0.2:8080;
}
# Setup broadcast path
location ~ (cache/tokensX|cache/tokensY) {
  mirror /mirror/instance1;
  mirror /mirror/instance2;
  # You could drop the request body to mirrors if you do not need it
  #mirror_request_body off;
  # Endpoint is cache related; header X-Mirrored-By is required (origin)
  proxy_set_header X-Mirrored-By origin;
  # You could run an instance of your app on the proxy and pass the initial request to that instance
  proxy_pass http://instance_local$request_uri;
  # If you do not want to run an instance of your app on the proxy you could forward to the pool instead
  # (only one proxy_pass is allowed per location, so use one or the other):
  #proxy_pass http://instance_pool$request_uri;
  # But that would result in one pool member receiving the request twice (original request + mirrored request)
  # Perhaps this can be tweaked by giving a certain server in the pool priority when the request was made to the cache location
}
location /mirror/instance1 {
  # Marked internal so not available to public
  internal;
  # Cache related; header X-Mirrored-By is required (relayed)
  proxy_set_header X-Mirrored-By relayed;
  # You could drop the request body to instance if you do not need it
  #proxy_pass_request_body off;
  #proxy_set_header Content-Length "";
  proxy_pass http://instance1$request_uri;
}
location /mirror/instance2 {
  # Marked internal so not available to public
  internal;
  # Cache related; header X-Mirrored-By is required (relayed)
  proxy_set_header X-Mirrored-By relayed;
  # You could drop the request body to instance if you do not need it
  #proxy_pass_request_body off;
  #proxy_set_header Content-Length "";
  proxy_pass http://instance2$request_uri;
}

Now you can make an API request: PUT https://yourappproxy/cache/tokensX/newkey with payload newvalue.
This request gets passed to the local instance and mirrored to all your other instances.
All your instances now have newkey with newvalue in their tokensX cache.
Instances that are mirrored but are offline will result in an error log entry warning that the mirror was unreachable.
Do keep in mind that trying to mirror to a nonexistent instance will likely delay the entire original request until the mirrored request times out.
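For illustration, a minimal sketch of what such a cache endpoint could look like inside each instance, backed by go-cache. The /cache/<name>/<key> route shape matches the nginx locations above; everything else (handler name, cache names, port, plain-string values) is just an assumption to keep the sketch short.

package main

import (
	"io"
	"net/http"
	"strings"
	"time"

	"github.com/patrickmn/go-cache"
)

// One local go-cache instance per cache name; tokensX/tokensY match the
// nginx broadcast location above.
var caches = map[string]*cache.Cache{
	"tokensX": cache.New(cache.NoExpiration, 10*time.Minute),
	"tokensY": cache.New(cache.NoExpiration, 10*time.Minute),
}

// cacheHandler serves PUT/POST and DELETE on /cache/<name>/<key>.
// nginx mirrors each request to every peer, so one PUT against the proxy
// updates the local cache of every instance.
func cacheHandler(w http.ResponseWriter, r *http.Request) {
	parts := strings.SplitN(strings.TrimPrefix(r.URL.Path, "/cache/"), "/", 2)
	if len(parts) != 2 {
		http.Error(w, "expected /cache/<name>/<key>", http.StatusBadRequest)
		return
	}
	c, ok := caches[parts[0]]
	if !ok {
		http.Error(w, "unknown cache", http.StatusNotFound)
		return
	}
	key := parts[1]
	switch r.Method {
	case http.MethodPut, http.MethodPost:
		body, _ := io.ReadAll(r.Body)
		c.Set(key, string(body), cache.DefaultExpiration)
	case http.MethodDelete:
		c.Delete(key)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

func main() {
	http.HandleFunc("/cache/", cacheHandler)
	http.ListenAndServe(":8080", nil)
}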

Should your app contain code that forwards a certain cache operation to your other instances directly, you can use the X-Mirrored-By header to check whether that operation was an original request (X-Mirrored-By: origin) or a mirrored request (X-Mirrored-By: relayed).
So before your code forwards anything to other instances, bypassing the proxy, check whether the request was already relayed or still needs relaying, to prevent avalanches.
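A small sketch of that check, reusing the handler sketch above; the helper names are made up:

// alreadyMirrored reports whether the proxy has already taken care of
// propagation: "origin" means nginx will mirror this request to the peers,
// "relayed" means this copy is itself a mirrored request. In either case the
// app must not fan the operation out again, or every relay would relay.
func alreadyMirrored(r *http.Request) bool {
	switch r.Header.Get("X-Mirrored-By") {
	case "origin", "relayed":
		return true
	}
	return false
}

// Inside cacheHandler, before any direct fan-out to peers:
//
//	if !alreadyMirrored(r) {
//		forwardToPeers(r) // hypothetical helper that bypasses the proxy
//	}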

Caveats

  • The original request to the cache location will wait for the mirrors' responses.
    The request to the mirrors returns as fast as the timeout allows, or as fast as your slowest instance.
    So perhaps your API server code should prioritize handling cache requests.
  • Restrict access to the cache location so only the addresses running instances of your app can access it.
  • This implementation works best with cached data that rarely changes, because requests to the mirrored cache endpoint are expensive due to the first caveat.
    For example: cache an entire token -> account id lookup table with fairly static data from the database into the tokensX cache when your app starts up (see the sketch after this list).
    When every so often an account gets terminated, send a DELETE https://yourappproxy/cache/tokensX/ to remove that cached token from all your instances.
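A minimal sketch of that warm-up step, assuming a database/sql connection and the tokensX cache from the handler sketch above; the table and column names are made up:

import (
	"database/sql"

	"github.com/patrickmn/go-cache"
)

// warmTokenCache loads the fairly static token -> account id table from the
// database into the local tokensX cache at startup. Later invalidation is a
// DELETE /cache/tokensX/<token> broadcast through the proxy.
func warmTokenCache(db *sql.DB, c *cache.Cache) error {
	rows, err := db.Query(`SELECT token, account_id FROM tokens`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var token string
		var accountID int64
		if err := rows.Scan(&token, &accountID); err != nil {
			return err
		}
		c.Set(token, accountID, cache.NoExpiration)
	}
	return rows.Err()
}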

@sebstyle

sebstyle commented Mar 28, 2021

Some numbers from benchmarking this:
Don't take them too seriously, as these benchmarks were performed on a single machine running all the services (proxy, multiple app instances) and the benchmark tool itself.
Requests that do not access any cached data

Summary:
  Total:	10.0007 secs
  Slowest:	0.0054 secs
  Fastest:	0.0002 secs
  Average:	0.0004 secs
  Requests/sec:	985.9357

Requests that do use cached data

Summary:
  Total:	10.0017 secs
  Slowest:	0.0069 secs
  Fastest:	0.0002 secs
  Average:	0.0004 secs
  Requests/sec:	987.6336

Requests that trigger cache update on all instances

Summary:
  Total:	10.0025 secs
  Slowest:	0.0092 secs
  Fastest:	0.0005 secs
  Average:	0.0011 secs
  Requests/sec:	841.8916

Requests that trigger cache update on all instances where one instance (mirror) is offline

Summary:
  Total:	10.0010 secs
  Slowest:	0.0066 secs
  Fastest:	0.0005 secs
  Average:	0.0009 secs
  Requests/sec:	559.7423

One offline/slow mirror causes a significant delay in returning the request that was made to the cache path.
As mentioned in the caveats, the cache mirror path is not intended for the large number of cache operations this benchmark generated.
Tested using the hey tool (https://github.com/rakyll/hey) with params:
-z 10s -q 1000 -n 100000 -c 1 -t 1 -m POST

Edit:

For the database vs. cache tests, hey params:
-z 10s -q 1000 -n 100000 -c 10 -t 1 -m POST
Fetching the data from the database

Summary:
  Total:	10.0077 secs
  Slowest:	0.0251 secs
  Fastest:	0.0007 secs
  Average:	0.0039 secs
  Requests/sec:	2585.1126

Fetching the data from cache

Summary:
  Total:	10.0015 secs
  Slowest:	0.0291 secs
  Fastest:	0.0002 secs
  Average:	0.0021 secs
  Requests/sec:	4725.9928

Fetching the data from cache while manipulating cache on all instances

Summary:
  Total:	10.0048 secs
  Slowest:	0.0406 secs
  Fastest:	0.0005 secs
  Average:	0.0070 secs
  Requests/sec:	1423.7224

The requests for manipulating the cache in this test were sent to https://yourappproxy/cache/tokenX/, involving DNS and SSL.
There are tweaks that can be applied to increase the rps for manipulating the cache.

  • If the proxy and app instances are (listening) on a private network, you could use http://10.0.0.100/cache/tokenX/ and avoid the extra overhead that comes with DNS lookups and HTTPS requests.
  • Disable some of the additional overhead that comes with HTTP requests, such as keepalive.
