Consistency issues when using REPLICA_DATABASE_URLS #3878

matthewelwell · 2024-05-02T21:44:54Z

When the REPLICA_DATABASE_URLS environment variable is set and points to replicated databases, the application does not behave as one would expect. For example, a change to the state of a feature is sometimes reflected immediately, but at other times the page must be refreshed before the change is displayed to the user. This is likely because the FE requests the state from the API immediately after the toggle. That request arrives before replication has completed on all replica DBs, so it often returns the previous state.

Some options we could consider to resolve this:

The FE should optimistically persist the state of the change rather than refreshing it from the API immediately
Adding a layer of caching on top of the database (?). @khvn26 mentioned that Azure offers something along these lines out of the box; maybe AWS has a similar offering

... further options to be added as they come up

matthewelwell · 2024-05-03T08:28:01Z

Note that we also saw a couple of (likely) related Sentry errors when the REPLICA_DATABASE_URLS environment variable was set in our production SaaS environment:

https://flagsmith.sentry.io/issues/5294385824/
https://flagsmith.sentry.io/issues/5294412984

zachaysan · 2024-05-03T13:53:29Z

In a video chat with @matthewelwell, @novakzaballa, and @khvn26, I briefly discussed one potential way of solving our replica woes. Instead of routing every model through PrimaryReplicaRouter, we could take a hybrid approach: allowlist the specific models that are safe to spread across the replicas, and default to the primary database for all other models. When a model should sometimes be read from a replica and sometimes not, we should use the Django using() method as outlined here. It would be good to find a way of calling using() that spreads the load across the replicas automatically. Either way, this approach lets us mark individual models and querysets as replica safe without having to solve the full complexity of sending everything to the replicas automatically.
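To make the allowlist idea concrete, here is a minimal sketch of what such a router could look like. It is not the existing PrimaryReplicaRouter; the class name, the REPLICA_SAFE_MODELS set, the model labels in it, and the database aliases are all hypothetical and would need to match the real settings. It relies only on the standard Django router method signatures and on model._meta.label_lower.

```python
import random

# Hypothetical allowlist of "app_label.modelname" labels considered replica safe.
REPLICA_SAFE_MODELS = {"reports.report"}

# Hypothetical aliases; they would need to match the keys of settings.DATABASES.
PRIMARY_ALIAS = "default"
REPLICA_ALIASES = ["replica_1", "replica_2"]


class AllowlistReplicaRouter:
    """Send reads to a replica only for allowlisted models; everything else uses the primary."""

    def db_for_read(self, model, **hints):
        if REPLICA_ALIASES and model._meta.label_lower in REPLICA_SAFE_MODELS:
            # Pick a replica at random to spread the read load.
            return random.choice(REPLICA_ALIASES)
        return PRIMARY_ALIAS

    def db_for_write(self, model, **hints):
        # Writes always go to the primary.
        return PRIMARY_ALIAS

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replicas hold the same data, so relations are allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only run migrations against the primary.
        return db == PRIMARY_ALIAS
```

With a router like this, a model that is not on the allowlist is always read from the primary, so a read that immediately follows a write cannot hit a lagging replica. Individual querysets on such a model could still opt in to a replica explicitly with using().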

zachaysan · 2024-05-03T14:47:22Z

Thinking about it more, we should be able to create a function or property that follows the ReplicaReadStrategy, makes use of the connection_check function, and returns a value that can be passed into the using() method. Something we'd call like this:

some_fancy_queryset.using(get_db_replica())

To automatically pick the right replica.
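One way such a helper might look is sketched below. Django's using() takes a database alias string, so the helper returns an alias rather than being passed in as a callable. Everything here is an assumption made for illustration: the alias names, the is_usable parameter (a stand-in for whatever health check connection_check performs), and the random selection (a stand-in for whichever ReplicaReadStrategy is configured).

```python
import random
from typing import Callable, Sequence

# Hypothetical aliases; they would need to match the keys of settings.DATABASES.
PRIMARY_ALIAS = "default"
REPLICA_ALIASES = ("replica_1", "replica_2")


def get_db_replica(
    replica_aliases: Sequence[str] = REPLICA_ALIASES,
    is_usable: Callable[[str], bool] = lambda alias: True,
) -> str:
    """Return the alias of a usable replica, or the primary alias if none is usable."""
    candidates = list(replica_aliases)
    # Shuffle so that reads are spread across the replicas.
    random.shuffle(candidates)
    for alias in candidates:
        # is_usable is a placeholder for a real connection health check.
        if is_usable(alias):
            return alias
    # Fall back to the primary when there are no usable replicas.
    return PRIMARY_ALIAS
```

A queryset that is known to tolerate replication lag could then opt in with some_fancy_queryset.using(get_db_replica()), while every other query keeps going to the primary.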

matthewelwell · 2024-05-03T16:22:41Z

Relevant reading: https://andrewbrookins.com/python/scaling-django-with-postgres-read-replicas/

joestadler · 2024-05-09T16:15:31Z

Seeing similar behavior on 2.109 with REPLICA_DATABASE_URLS set. If I try to create a segment override, the application hangs on creation and the create-segment-override call shows a 500 error in the network logs. On refresh, the segment override exists but without the value that was set at creation.

matthewelwell mentioned this issue May 3, 2024

infra: add REPLICA_DATABASE_URLS to ECS task definition #3877

Closed
