Consistency issues when using REPLICA_DATABASE_URLS #3878

Open
matthewelwell opened this issue May 2, 2024 · 5 comments

Comments

@matthewelwell
Contributor

When the REPLICA_DATABASE_URLS environment variable is set and points to replicated databases, the application does not behave as one would expect. For example, a change to the state of a feature is sometimes reflected immediately, but at other times the page needs a refresh before the change is displayed to the user. This is likely because the frontend (FE) requests the state from the API immediately after the toggle. That request can arrive before replication to all replica DBs has completed, so it often returns the previous state.

Some options we could consider to resolve this:

  1. The FE should optimistically persist the state of the change rather than immediately refreshing it from the API.
  2. Add a layer of caching on top of the database (?). @khvn26 mentioned that Azure offers something along these lines out of the box; maybe AWS has a similar offering.

... further options to be added as they come up

@matthewelwell
Contributor Author

matthewelwell commented May 3, 2024

Note that we also saw a couple of (likely) related Sentry errors when we had the REPLICA_DATABASE_URLS environment variable set in our production SaaS environment:

https://flagsmith.sentry.io/issues/5294385824/
https://flagsmith.sentry.io/issues/5294412984

@zachaysan
Contributor

In a video chat with @matthewelwell, @novakzaballa, and @khvn26, I briefly discussed one potential way of solving our replica woes. Instead of routing every model through PrimaryReplicaRouter, we could take a hybrid approach: allowlist specific models that are safe to read from replicas and spread their load via PrimaryReplicaRouter, while all other models default to the primary database. When a model should sometimes be read from a replica and sometimes not, we should use the Django using() method as outlined here. It would be good to find a way of calling using() that spreads the load across the replicas automatically. Either way, this approach lets us mark models and querysets as replica-safe without having to solve the full complexity of routing everything to the replicas automatically.
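
A rough sketch of the allowlist idea, written as a standalone Django-style database router. This is not the existing PrimaryReplicaRouter; the class name, the database aliases, and the model label in the allowlist are all hypothetical and only illustrate the shape of the approach.

```python
import random


class AllowlistReplicaRouter:
    """Hypothetical router: only allowlisted models are read from replicas."""

    # Assumed aliases and model label, for illustration only.
    primary_alias = "default"
    replica_aliases = ["replica_1", "replica_2"]
    replica_safe_models = {"some_app.SomeReadHeavyModel"}

    def db_for_read(self, model, **hints):
        # model._meta.label is "app_label.ModelName" on Django models.
        if model._meta.label in self.replica_safe_models and self.replica_aliases:
            return random.choice(self.replica_aliases)
        # Anything not explicitly marked replica-safe reads from the primary.
        return self.primary_alias

    def db_for_write(self, model, **hints):
        # Writes always go to the primary.
        return self.primary_alias

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replicas hold the same data, so relations are allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrations run against the primary only.
        return db == self.primary_alias
```

With a router like this, a model that is not on the allowlist is never read from a replica unless a queryset opts in explicitly with using().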

@zachaysan
Contributor

Thinking about it more, we should be able to create a function or a property that follows the ReplicaReadStrategy and makes use of the connection_check function, and that can be passed into the using() method. Something we'd call like this to automatically pick the right replica:

some_fancy_queryset.using(get_db_replica)
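
A minimal sketch of what such a helper might look like, in plain Python. Django's using() expects a database alias string, so the sketch calls the helper rather than passing it in. The factory name make_replica_picker, the is_usable callback (standing in for connection_check), and the round-robin order (standing in for whatever ReplicaReadStrategy selects) are assumptions, not the project's actual code.

```python
import itertools
from typing import Callable, Sequence


def make_replica_picker(
    replica_aliases: Sequence[str],
    is_usable: Callable[[str], bool],
    fallback: str = "default",
) -> Callable[[], str]:
    """Build a function that returns the next usable replica alias, round-robin."""
    # itertools.cycle over an empty sequence yields nothing, so guard for it.
    cycle = itertools.cycle(replica_aliases) if replica_aliases else None

    def get_db_replica() -> str:
        if cycle is None:
            return fallback
        # Try each replica at most once per call before falling back.
        for _ in range(len(replica_aliases)):
            alias = next(cycle)
            if is_usable(alias):
                return alias
        # No replica passed the check: read from the primary instead.
        return fallback

    return get_db_replica
```

Usage would then be along the lines of `get_db_replica = make_replica_picker(["replica_1", "replica_2"], is_usable=some_check)` followed by `some_fancy_queryset.using(get_db_replica())`, where the alias names and some_check are placeholders.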

@joestadler

Seeing similar behavior on 2.109 with REPLICA_DATABASE_URLS. If I try to create a segment override, the application hangs on creation and the create-segment-override call shows a 500 error in the network logs. On refresh, the segment override exists, but without the value that was set at creation.
