Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add endpoint to disconnect all users for a given tenant #842

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

filipecabaco
Copy link
Contributor

What kind of change does this PR introduce?

Add endpoint to disconnect all users for a given tenant

Copy link

vercel bot commented Apr 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
realtime-demo ⬜️ Ignored (Inspect) Visit Preview Apr 18, 2024 1:33pm


def disconnect_clients(%{assigns: %{tenant: tenant}} = conn, %{"tenant_id" => tenant_id}) do
Tenants.disconnect_clients(tenant_id)
PostgresCdc.stop_all(tenant, @stop_timeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filipecabaco this returns either :ok or :error. Shouldn't we include the correct resp code depending on the result of PostgresCdc.stop_all/2? That way whatever consumes the endpoint will know if something went wrong.

Copy link
Member

@w3b6x9 w3b6x9 Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels inconsistent when taking into account the other places this is called:

I would think that for a disconnect upstream would want proper feedback that if failed would get 500 in response.

Comment on lines 267 to 268
Tenants.disconnect_clients(tenant_id)
PostgresCdc.stop_all(tenant, @stop_timeout)
Copy link
Member

@w3b6x9 w3b6x9 Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filipecabaco are there any potential race conditions here?

For example, what would happen if a postgres changes subscribed client gets disconnected and immediately reconnects? This client has a record in realtime.subscriptions again upon reconnection. Then when the call to stop PostgresCdc finally goes through would it wipe away the record? Does calling PostgresCdc.stop_all/2 kick out clients too?

Would it be better to first call PostgresCdc.stop_all/2 and if it returns :ok then call Tenants.disconnect_clients/1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky one as we have no way of doing this without being fully async since we will need to kill potentially thounsands of processes in their own lifecycle.

We can actually suspend the user before hand, wait a couple of seconds (which will already kill socket connections) and then kill postgres cdc

This will prevent new users from joining, kill existing ones and then kill the remainder of the pipeline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably need to:

  • suspend
  • bust tenant cache
  • stop_all (yeah before disconnect)
  • disconnect
  • unsuspend

Note: need to bust the tenant cache tho too because suspend flag is there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you disconnect and they can't reconnect, then yeah you could stop_all after.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like suspend busts the tenant cache already 🥳

@filipecabaco filipecabaco force-pushed the fix/kill-all-tenant-connections branch from a0784a9 to 5cb1201 Compare April 13, 2024 00:12
@filipecabaco filipecabaco marked this pull request as ready for review April 13, 2024 00:12
@filipecabaco filipecabaco force-pushed the fix/kill-all-tenant-connections branch from 5cb1201 to 0d426ee Compare April 13, 2024 00:24
@filipecabaco filipecabaco force-pushed the fix/kill-all-tenant-connections branch from 0d426ee to db43ea7 Compare April 18, 2024 13:30
@@ -104,6 +104,7 @@ defmodule RealtimeWeb.Router do
pipe_through([:open_cors, :tenant_api, :secure_tenant_api])

post("/broadcast", BroadcastController, :broadcast)
delete("/disconnect_clients", TenantController, :disconnect_clients)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have only a /suspend and /unsuspend endpoint where it disconnects clients and kills all the db connections including extensions?

I was having trouble this week figuring out how to restart the realtime_subscription_manager_pub connections.

Having one endpoint where we could confidently disable Realtime for all the db and client conns for a tenant is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a really good , will implement that instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants