Creating a subscription with Event Store offline at first hangs for a while and causes high CPU usage #140

Open
kerams opened this issue Jul 21, 2021 · 4 comments

Comments

@kerams
kerams commented Jul 21, 2021

Describe the bug
The docs don't describe resubscribing after dropped subscriptions in detail, so I came up with my own solution (for simplicity's sake I'm glossing over checkpoints here):

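```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using EventStore.Client;

class Program
{
    // Retries until the subscription is established, then returns.
    // Checkpoints are omitted for simplicity.
    static async Task Subscribe(EventStoreClient c, string stream)
    {
        while (true)
        {
            try
            {
                await c.SubscribeToStreamAsync(stream, (sub, e, ct) =>
                    {
                        // Print the ID of each event received.
                        Console.WriteLine(e.Event.EventId);
                        return Task.CompletedTask;
                    },
                    subscriptionDropped: async (sub, reason, ex) =>
                    {
                        // Resubscribe on any drop other than an explicit dispose.
                        if (reason != SubscriptionDroppedReason.Disposed)
                            await Subscribe(c, stream);
                    }
                );

                return;
            }
            catch
            {
                // The attempt failed (e.g. Event Store is unreachable); wait before retrying.
                await Task.Delay(5000);
            }
        }
    }

    static async Task Main(string[] args)
    {
        var c = new EventStoreClient(EventStoreClientSettings.Create("esdb://localhost:2113?tls=false"));
        await Subscribe(c, "newstream");
        // Keep the process alive so the subscription keeps receiving events.
        Thread.Sleep(-1);
    }
}
```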

This works very well when I start the application first and then restart Event Store: resubscriptions and replays are almost instant. The problem appears when I start the application while Event Store is offline. When I then bring Event Store online, the call to SubscribeToStreamAsync hangs for around 2 minutes, maxing out a CPU core and, according to Visual Studio, allocating 1.5 GB over that period (the allocations must be collected quickly, because memory usage remains stable). The call does succeed in the end, and resubscribing after subsequent Event Store restarts is immediate again.

To Reproduce
Steps to reproduce the behavior:

  1. Append some events to newstream
  2. Shut down Event Store
  3. Run the code above and wait for a couple of seconds
  4. Start Event Store
  5. Wait until event IDs are printed while observing the process's CPU usage in Task Manager.

Expected behavior
SubscribeToStreamAsync completes quickly when the application is started while Event Store is offline and Event Store is launched afterwards.

Actual behavior
SubscribeToStreamAsync takes a couple of minutes to complete and maxes out a CPU core when the application is started while Event Store is offline and Event Store is launched afterwards.

EventStore details

  • EventStore server version: eventstore/eventstore:21.2.0-buster-slim image
  • Operating system: Win 10, Ubuntu 20.04.1 via WSL2
  • EventStore client version (if applicable): EventStore.Client.Grpc.Streams 21.2.0
  • .NET runtime 5.0.7
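The retry loop in the report waits a fixed 5 seconds between attempts. As a minimal sketch of the same idea with capped exponential backoff (this is not part of the EventStore client API and does not address the hang itself), the loop can be factored into a generic helper. `Resubscriber`, `SubscribeWithBackoffAsync`, and `DelayFor` are hypothetical names; in real use the delegate would wrap the `SubscribeToStreamAsync` call, which is replaced here by a stand-in that fails twice before succeeding.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: retries an async operation with capped exponential backoff.
public static class Resubscriber
{
    // Delay before retry number `attempt` (0-based): baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        // Clamp the exponent so the multiplication stays finite.
        double factor = Math.Pow(2, Math.Min(attempt, 30));
        double ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }

    // Calls `subscribe` until it completes without throwing, waiting longer after each failure.
    // Returns the number of failed attempts before success.
    public static async Task<int> SubscribeWithBackoffAsync(
        Func<CancellationToken, Task> subscribe,
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        CancellationToken ct = default)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                await subscribe(ct);
                return attempt;
            }
            catch (Exception) when (!ct.IsCancellationRequested)
            {
                // Back off before the next attempt; cancellation aborts the wait.
                await Task.Delay(DelayFor(attempt, baseDelay, maxDelay), ct);
            }
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        int calls = 0;

        // Stand-in for SubscribeToStreamAsync: fails twice (server "offline"), then succeeds.
        Task FakeSubscribe(CancellationToken _)
        {
            calls++;
            if (calls <= 2) throw new InvalidOperationException("server unavailable");
            return Task.CompletedTask;
        }

        int failures = await Resubscriber.SubscribeWithBackoffAsync(
            FakeSubscribe, TimeSpan.FromMilliseconds(10), TimeSpan.FromMilliseconds(100));

        // prints "subscribed after 2 failed attempts"
        Console.WriteLine($"subscribed after {failures} failed attempts");
    }
}
```

Capping the delay keeps reconnection reasonably prompt after a long outage while still avoiding a tight retry loop when the server is down.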
@james-allan-lloyd

I observe the same behavior, albeit with the eventstore/eventstore:21.10.1-buster-slim image.

@hayley-jean
Member

hayley-jean commented Oct 4, 2022

Hi @kerams and @james-allan-lloyd, thanks for the detailed report. I can reproduce this on EventStoreDB 22.6.0 and with the master branch of the gRPC client.

However, I'm only seeing this issue when running EventStoreDB in insecure mode in docker.
I don't see it when running EventStoreDB locally, or when I run EventStoreDB in a secure docker container.

Can you confirm this as well?

In the logs, I always see a deadline exceeded error before the client takes a long time to resubscribe, which leads me to believe that there's a timeout that's not handled correctly internally in the client.

@kerams
Author

kerams commented Oct 4, 2022

It was indeed insecure in a container. Unfortunately, I do not have the opportunity to try other scenarios right now.

@Zyklop

Zyklop commented Dec 14, 2022

I was analyzing this issue too. I did some profiling, and according to the profiler the problem appears to be in this method:

private async Task FillBoxAsync(TaskCompletionSource<TOutput> box, TInput input) {

Projects
None yet
Development

No branches or pull requests

4 participants