Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Google.Cloud.Firestore Listener instances limited to 100 #12318

Open
Skinz3 opened this issue Mar 26, 2024 · 22 comments
Open

Google.Cloud.Firestore Listener instances limited to 100 #12318

Skinz3 opened this issue Mar 26, 2024 · 22 comments
Assignees
Labels
api: firestore Issues related to the Firestore API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@Skinz3
Copy link

Skinz3 commented Mar 26, 2024

Hello !

I am using Firestore with C# SDK

After running this simple code :

int callsCount = 0;
int resultCount = 0;

var stores = FirestoreManager.Instance.Collection("stores");

stores.Listen((QuerySnapshot querySnapshot) =>
{
    foreach (var sn in querySnapshot)
    {
        var storeTask = FirestoreManager.Instance.Collection($"stores/{sn.Id}/task_configurations");

        callsCount++;
        storeTask.Listen((QuerySnapshot querySnapshot) =>
        {
            resultCount++;
        });
    }
});


Thread.Sleep(6000);

Logger.Write("Calls : " + callsCount + " Results : " + resultCount);

callsCount is equal to the size of my 'stores' collection, but resultCount is always equal to 100 (my collection is bigger).

The same code in javascript on the same collections is working (resultCount == callsCount == 300).

var listenersCount = 1;
var resultsCount = 1;

admin.firestore().collection('foo').onSnapshot((snapshot) => {
        resultsCount++;
        snapshot.docChanges().forEach((change) => {
listenersCount++;
  change.doc.ref.collection('foo_foo').onSnapshot((foo_snapshot) => {
                    configurationsSnapshot.docChanges().forEach((configurationChange) => {
  resultsCount++;
                    });
});

await sleep(10000);
console.log(listenersCount);
console.log(resultsCount);

I was wondering if it could be related to GRPC settings, like 'Max concurrent stream' do you have any clue?

@amanda-tarafa amanda-tarafa added type: question Request for information or clarification. Not an issue. priority: p2 Moderately-important priority. Fix may not be included in next release. api: firestore Issues related to the Firestore API. labels Mar 26, 2024
@jskeet
Copy link
Collaborator

jskeet commented Mar 26, 2024

I'll look into this tomorrow. Thanks for the minimal repro - that'll really help. I don't have any initial ideas about what might be going on, but I'll let you if when I've uncover anything.

@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

I can't reproduce the problem, but I've fixed one bug in your code - I doubt that it's causing the problem here, but it's worth knowing about: you're modifying resultCount from potentially multiple threads without any sort of memory barrier. With your original code and 300 documents, I ended up getting results of about 286. Using Interlocked.Increment fixed that.

Here's the code I've used to try to reproduce the problem:

// Copyright 2024 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License"):
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

using Google.Cloud.Firestore;

if (args.Length != 2)
{
    Console.WriteLine("Arguments: <project-id> <collection-id>");
    return;
}

string projectId = args[0];
string collectionId = args[1];

var db = FirestoreDb.Create(projectId);

var collection = db.Collection(collectionId);

int callCount = 0;
int resultCount = 0;
collection.Listen((QuerySnapshot querySnapshot) =>
{
    foreach (var sn in querySnapshot)
    {
        var subcollection = sn.Reference.Collection("subcollection");

        Interlocked.Increment(ref callCount);
        subcollection.Listen((QuerySnapshot querySnapshot) =>
        {
            Interlocked.Increment(ref resultCount);
        });
    }
});

Thread.Sleep(10000);

Console.WriteLine($"Calls: {callCount}; Results: {resultCount}");

Now, it's possible that this depends on the gRPC implementation you're using. I was testing with a .NET 6 console app, on Windows - it would be using Grpc.Net.Client. Please could you provide as much information as you can about the context you're running the code in?

I'd also note that with your current code, you're adding new listeners every time the stores collection changes. That means you could end up having multiple listeners for the same store. That may not be an issue in your particular use case, but it's worth considering. If you really only want to set up the listeners for "the documents present at the time of starting" then I'd suggest using CollectionReference.GetSnapshotAsync() instead of Listen().

@jskeet jskeet added the needs more info This issue needs more information from the customer to proceed. label Mar 27, 2024
@Skinz3
Copy link
Author

Skinz3 commented Mar 27, 2024

I can't reproduce the problem, but I've fixed one bug in your code - I doubt that it's causing the problem here, but it's worth knowing about: you're modifying resultCount from potentially multiple threads without any sort of memory barrier. With your original code and 300 documents, I ended up getting results of about 286. Using Interlocked.Increment fixed that.

Here's the code I've used to try to reproduce the problem:

// Copyright 2024 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License"):
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

using Google.Cloud.Firestore;

if (args.Length != 2)
{
    Console.WriteLine("Arguments: <project-id> <collection-id>");
    return;
}

string projectId = args[0];
string collectionId = args[1];

var db = FirestoreDb.Create(projectId);

var collection = db.Collection(collectionId);

int callCount = 0;
int resultCount = 0;
collection.Listen((QuerySnapshot querySnapshot) =>
{
    foreach (var sn in querySnapshot)
    {
        var subcollection = sn.Reference.Collection("subcollection");

        Interlocked.Increment(ref callCount);
        subcollection.Listen((QuerySnapshot querySnapshot) =>
        {
            Interlocked.Increment(ref resultCount);
        });
    }
});

Thread.Sleep(10000);

Console.WriteLine($"Calls: {callCount}; Results: {resultCount}");

Now, it's possible that this depends on the gRPC implementation you're using. I was testing with a .NET 6 console app, on Windows - it would be using Grpc.Net.Client. Please could you provide as much information as you can about the context you're running the code in?

I'd also note that with your current code, you're adding new listeners every time the stores collection changes. That means you could end up having multiple listeners for the same store. That may not be an issue in your particular use case, but it's worth considering. If you really only want to set up the listeners for "the documents present at the time of starting" then I'd suggest using CollectionReference.GetSnapshotAsync() instead of Listen().

Thanks for for your fast reply.

This code is running over windows 11 environnent.
The project use .NET 8.
The behavior is the same after running the code you provided.
Calls : 275 Results : 100
Google.Cloud.Firestore 3.5.1

I can understand that using so many listeners could lead to significant resource consumption, and a lot of parallelism, but we need it in our use case.

@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

I can understand that using so many listeners could lead to significant resource consumption, and a lot of parallelism, but we need it in our use case.

My comment wasn't about using lots of listeners, so much as ending up with duplicate listeners. (If one of the store documents changes, your "outer" listener will create a new "inner" one.)

It's interesting that you're now seeing Calls: 275 instead of 300. Does your collection have fewer documents now?

I'll try this on .NET 8 myself - running on Windows 11 would be trickier, but I can do it if necessary.

I assume this is running on your local dev machine? Is there a firewall or proxy which might be limiting things?

Currently this code is ignoring the value returned by Listen - if that's reporting errors, that could be useful. When I get time, I'll post a new version with more error reporting.

It would also be interesting to try a version which only used a single query snapshot (so no "outer" listener) just to see if that makes any difference. I know it might not be suitable for your application code, but it could help with diagnostics. I'll report back when I've done a bit more testing - it should be in the next couple of hours.

@Skinz3
Copy link
Author

Skinz3 commented Mar 27, 2024

Hello, i found the solution.

I downgraded my project from .NET 8 to .NET 6, and i dont have the problem anymore.

I'm very curious to know where the problem could come from.

@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

Okay, that's really good to know, thanks. I'll investigate further - hopefully we can provide a way of getting it working with .NET 8. (I wonder whether they have different defaults for maximum concurrent streams.)

@Skinz3
Copy link
Author

Skinz3 commented Mar 27, 2024

I was wondering if it could be related to that configuration : https://learn.microsoft.com/en-us/dotnet/core/runtime-config/threading "System.Threading.ThreadPool.MaxThreads": 20.

Do you think it could be?

@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

No, I don't think it's a threading aspect at all. We should have very few threads running. I strongly suspect that it's a difference in terms of which HttpClientHandler is used, or how it's configured. But I'll look into it.

@jskeet jskeet removed the needs more info This issue needs more information from the customer to proceed. label Mar 27, 2024
Skinz3 added a commit to SecurEat/Firestore.ORM that referenced this issue Mar 27, 2024
@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

Darn - I still can't reproduce, even with .NET 8. I'm going to try on my Windows 11 machine though to see whether that makes a difference.

In the meantime, here's an updated sample which is slightly simpler (no outer listener) but which will report failures in listener tasks. If you could try this and see whether you get any exceptions reported in .NET 8.0, that would be really helpful.

// Copyright 2024 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License"):
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

using Google.Cloud.Firestore;

if (args.Length != 2)
{
    Console.WriteLine("Arguments: <project-id> <collection-id>");
    return;
}

string projectId = args[0];
string collectionId = args[1];

var db = FirestoreDb.Create(projectId);

var collection = db.Collection(collectionId);

int callbackCount = 0;

var snapshot = await collection.GetSnapshotAsync();
foreach (var doc in snapshot)
{
    var subcollection = doc.Reference.Collection("subcollection");

    var listener = subcollection.Listen(snapshot => Interlocked.Increment(ref callbackCount));
    _ = listener.ListenerTask.ContinueWith(task => Console.WriteLine(task.Exception), TaskContinuationOptions.OnlyOnFaulted);
}

Thread.Sleep(10000);

Console.WriteLine($"Docs: {snapshot.Count}; Callbacks to listener: {callbackCount}");

@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

Ooh - on Windows 11 I can reproduce it. That's going to make it a pain to work on, but it's huge progress.
(I don't see any exceptions on the listener tasks though.)

Having done a little debugging, I'm getting a SocketsHttpHandler with EnableMultipleHttp2Connections set to true in all cases, so I'm not sure what the difference is yet. Still working on it...

@jskeet
Copy link
Collaborator

jskeet commented Mar 27, 2024

Have tried and failed to reproduce it with a local gRPC server + client. Will try to at least get our Firestore wrapper library out of the equation next. (That'll be tomorrow.)

@Skinz3
Copy link
Author

Skinz3 commented Mar 28, 2024

Ooh - on Windows 11 I can reproduce it. That's going to make it a pain to work on, but it's huge progress. (I don't see any exceptions on the listener tasks though.)

Having done a little debugging, I'm getting a SocketsHttpHandler with EnableMultipleHttp2Connections set to true in all cases, so I'm not sure what the difference is yet. Still working on it...

image

Same for me, no exception on the listeners.

Thanks for the investigations

@jskeet
Copy link
Collaborator

jskeet commented Mar 28, 2024

Still diagnosing things - and have new code ready to test after lunch - but one thing I've found already is that calling Listen on the document references reproduces the problem, so we don't need to worry about subcollections (which I consider more complex).
Next test will be making the calls directly with the "raw" gRPC stub instead of via the client library. (But a stub set up by the client library, to start with...)

@jskeet
Copy link
Collaborator

jskeet commented Mar 28, 2024

Okay, using the same stub as the client library uses, I can reproduce the problem - only 100 clients ever get to send a request.
When creating a stub manually (using the same credentials, but otherwise-empty channel options) it looks like everything is working.
So now I "just" need to check very carefully what channel options we're providing (and any other relevant metadata, potentially). Progress!

@jskeet
Copy link
Collaborator

jskeet commented Mar 28, 2024

A bit more progress: if you use the same channel that was originally used to list documents, you see the problem. If you use a different channel for listening than for listing the documents, all is well.

So the next question is why...

I won't be working again until Tuesday (Friday and Morning are UK public holidays) but I'll pick it up after that.

@jskeet
Copy link
Collaborator

jskeet commented Apr 2, 2024

Next bit of information - using CollectionReference.ListDocumentsAsync still demonstrates the problem (and is a much simpler call).

@jskeet
Copy link
Collaborator

jskeet commented Apr 2, 2024

Okay, trying this with just the "raw" code, and with more targets, I can see that it still works in .NET 6, but it's still broken in .NET 7, 8 and 9.
I'm going to do a little more investigation, but I'm getting very close to "I can't dig any further, I need to file a bug in https://github.com/grpc/grpc-dotnet".

@jskeet
Copy link
Collaborator

jskeet commented Apr 2, 2024

Okay, using the same stub as the client library uses, I can reproduce the problem - only 100 clients ever get to send a request.

This appears to be different when using the "raw" stub. Everything sends a request, but most only 100 of the clients get a response. I'm going to try to reproduce the earlier "blocked while sending" in case that's relevant.

@jskeet
Copy link
Collaborator

jskeet commented Apr 2, 2024

I'm going to try to reproduce the earlier "blocked while sending" in case that's relevant.

I can't reproduce that at the moment - I suspect it was a diagnostic failure on my part. For the moment, I'll go with "I reckon it's not getting any responses."

@jskeet
Copy link
Collaborator

jskeet commented Apr 2, 2024

I've created a new issue linked above, as I think I've got to the end of what I can do at the moment. We'll see what additional diagnostics are suggested.

@jskeet jskeet added the external This issue is blocked on a bug with the actual product. label Apr 8, 2024
@jskeet
Copy link
Collaborator

jskeet commented Apr 8, 2024

Marking as external as we're basically blocked on feedback on the gRPC client.

@jskeet
Copy link
Collaborator

jskeet commented May 2, 2024

@Skinz3: We now understand what's going on, and there's a workaround on the issue - although it's really not terribly pleasant. I'm going to keep thinking how we can fix this in the Cloud client libraries - for the moment, a simpler workaround would be to use the Grpc.Core implementation, if you're happy to do that. (See the transport selection docs for details.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the Firestore API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

3 participants