
Too many full GCs when receive big responses #13236

Closed
azyobuzin opened this issue Nov 2, 2017 · 15 comments

@azyobuzin

I found that many full GCs occur when the client receives large responses (about 100KB each).

Here is an example code: https://gist.github.com/azyobuzin/7fee0a968a8780fc56b7e8a829c60d9b#file-grpcmemorytest-cs

In the example, the client will receive 100KB responses every 0.1 seconds.

I ran the code while profiling with CLR Profiler, and found that full (gen 2) GCs ran about every 3 seconds, with a lot of allocations happening in BatchContextSafeHandle.GetReceivedMessage.

(screenshots in the original issue: when GCs happen; where allocations happen)

The result of profiling: https://gist.github.com/azyobuzin/7fee0a968a8780fc56b7e8a829c60d9b#file-grpcmemorytest-log

Running full GCs this frequently is a real performance problem. I would suggest using an object pool, for example System.Buffers.ArrayPool, instead of allocating a fresh buffer for every message.
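For illustration, the rent/return pattern I have in mind (a minimal sketch using System.Buffers.ArrayPool; the 100KB size just mirrors the test above):

using System.Buffers;

// Rent may return an array larger than requested, so consumers must
// track the actual payload length separately.
byte[] buffer = ArrayPool<byte>.Shared.Rent(100 * 1024);
try
{
    // ... fill the buffer with the received message and consume it ...
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}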

What version of gRPC and what language are you using?

gRPC C# 1.7.1 from NuGet

What operating system (Linux, Windows, …) and version?

Windows (.NET Framework 4.7)

@jtattermusch
Contributor

jtattermusch commented Nov 2, 2017

You're right. This is happening because for each message sent, we serialize into a byte[] and then let the buffer be garbage collected (large buffers are allocated on the large object heap, which is only collected by a full/gen 2 GC). The same happens for deserialization.
There's definitely room for improvement here. Using ArrayPool is one of the options, but at the same time we'd like to avoid copying data between the managed and native layers (likely by using pinning), so the whole situation is a bit more complicated. Also, we already have a public API defined for the marshaller and unmarshaller, and we don't want to break it, so the change needs to be made carefully.
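For reference, the effect is easy to observe (a minimal sketch; 85,000 bytes is the documented LOH threshold, and LOH objects report generation 2):

var small = new byte[80 * 1024];   // under 85,000 bytes: ordinary heap
var large = new byte[100 * 1024];  // over the threshold: allocated on the LOH
Console.WriteLine(GC.GetGeneration(small)); // 0
Console.WriteLine(GC.GetGeneration(large)); // 2 (the LOH is collected only with gen 2)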

Do you have any specific suggestions on how the change should be made?

@azyobuzin
Author

azyobuzin commented Nov 2, 2017

My first thought was that it would work if we replaced byte[] with ArraySegment<byte>, or with some IDisposable wrapper that returns the buffer to the pool when disposed.

As for the native layer, I think it is fine to pass a larger array than required, since ArrayPool may return a larger array than requested anyway.

As for marshallers, I came up with the code below, but existing deserializers would slow down, so I don't have a good solution yet.

// Sketch: wraps a pooled buffer and returns it to the pool on Dispose.
public class RentedBuffer : IDisposable
{
    public RentedBuffer(byte[] buffer) { Buffer = new ArraySegment<byte>(buffer); }
    public ArraySegment<byte> Buffer { get; }
    public void Dispose() { /* return Buffer.Array to the pool */ }
}

public class Marshaller<T>
{
    Func<T, byte[]> bytesSerializer;
    Func<byte[], T> bytesDeserializer;
    Func<T, RentedBuffer> segmentSerializer;
    Func<ArraySegment<byte>, T> segmentDeserializer;

    public Marshaller(Func<T, byte[]> serializer, Func<byte[], T> deserializer)
    {
        this.bytesSerializer = GrpcPreconditions.CheckNotNull(serializer, nameof(serializer));
        this.bytesDeserializer = GrpcPreconditions.CheckNotNull(deserializer, nameof(deserializer));
    }

    public Marshaller(Func<T, RentedBuffer> serializer, Func<ArraySegment<byte>, T> deserializer)
    {
        this.segmentSerializer = GrpcPreconditions.CheckNotNull(serializer, nameof(serializer));
        this.segmentDeserializer = GrpcPreconditions.CheckNotNull(deserializer, nameof(deserializer));
    }

    public Func<T, byte[]> Serializer
    {
        get
        {
            if (this.bytesSerializer == null)
            {
                this.bytesSerializer = x =>
                {
                    using (var serializationResult = this.segmentSerializer(x))
                    {
                        var seg = serializationResult.Buffer;
                        // slow, but not used in Grpc.Core
                        var bs = new byte[seg.Count];
                        Buffer.BlockCopy(seg.Array, seg.Offset, bs, 0, seg.Count);
                        return bs;
                    }
                };
            }

            return this.bytesSerializer;
        }
    }

    public Func<byte[], T> Deserializer
    {
        get
        {
            if (this.bytesDeserializer == null)
            {
                this.bytesDeserializer = bs => this.segmentDeserializer(new ArraySegment<byte>(bs));
            }

            return this.bytesDeserializer;
        }
    }

    public Func<T, RentedBuffer> SegmentSerializer
    {
        get
        {
            if (this.segmentSerializer == null)
            {
                this.segmentSerializer = x => new RentedBuffer(this.bytesSerializer(x));
            }

            return this.segmentSerializer;
        }
    }

    public Func<ArraySegment<byte>, T> SegmentDeserializer
    {
        get
        {
            if (this.segmentDeserializer == null)
            {
                this.segmentDeserializer = seg =>
                {
                    if (seg.Count == seg.Array.Length)
                    {
                        return this.bytesDeserializer(seg.Array);
                    }

                    // slow
                    var bs = new byte[seg.Count];
                    Buffer.BlockCopy(seg.Array, seg.Offset, bs, 0, seg.Count);
                    return this.bytesDeserializer(bs);
                };
            }

            return this.segmentDeserializer;
        }
    }
}
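A caller would then pick the segment-based overload, e.g. (SerializeToRentedBuffer and ParseMyMessage are hypothetical helpers for the sketch above):

var marshaller = new Marshaller<MyMessage>(
    (MyMessage m) => SerializeToRentedBuffer(m),      // Func<T, RentedBuffer>
    (ArraySegment<byte> seg) => ParseMyMessage(seg)); // Func<ArraySegment<byte>, T>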

@kkm000
Contributor

kkm000 commented Nov 3, 2017

Just a thought: I use this library when processing realtime audio, where I had the same problem (not with gRPC): https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream. It is very effective at avoiding large-object GCs, but it exposes a stream interface, not an array.
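For context, its usage pattern looks like this (a minimal sketch; the manager is meant to be a process-wide singleton, and the frame buffer is a hypothetical example):

using Microsoft.IO;

class AudioProcessor
{
    // One manager per process; it owns the pooled blocks.
    static readonly RecyclableMemoryStreamManager Manager = new RecyclableMemoryStreamManager();

    void Process(byte[] frame)
    {
        // GetStream hands out a MemoryStream backed by pooled blocks; Dispose
        // returns the blocks to the pool instead of abandoning them to the GC.
        using (var stream = Manager.GetStream("audio-frame"))
        {
            stream.Write(frame, 0, frame.Length);
        }
    }
}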

@jtattermusch
Contributor

This was partially addressed by merging #18865. gRPC now allows using a deserializer that reads the received payload as a ReadOnlySequence<byte>, which involves no copying (see the sketch after the list below).
Limitations:

  • you need to target netstandard2.0 (it doesn't work for net45 projects)
  • you need to specify a custom deserializer yourself; with the default protobuf deserialization, the payload is still copied into a new byte array (more work is needed here).
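For illustration, such a deserializer looks roughly like this (MyMessage, SerializeMyMessage, and ParseMyMessage are hypothetical; the context-based Marshaller constructor and PayloadAsReadOnlySequence are the APIs added by #18865):

using System.Buffers;
using Grpc.Core;

var marshaller = new Marshaller<MyMessage>(
    // Serialization side unchanged: still produces a byte[] payload.
    (msg, context) => context.Complete(SerializeMyMessage(msg)),
    // Deserialization side: read the payload in place, no byte[] copy.
    context =>
    {
        ReadOnlySequence<byte> payload = context.PayloadAsReadOnlySequence();
        return ParseMyMessage(payload); // parser must accept a sequence
    });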

@kkm000
Contributor

kkm000 commented Aug 5, 2019

@jtattermusch, .NET Core has added a lot of classes for reusing memory: Span<T>, Memory<T>, and friends (notably, C++17's std::string_view is a step in the same direction). I believe netstandard2.0 has most of these, and 2.1 adds mostly buffer-reuse-by-pooling classes. Nothing of the sort exists in the Framework, however, so we would need a lot of split code paths depending on which variety of .NET we compile for. This likely calls for some layer of abstraction over memory reuse, which seems like quite an architectural rework. Also, unless I'm missing something, the pool variants specified in .NET Standard 2.{0,1} are aimed at fixed-size buffer churn, which is not very helpful for the general gRPC usage pattern, where messages naturally vary in length.

In addition to the [de]serialization you mention, protobuf operates with System.String at the surface API level (a proto string field generates a C# string). Strings are immutable and, when large, increase the frequency of lower-generation GCs; should they exceed the ≈42K-char limit, they become Large Object allocations, which fragment the heap. So the problem, looked at from a wider angle, spans both the gRPC and protobuf memory-use patterns.

@jtattermusch
Contributor

@kkm000 we now support deserializationContext.PayloadAsReadOnlySequence() for all the target platforms. Now I also have some WIP for supporting IBufferWriter for serialization (which would allow serializing directly into native memory) here: #19792
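The serialization side of that WIP would look roughly like this (a sketch assuming the SerializationContext buffer-writer methods from #19792 and protobuf's WriteTo(IBufferWriter<byte>) extension; MyMessage stands for any generated message):

using Google.Protobuf;
using Grpc.Core;

// Serializer delegate for the context-based Marshaller constructor.
Action<MyMessage, SerializationContext> serializer = (msg, context) =>
{
    context.SetPayloadLength(msg.CalculateSize()); // declare the exact payload size
    msg.WriteTo(context.GetBufferWriter());        // write straight into the transport buffer
    context.Complete();                            // no intermediate byte[] allocated
};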

@stale

stale bot commented Feb 1, 2020

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 180 days. It will be closed automatically if no further update occurs in 1 day. Thank you for your contributions!

@sandersaares

Optimization work on this topic continues to be relevant.

@stale stale bot removed the disposition/stale label Feb 3, 2020
@jtattermusch
Contributor

Once this protobuf PR is available, we will be able to deserialize without incurring GC pressure for large objects: protocolbuffers/protobuf#5888

@stale

stale bot commented May 6, 2020

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!

@jtattermusch
Contributor

protocolbuffers/protobuf#7351 has been merged. Once that feature gets released (and we add bindings for the new deserialization API on the gRPC side), this issue can be marked as fixed.

@jonso4

jonso4 commented Feb 19, 2021

@jtattermusch has the new serialization API been exposed in gRPC C# yet? I am trying to figure out how to use PayloadAsReadOnlySequence but can't figure out how. It looks like it needs to be supported directly in the Marshaller. I also see in the docs that option optimize_for = SPEED is not supported by the C# codegen.

@jtattermusch
Contributor

> @jtattermusch has the new serialization API been exposed in gRPC C# yet? I am trying to figure out how to use PayloadAsReadOnlySequence but can't figure out how. It looks like it needs to be supported directly in the Marshaller. I also see in the docs that option optimize_for = SPEED is not supported by the C# codegen.

It's been available for more than a year now, see the PR for details: #23485 (it also demonstrates the PayloadAsReadOnlySequence usage).

C# doesn't support optimize_for = SPEED since there isn't any speed / generated-code-size tradeoff; the default generated code is already well optimized.
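Concretely, with a generated protobuf message the deserializer side of the marshaller reduces to one line (MyMessage stands for any generated type; the ParseFrom(ReadOnlySequence<byte>) overload requires Google.Protobuf 3.10+):

// Deserializer delegate: parse directly from the received buffer segments.
Func<DeserializationContext, MyMessage> deserializer =
    context => MyMessage.Parser.ParseFrom(context.PayloadAsReadOnlySequence());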

@jonso4

jonso4 commented May 7, 2021

@jtattermusch thank you! This work has significantly reduced our GC time! There is still one area causing large GCs though: we can't pass rented buffers from an ArrayPool into the final gRPC response, because we don't know when they will be released. I know other people have brought this up - is there any plan to support this? Or are there any good approaches to work around this limitation?

@jtattermusch
Contributor

> @jtattermusch thank you! This work has significantly reduced our GC time! There is still one area causing large GCs though: we can't pass rented buffers from an ArrayPool into the final gRPC response, because we don't know when they will be released. I know other people have brought this up - is there any plan to support this? Or are there any good approaches to work around this limitation?

Related issues / PRs are here: protocolbuffers/protobuf#8238, protocolbuffers/protobuf#7828, protocolbuffers/protobuf#7645

Right now there's no plan to enable deserializing messages using pre-allocated buffers, since there is currently no API that would allow one to control the lifetime of rented memory (C# protobuf messages/fields have no "Dispose" or "Return" method). As has been pointed out previously, implementing this properly would be a major effort that would have to be carefully designed (and we currently don't have the cycles for it). And we don't want a half-baked solution.

I'm going to close this issue as there's currently nothing to do here.

Some general advice for GC improvements:

  • use smaller messages, to prevent newly allocated buffers from going into the LOH, where they are very costly to collect (see the sketch below).
  • if you really want allocation-free serialization/deserialization, you can take a look at FlatBuffers.
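To illustrate the first point, a sketch of splitting one large payload into sub-LOH chunks over a server stream (DataService, ChunkReply, OpenPayload, and the 64KB chunk size are illustrative, not from this thread):

using System.IO;
using System.Threading.Tasks;
using Google.Protobuf;
using Grpc.Core;

public class DataService : Data.DataBase // hypothetical generated base class
{
    const int ChunkSize = 64 * 1024; // safely below the 85,000-byte LOH threshold

    public override async Task GetData(DataRequest request,
        IServerStreamWriter<ChunkReply> responseStream, ServerCallContext context)
    {
        using (Stream source = OpenPayload(request)) // hypothetical
        {
            byte[] chunk = new byte[ChunkSize]; // one small, reusable buffer
            int read;
            while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
            {
                // Each response message's buffer stays in the ordinary generations.
                await responseStream.WriteAsync(new ChunkReply
                {
                    Data = ByteString.CopyFrom(chunk, 0, read)
                });
            }
        }
    }
}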
