
Too many full GCs when receive big responses #13236

Closed
azyobuzin opened this issue Nov 2, 2017 · 15 comments

@azyobuzin

I found that many full GCs occur when the client receives large responses (about 100KB each).

Here is an example code: https://gist.github.com/azyobuzin/7fee0a968a8780fc56b7e8a829c60d9b#file-grpcmemorytest-cs

In the example, the client will receive 100KB responses every 0.1 seconds.

I ran the code while profiling with CLR Profiler, and found that full (gen 2) GCs ran about every 3 seconds, with a lot of allocations happening in BatchContextSafeHandle.GetReceivedMessage.

(screenshots in the original issue: when GCs happen; where allocations happen)

The result of profiling: https://gist.github.com/azyobuzin/7fee0a968a8780fc56b7e8a829c60d9b#file-grpcmemorytest-log

Running full GCs this frequently is a real performance problem. I would suggest using an object pool, for example System.Buffers.ArrayPool, instead of allocating a fresh buffer for every message.
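For illustration, the rent/return pattern I have in mind (a minimal sketch using System.Buffers.ArrayPool; the 100KB size just mirrors the test above):

using System.Buffers;

// Rent may return an array larger than requested, so consumers must
// track the actual payload length separately.
byte[] buffer = ArrayPool<byte>.Shared.Rent(100 * 1024);
try
{
    // ... fill the buffer with the received message and consume it ...
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}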

What version of gRPC and what language are you using?

gRPC C# 1.7.1 from NuGet

What operating system (Linux, Windows, …) and version?

Windows (.NET Framework 4.7)

@jtattermusch
Contributor

jtattermusch commented Nov 2, 2017

You're right. This is happening because for each message sent, we serialize into a byte[] and then let the buffer be garbage collected (large buffers are allocated on the large object heap, which is only collected by a full/gen 2 GC). The same happens for deserialization.
There's definitely room for improvement here. Using ArrayPool is one of the options, but at the same time we'd like to avoid copying data between the managed and native layers (likely by using pinning), so the whole situation is a bit more complicated. Also, we already have a public API defined for the marshaller and unmarshaller, and we don't want to break it, so the change needs to be made carefully.
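For reference, the effect is easy to observe (a minimal sketch; 85,000 bytes is the documented LOH threshold, and LOH objects report generation 2):

var small = new byte[80 * 1024];   // under 85,000 bytes: ordinary heap
var large = new byte[100 * 1024];  // over the threshold: allocated on the LOH
Console.WriteLine(GC.GetGeneration(small)); // 0
Console.WriteLine(GC.GetGeneration(large)); // 2 (the LOH is collected only with gen 2)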

Do you have any specific suggestions on how the change should be made?

@azyobuzin
Author

azyobuzin commented Nov 2, 2017

My first thought was that it would work if we replaced byte[] with ArraySegment<byte>, or with some IDisposable wrapper that returns the buffer to the pool when disposed.

As for the native layer, I think it is fine to pass a larger array than required, since ArrayPool may return a larger array than requested anyway.

As for marshallers, I came up with the code below, but existing deserializers would slow down, so I don't have a good solution yet.

// Sketch: wraps a pooled buffer and returns it to the pool on Dispose.
public class RentedBuffer : IDisposable
{
    public RentedBuffer(byte[] buffer) { Buffer = new ArraySegment<byte>(buffer); }
    public ArraySegment<byte> Buffer { get; }
    public void Dispose() { /* return Buffer.Array to the pool */ }
}

public class Marshaller<T>
{
    Func<T, byte[]> bytesSerializer;
    Func<byte[], T> bytesDeserializer;
    Func<T, RentedBuffer> segmentSerializer;
    Func<ArraySegment<byte>, T> segmentDeserializer;

    public Marshaller(Func<T, byte[]> serializer, Func<byte[], T> deserializer)
    {
        this.bytesSerializer = GrpcPreconditions.CheckNotNull(serializer, nameof(serializer));
        this.bytesDeserializer = GrpcPreconditions.CheckNotNull(deserializer, nameof(deserializer));
    }

    public Marshaller(Func<T, RentedBuffer> serializer, Func<ArraySegment<byte>, T> deserializer)
    {
        this.segmentSerializer = GrpcPreconditions.CheckNotNull(serializer, nameof(serializer));
        this.segmentDeserializer = GrpcPreconditions.CheckNotNull(deserializer, nameof(deserializer));
    }

    public Func<T, byte[]> Serializer
    {
        get
        {
            if (this.bytesSerializer == null)
            {
                this.bytesSerializer = x =>
                {
                    using (var serializationResult = this.segmentSerializer(x))
                    {
                        var seg = serializationResult.Buffer;
                        // slow, but not used in Grpc.Core
                        var bs = new byte[seg.Count];
                        Buffer.BlockCopy(seg.Array, seg.Offset, bs, 0, seg.Count);
                        return bs;
                    }
                };
            }

            return this.bytesSerializer;
        }
    }

    public Func<byte[], T> Deserializer
    {
        get
        {
            if (this.bytesDeserializer == null)
            {
                this.bytesDeserializer = bs => this.segmentDeserializer(new ArraySegment<byte>(bs));
            }

            return this.bytesDeserializer;
        }
    }

    public Func<T, RentedBuffer> SegmentSerializer
    {
        get
        {
            if (this.segmentSerializer == null)
            {
                this.segmentSerializer = x => new RentedBuffer(this.bytesSerializer(x));
            }

            return this.segmentSerializer;
        }
    }

    public Func<ArraySegment<byte>, T> SegmentDeserializer
    {
        get
        {
            if (this.segmentDeserializer == null)
            {
                this.segmentDeserializer = seg =>
                {
                    if (seg.Count == seg.Array.Length)
                    {
                        return this.bytesDeserializer(seg.Array);
                    }

                    // slow
                    var bs = new byte[seg.Count];
                    Buffer.BlockCopy(seg.Array, seg.Offset, bs, 0, seg.Count);
                    return this.bytesDeserializer(bs);
                };
            }

            return this.segmentDeserializer;
        }
    }
}
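A caller would then pick the segment-based overload, e.g. (SerializeToRentedBuffer and ParseMyMessage are hypothetical helpers for the sketch above):

var marshaller = new Marshaller<MyMessage>(
    (MyMessage m) => SerializeToRentedBuffer(m),      // Func<T, RentedBuffer>
    (ArraySegment<byte> seg) => ParseMyMessage(seg)); // Func<ArraySegment<byte>, T>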

@kkm000
Contributor

kkm000 commented Nov 3, 2017

Just a thought: I use this library when processing realtime audio, where I had the same problem (not with gRPC): https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream. It is very effective at avoiding large-object GCs, but it exposes a stream interface, not an array.
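For context, its usage pattern looks like this (a minimal sketch; the manager is meant to be a process-wide singleton, and the frame buffer is a hypothetical example):

using Microsoft.IO;

class AudioProcessor
{
    // One manager per process; it owns the pooled blocks.
    static readonly RecyclableMemoryStreamManager Manager = new RecyclableMemoryStreamManager();

    void Process(byte[] frame)
    {
        // GetStream hands out a MemoryStream backed by pooled blocks; Dispose
        // returns the blocks to the pool instead of abandoning them to the GC.
        using (var stream = Manager.GetStream("audio-frame"))
        {
            stream.Write(frame, 0, frame.Length);
        }
    }
}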

@jtattermusch
Contributor

This was partially addressed by merging #18865. gRPC now allows using a deserializer that reads the received payload as a ReadOnlySequence<byte>, which involves no copying (see the sketch after the list below).
Limitations:

  • you need to target netstandard2.0 (it doesn't work for net45 projects)
  • you need to specify a custom deserializer yourself; with the default protobuf deserialization, the payload is still copied into a new byte array (more work is needed here).
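For illustration, such a deserializer looks roughly like this (MyMessage, SerializeMyMessage, and ParseMyMessage are hypothetical; the context-based Marshaller constructor and PayloadAsReadOnlySequence are the APIs added by #18865):

using System.Buffers;
using Grpc.Core;

var marshaller = new Marshaller<MyMessage>(
    // Serialization side unchanged: still produces a byte[] payload.
    (msg, context) => context.Complete(SerializeMyMessage(msg)),
    // Deserialization side: read the payload in place, no byte[] copy.
    context =>
    {
        ReadOnlySequence<byte> payload = context.PayloadAsReadOnlySequence();
        return ParseMyMessage(payload); // parser must accept a sequence
    });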

@kkm000
Contributor

kkm000 commented Aug 5, 2019

@jtattermusch, .NET Core has added a lot of classes for reusing memory: Span<T>, Memory<T>, and friends (notably, C++17's std::string_view is a step in the same direction). I believe netstandard2.0 has most of these, and 2.1 adds mostly buffer-reuse-by-pooling classes. Nothing of the sort exists in the Framework, however, so we would need a lot of split code paths depending on which variety of .NET we compile for. This likely calls for some layer of abstraction over memory reuse, which seems like quite an architectural rework. Also, unless I'm missing something, the pool variants specified in .NET Standard 2.{0,1} are aimed at fixed-size buffer churn, which is not very helpful for the general gRPC usage pattern, where messages naturally vary in length.

In addition to the [de]serialization you mention, protobuf operates with System.String at the surface API level (a proto string field generates a C# string). Strings are immutable and, when large, increase the frequency of lower-generation GCs; should they exceed the ≈42K-char limit, they become Large Object allocations, which fragment the heap. So the problem, looked at from a wider angle, spans both the gRPC and protobuf memory-use patterns.

@jtattermusch
Contributor

@kkm000 we now support deserializationContext.PayloadAsReadOnlySequence() for all the target platforms. Now I also have some WIP for supporting IBufferWriter for serialization (which would allow serializing directly into native memory) here: #19792
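The serialization side of that WIP would look roughly like this (a sketch assuming the SerializationContext buffer-writer methods from #19792 and protobuf's WriteTo(IBufferWriter<byte>) extension; MyMessage stands for any generated message):

using Google.Protobuf;
using Grpc.Core;

// Serializer delegate for the context-based Marshaller constructor.
Action<MyMessage, SerializationContext> serializer = (msg, context) =>
{
    context.SetPayloadLength(msg.CalculateSize()); // declare the exact payload size
    msg.WriteTo(context.GetBufferWriter());        // write straight into the transport buffer
    context.Complete();                            // no intermediate byte[] allocated
};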

@stale

stale bot commented Feb 1, 2020

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 180 days. It will be closed automatically if no further update occurs in 1 day. Thank you for your contributions!

@sandersaares

Optimization work on this topic continues to be relevant.

@stale stale bot removed the disposition/stale label Feb 3, 2020
@jtattermusch
Contributor

Once this protobuf PR is available, we will be able to deserialize without incurring GC pressure for large objects: protocolbuffers/protobuf#5888

@stale

stale bot commented May 6, 2020

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!

@jtattermusch
Contributor

protocolbuffers/protobuf#7351 has been merged. Once that feature gets released (and we add bindings for the new deserialization API on the gRPC side), this issue can be marked as fixed.

@jonso4

jonso4 commented Feb 19, 2021

@jtattermusch has the new serialization API been exposed in gRPC C# yet? I am trying to figure out how to use PayloadAsReadOnlySequence but can't figure out how. It looks like it needs to be supported directly in the Marshaller. I also see in the docs that option optimize_for = SPEED is not supported by the C# codegen.

@jtattermusch
Contributor

> @jtattermusch has the new serialization API been exposed in gRPC C# yet? I am trying to figure out how to use PayloadAsReadOnlySequence but can't figure out how. It looks like it needs to be supported directly in the Marshaller. I also see in the docs that option optimize_for = SPEED is not supported by the C# codegen.

It's been available for more than a year now, see the PR for details: #23485 (it also demonstrates the PayloadAsReadOnlySequence usage).

C# doesn't support optimize_for = SPEED since there isn't any speed / generated-code-size tradeoff; the default generated code is already well optimized.
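Concretely, with a generated protobuf message the deserializer side of the marshaller reduces to one line (MyMessage stands for any generated type; the ParseFrom(ReadOnlySequence<byte>) overload requires Google.Protobuf 3.10+):

// Deserializer delegate: parse directly from the received buffer segments.
Func<DeserializationContext, MyMessage> deserializer =
    context => MyMessage.Parser.ParseFrom(context.PayloadAsReadOnlySequence());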

@jonso4

jonso4 commented May 7, 2021

@jtattermusch thank you! This work has significantly reduced our GC time! There is still one area causing large GCs though: we can't pass rented buffers from an ArrayPool into the final gRPC response, because we don't know when they will be released. I know other people have brought this up - is there any plan to support this? Or are there any good approaches to work around this limitation?

@jtattermusch
Contributor

> @jtattermusch thank you! This work has significantly reduced our GC time! There is still one area causing large GCs though: we can't pass rented buffers from an ArrayPool into the final gRPC response, because we don't know when they will be released. I know other people have brought this up - is there any plan to support this? Or are there any good approaches to work around this limitation?

Related issues / PRs are here: protocolbuffers/protobuf#8238, protocolbuffers/protobuf#7828, protocolbuffers/protobuf#7645

Right now there's no plan to enable deserializing messages using pre-allocated buffers, since there is currently no API that would allow one to control the lifetime of rented memory (C# protobuf messages/fields have no "Dispose" or "Return" method). As has been pointed out previously, implementing this properly would be a major effort that would have to be carefully designed (and we currently don't have the cycles for it). And we don't want a half-baked solution.

I'm going to close this issue as there's currently nothing to do here.

Some general advice for GC improvements:

  • use smaller messages, to prevent newly allocated buffers from going into the LOH, where they are very costly to collect (see the sketch below).
  • if you really want allocation-free serialization/deserialization, you can take a look at FlatBuffers.
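To illustrate the first point, a sketch of splitting one large payload into sub-LOH chunks over a server stream (DataService, ChunkReply, OpenPayload, and the 64KB chunk size are illustrative, not from this thread):

using System.IO;
using System.Threading.Tasks;
using Google.Protobuf;
using Grpc.Core;

public class DataService : Data.DataBase // hypothetical generated base class
{
    const int ChunkSize = 64 * 1024; // safely below the 85,000-byte LOH threshold

    public override async Task GetData(DataRequest request,
        IServerStreamWriter<ChunkReply> responseStream, ServerCallContext context)
    {
        using (Stream source = OpenPayload(request)) // hypothetical
        {
            byte[] chunk = new byte[ChunkSize]; // one small, reusable buffer
            int read;
            while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
            {
                // Each response message's buffer stays in the ordinary generations.
                await responseStream.WriteAsync(new ChunkReply
                {
                    Data = ByteString.CopyFrom(chunk, 0, read)
                });
            }
        }
    }
}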
