Optimize writing strings across multiple buffers #7663

JamesNK · 2020-06-30T02:35:16Z

Before:

|                                               Method | BytesToWrite | encodedSize |      Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           1 | 213.53 us | 1.9358 us | 1.8107 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           4 |  73.99 us | 0.7287 us | 0.6817 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |          10 |  52.32 us | 0.4014 us | 0.3558 us |      6.3477 |           - |           - |              5040 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |         105 |  23.72 us | 0.2815 us | 0.2496 us |     15.5945 |           - |           - |             12288 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |       10080 |  10.72 us | 0.2083 us | 0.1949 us |     12.8174 |           - |           - |             10104 B |

After (not caching Encoder):

|                                               Method | BytesToWrite | encodedSize |      Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           1 | 144.75 us | 1.6373 us | 1.5316 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           4 |  54.72 us | 0.5721 us | 0.5351 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |          10 |  30.17 us | 0.1616 us | 0.1433 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |         105 |  20.02 us | 0.1819 us | 0.1612 us |      6.8054 |           - |           - |              5376 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |       10080 |  13.84 us | 0.1661 us | 0.1554 us |      0.0610 |           - |           - |                56 B |

After (caching Encoder):

|                                               Method | BytesToWrite | encodedSize |      Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           1 | 146.20 us | 2.3256 us | 2.1753 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           4 |  53.98 us | 0.3705 us | 0.3466 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |          10 |  30.12 us | 0.1255 us | 0.1113 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |         105 |  19.39 us | 0.4198 us | 0.4666 us |      0.0610 |           - |           - |                56 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |       10080 |  13.60 us | 0.0412 us | 0.0344 us |      0.0610 |           - |           - |                56 B |

csharp/src/Google.Protobuf.Test.TestProtos/UnittestIssues.cs

JamesNK · 2020-07-01T01:07:35Z

This PR now also initializes the initial buffer from the IBufferWriter when the write context is created.

Before:

|                                               Method | BytesToWrite | encodedSize |     Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           1 | 39.56 ns | 0.6020 ns | 0.5631 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           2 | 43.09 ns | 0.7069 ns | 0.6612 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           3 | 45.48 ns | 0.2256 ns | 0.1884 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           4 | 49.91 ns | 0.6544 ns | 0.5801 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           5 | 53.81 ns | 0.9889 ns | 0.8766 ns |           - |           - |           - |                   - |

After:

|                                               Method | BytesToWrite | encodedSize |     Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           1 | 33.21 ns | 0.6996 ns | 0.6544 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           2 | 33.31 ns | 0.2451 ns | 0.2047 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           3 | 35.42 ns | 0.7321 ns | 0.7518 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           4 | 38.52 ns | 0.8109 ns | 0.7585 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           5 | 40.58 ns | 0.6540 ns | 0.5798 ns |           - |           - |           - |                   - |

JamesNK · 2020-07-07T09:10:24Z

Rebased to fix merge conflict

@jtattermusch Could you please take a look?

JamesNK · 2020-07-08T00:07:44Z

csharp/src/Google.Protobuf/WriteBufferHelper.cs

        {
            if (state.writeBufferHelper.codedOutputStream?.InternalOutputStream != null)
            {
+                Debug.Assert(sizeHint == 0, "CodedOutputStream does not support sizeHint.");


We (.NET team) use Debug.Assert to check for things that shouldn't happen, but aren't worth checking at runtime because they aren't caused by external input. I couldn't find any other uses of Debug.Assert outside of test code.

What do you want to do here?

- Remove this?
- Verify at runtime (check and throw an exception)?
- OK to leave as is?
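
To make the last two options concrete, here is a minimal sketch of the difference between them. The class and method names are hypothetical and not part of the PR, and the exception type is an assumption; the PR does not say which exception a runtime check would throw.

```csharp
using System;
using System.Diagnostics;

public static class SizeHintChecks
{
    // "Leave as is": Debug.Assert is marked [Conditional("DEBUG")], so the
    // call disappears from Release builds and costs nothing at runtime.
    public static void AssertNoSizeHint(int sizeHint)
    {
        Debug.Assert(sizeHint == 0, "CodedOutputStream does not support sizeHint.");
    }

    // "Verify at runtime": the check is always present and fails loudly.
    // ArgumentOutOfRangeException is an assumed choice for illustration.
    public static void ThrowIfSizeHint(int sizeHint)
    {
        if (sizeHint != 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(sizeHint), sizeHint, "CodedOutputStream does not support sizeHint.");
        }
    }
}
```

The trade-off is the usual one: the assert documents an internal invariant for free, while the runtime check also protects Release builds at the cost of one comparison per call.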

jtattermusch · 2020-07-08T15:37:18Z

csharp/src/Google.Protobuf/WritingPrimitives.cs

+            // Encoder will keep state of unwritten data.
+            if (state.stringEncoder == null)
+            {
+                state.stringEncoder = Encoding.UTF8.GetEncoder();


Same problem with caching the stringEncoder as in the other PR. Not seeing stringEncoder.Reset() anywhere seems suspicious.

JamesNK (Jul 9, 2020):

I've changed it so that the convert loop stops when completed is true. That ensures there is no remaining state.

See the remarks: https://docs.microsoft.com/en-us/dotnet/api/system.text.encoder.convert?view=netcore-3.1#System_Text_Encoder_Convert_System_Char__System_Int32_System_Byte__System_Int32_System_Boolean_System_Int32__System_Int32__System_Boolean__

jtattermusch (Jul 20, 2020):

I think it would be safer to just call Reset() proactively because it would make the reasoning much easier.
Like this, if an exception is thrown, the internal state of the encoder will be maintained. I know normally throwing would invalidate the entire write context, so this should be fine in theory, but it's hard to predict all the possible patterns of use, so I think relying on this is fragile.
How much more expensive would it be to call Reset() proactively each time? (It sounds like it should be cheap, and if done so there would be absolutely no question whether the encoder could be polluted or not.)

JamesNK (Jul 20, 2020):

> Like this, if an exception is thrown, the internal state of the encoder will be maintained.

Is CodedOutputStream supposed to be safe to use after it throws an exception? We could put it in a try/catch and reset on an error, but that doesn't seem useful if other state (buffer content, position, written output, etc.) is left invalid.

> How much more expensive would it be to call Reset() proactively each time? (It sounds like it should be cheap, and if done so there would be absolutely no question whether the encoder could be polluted or not.)

I can test, but the completed result says that the internal buffer has been emptied, so Reset would never do anything.

https://docs.microsoft.com/en-us/dotnet/api/system.text.encoder.convert?view=netcore-3.1#System_Text_Encoder_Convert_System_Char__System_Int32_System_Byte__System_Int32_System_Boolean_System_Int32__System_Int32__System_Boolean

> The completed output parameter indicates whether all the data in the input buffer was converted and stored in the output buffer.
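
For readers following the Reset() discussion, the loop under debate can be sketched roughly as below. This is not the PR's code: the class and method names, the list of byte arrays standing in for buffer segments, and the proactive Reset() at the top are all assumptions made for illustration. It only uses Encoder.Reset() and the span-based Encoder.Convert overload available since .NET Core 2.1.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ChunkedUtf8Demo
{
    // Encodes 'value' as UTF-8 into fixed-size chunks, the way a writer would
    // fill one buffer segment after another. 'encoder' may be a cached instance.
    public static List<byte[]> EncodeInChunks(Encoder encoder, string value, int chunkSize)
    {
        // A chunk must be able to hold at least one encoded code point (4 bytes),
        // otherwise Encoder.Convert throws because it cannot make progress.
        if (chunkSize < 4) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Proactive reset, as suggested in the review: clears any state left
        // behind by an earlier, possibly aborted, conversion.
        encoder.Reset();

        var chunks = new List<byte[]>();
        ReadOnlySpan<char> remaining = value.AsSpan();
        bool completed = false;

        // Stop as soon as 'completed' is true: all input was converted and
        // stored in the output, so no state carries over to the next string.
        while (!completed)
        {
            byte[] chunk = new byte[chunkSize];
            encoder.Convert(remaining, chunk, true,
                out int charsUsed, out int bytesUsed, out completed);
            remaining = remaining.Slice(charsUsed);
            chunks.Add(chunk.AsSpan(0, bytesUsed).ToArray());
        }

        return chunks;
    }
}
```

In this sketch both positions from the thread are visible: the loop exits only when completed is true, and the Reset() call guards against an encoder that was abandoned mid-string by an exception.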

jtattermusch · 2020-07-20T12:32:25Z

csharp/src/Google.Protobuf.Benchmarks/WriteRawPrimitivesBenchmark.cs

+        [Arguments(3)]
+        [Arguments(4)]
+        [Arguments(5)]
+        public void WriteRawVarint32_WriteContextBufferWriter_FirstWrite(int encodedSize)


What is the purpose of this benchmark? It looks like it doesn't logically belong to this PR.

Found it: this is related to the WriteBufferHelper.Initialize change below. Ideally it would be a separate PR, but since it's already here, let's leave it.

jtattermusch · 2020-07-20T12:35:40Z

csharp/src/Google.Protobuf.Benchmarks/WriteRawPrimitivesBenchmark.cs

@@ -382,6 +401,23 @@ public void WriteString_WriteContext(int encodedSize)
            ctx.CheckNoSpaceLeft();
        }

+        [Benchmark]
+        [ArgumentsSource(nameof(StringEncodedSizes))]
+        public void WriteString_WriteContextBufferWriter_LimitBufferSize(int encodedSize)


nit: _WriteContextBufferWriter_ can be just _BufferWriter_, since "WriteContext" is redundant information here.
The same applies below.

jtattermusch · 2020-07-20T14:40:54Z

csharp/src/Google.Protobuf/WriteBufferHelper.cs

@@ -71,7 +72,7 @@ public static void Initialize(IBufferWriter<byte> bufferWriter, out WriteBufferH
        {
            instance.bufferWriter = bufferWriter;
            instance.codedOutputStream = null;
-            buffer = default;  // TODO: initialize the initial buffer so that the first write is not via slowpath.
+            buffer = bufferWriter.GetSpan();


Note: this also means that just initializing a write context from an IBufferWriter will have some non-zero cost (e.g. potentially allocating a buffer). I'm not sure whether there are scenarios where users create a write context and then decide not to write anything (e.g. because of an error, or because they are actually writing an empty message), but such scenarios would get more expensive with this change.
Of course, under normal circumstances with non-empty messages, this is a good optimization. So we can do it, but let's be aware of the extra overhead for empty writes.

Btw, ideally this change would belong in a separate PR (together with the corresponding benchmark).
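
The cost described here can be observed with any IBufferWriter implementation. The sketch below uses ArrayBufferWriter<byte> from System.Buffers (.NET Core 3.0+) purely as a stand-in; the writers used with protobuf in practice may behave differently, and the class and method names are hypothetical.

```csharp
using System;
using System.Buffers;

public static class FirstBufferDemo
{
    // Asks the writer for its first buffer and then writes nothing.
    // Returns the length of the span handed out and the bytes committed.
    public static (int SpanLength, int WrittenCount) AcquireWithoutWriting()
    {
        var writer = new ArrayBufferWriter<byte>();

        // GetSpan() must return a non-empty span, so the writer has to
        // allocate or grow its backing storage even if nothing is written.
        Span<byte> buffer = writer.GetSpan();

        // Nothing is committed until Advance(count) is called.
        return (buffer.Length, writer.WrittenCount);
    }
}
```

Calling GetSpan() does not commit any bytes, so an empty message still produces empty output; the overhead is limited to whatever the writer does to provide the buffer.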

jtattermusch · 2020-07-20T14:46:35Z

csharp/src/Google.Protobuf/WritingPrimitives.cs

-                        buffer[state.position + i] = (byte)value[i];
-                    }
-                    state.position += length;
+                    // String doesn't fit in refreshed buffer. Write across multiple


nit: unfinished sentence in the comment?

jtattermusch · 2020-07-20T14:51:21Z

csharp/src/Google.Protobuf/WritingPrimitives.cs

-                if (length == value.Length) // Must be all ASCII...
+                // String doesn't fit in the remaining buffer.
+                // Refreshing the buffer could free up enough space to write string.
+                WriteBufferHelper.RefreshBuffer(ref buffer, ref state);


IMHO this optimization is pretty speculative, and I think maintaining the current invariant of always writing all the way to the end of the current buffer is better for now.
We've just made a big change to how serialization works internally, and the old code has always filled the current buffer to its end, so I think it's better to be a bit conservative, wait for things to settle down, and only then change invariants like this. I'm also not convinced that this is always going to be a performance benefit.

It looks like this optimization is pretty much independent of all the other logic in the PR, so I'd like to consider it separately (and also benchmark it separately).

jtattermusch · 2020-07-20T14:57:37Z

csharp/src/Google.Protobuf/WritingPrimitives.cs

@@ -47,6 +47,7 @@ internal static class WritingPrimitives
    {
        // "Local" copy of Encoding.UTF8, for efficiency. (Yes, it makes a difference.)
        internal static readonly Encoding Utf8Encoding = Encoding.UTF8;
+        private const int MaximumBytesPerUtf8Char = 4;


Sounds right, but it would be good to put some evidence for this in a comment (e.g. a link to the spec, or a similar constant in the implementation of UTF8Encoding.GetMaxByteCount).
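
One quick way to check the constant is to measure the encoded size at the upper end of each UTF-8 length range. The helper below is a hypothetical illustration, not code from the PR.

```csharp
using System;
using System.Text;

public static class Utf8WidthDemo
{
    // Returns the number of UTF-8 bytes needed for a single Unicode code point.
    public static int Utf8ByteCount(int codePoint)
    {
        // char.ConvertFromUtf32 yields one UTF-16 char for code points in the
        // Basic Multilingual Plane and a surrogate pair for those above it.
        string text = char.ConvertFromUtf32(codePoint);
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

RFC 3629 restricts UTF-8 to the range U+0000 to U+10FFFF, which is why no code point needs more than four bytes. Note that the four-byte case always corresponds to two UTF-16 chars (a surrogate pair); a single char on its own encodes to at most three bytes.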

jtattermusch · 2020-07-20T15:14:56Z

csharp/src/Google.Protobuf/WritingPrimitives.cs

+            // Encoder will keep state of unwritten data.
+            if (state.stringEncoder == null)
+            {
+                state.stringEncoder = Encoding.UTF8.GetEncoder();


I think it would be safer to just call Reset() proactively because it would make the reasoning much easier.
Like this, if an exception is thrown, the internal state of the encoder will be maintained. I know normally throwing would invalidate the entire write context, so this should be fine in theory, but it's hard to predict all the possible patterns of use, so I think relying on this is fragile.
How much more expensive would it be to call Reset() proactively each time? (It sounds like it should be cheap, and if done so there would be absolutely no question whether the encoder could be polluted or not.)

jtattermusch · 2020-07-20T15:20:52Z

csharp/src/Google.Protobuf/WritingPrimitives.cs

+                // Refresh the buffer with a minimum size of the maximum unicode char byte size.
+                // A minimum buffer size is required because at least one unicode character must
+                // be written when Encoder.Convert is called. 
+                WriteBufferHelper.RefreshBuffer(ref buffer, ref state, sizeHint: MaximumBytesPerUtf8Char);


I'm not a fan of introducing the need for a sizeHint in RefreshBuffer just for this specific corner case.

It sounds like having fewer than MaximumBytesPerUtf8Char bytes left would be a pretty rare condition, so perhaps if that happens, we could write those bytes into a stackalloc'd buffer and then write the resulting bytes into the destination using WriteRawBytes(span)?
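
A rough sketch of that suggestion follows. It is not the PR's code: the two destination spans stand in for the tail of the current buffer and the start of the refreshed one, and the plain span copies stand in for WriteRawBytes(span). All names here are hypothetical.

```csharp
using System;
using System.Text;

public static class ScratchBufferDemo
{
    // Encodes 'chars' (one UTF-16 char or one surrogate pair) into a 4-byte
    // stack buffer, then copies as much as fits into 'tail' and the rest
    // into 'next'. Returns the number of bytes written to 'tail'.
    public static int WriteAcrossBoundary(
        ReadOnlySpan<char> chars, Span<byte> tail, Span<byte> next)
    {
        // Four bytes are enough for any single code point in UTF-8.
        Span<byte> scratch = stackalloc byte[4];
        int encoded = Encoding.UTF8.GetBytes(chars, scratch);

        // Fill whatever is left of the current buffer first.
        int first = Math.Min(encoded, tail.Length);
        scratch.Slice(0, first).CopyTo(tail);

        // The remainder goes to the start of the next buffer.
        scratch.Slice(first, encoded - first).CopyTo(next);
        return first;
    }
}
```

With this approach the encoder never sees a destination smaller than one code point, so the buffer refresh would not need a size hint for this case.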

deannagarcia · 2021-10-14T19:37:26Z

I'm going to close this request given that there was no response to the last round of comments, but feel free to open again and respond!

googlebot added the cla: yes label Jun 30, 2020

JamesNK force-pushed the jamesnk/writestring-multisegment branch from 36f0c26 to 500b722 Compare June 30, 2020 08:27

JamesNK force-pushed the jamesnk/writestring-multisegment branch 2 times, most recently from 03150ce to 38c5018 Compare July 7, 2020 09:08

Optimize writing strings across multiple buffers

b4b7393

JamesNK force-pushed the jamesnk/writestring-multisegment branch from bdf2448 to b4b7393 Compare July 9, 2020 04:16

jtattermusch added the c# label Jul 9, 2020

jtattermusch added the kokoro:run label Jul 20, 2020

protobuf-kokoro removed the kokoro:run label Jul 20, 2020

deannagarcia closed this Oct 14, 2021
