Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize writing strings across multiple buffers #7663

Conversation

JamesNK
Copy link
Contributor

@JamesNK JamesNK commented Jun 30, 2020

Cherry picked from #7576 (comment)

Before:

|                                               Method | BytesToWrite | encodedSize |      Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           1 | 213.53 us | 1.9358 us | 1.8107 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           4 |  73.99 us | 0.7287 us | 0.6817 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |          10 |  52.32 us | 0.4014 us | 0.3558 us |      6.3477 |           - |           - |              5040 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |         105 |  23.72 us | 0.2815 us | 0.2496 us |     15.5945 |           - |           - |             12288 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |       10080 |  10.72 us | 0.2083 us | 0.1949 us |     12.8174 |           - |           - |             10104 B |

After (not caching Encoder):

|                                               Method | BytesToWrite | encodedSize |      Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           1 | 144.75 us | 1.6373 us | 1.5316 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           4 |  54.72 us | 0.5721 us | 0.5351 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |          10 |  30.17 us | 0.1616 us | 0.1433 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |         105 |  20.02 us | 0.1819 us | 0.1612 us |      6.8054 |           - |           - |              5376 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |       10080 |  13.84 us | 0.1661 us | 0.1554 us |      0.0610 |           - |           - |                56 B |

After (caching Encoder):

|                                               Method | BytesToWrite | encodedSize |      Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           1 | 146.20 us | 2.3256 us | 2.1753 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |           4 |  53.98 us | 0.3705 us | 0.3466 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |          10 |  30.12 us | 0.1255 us | 0.1113 us |           - |           - |           - |                   - |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |         105 |  19.39 us | 0.4198 us | 0.4666 us |      0.0610 |           - |           - |                56 B |
| WriteString_WriteContextBufferWriter_LimitBufferSize |        10080 |       10080 |  13.60 us | 0.0412 us | 0.0344 us |      0.0610 |           - |           - |                56 B |

@JamesNK JamesNK force-pushed the jamesnk/writestring-multisegment branch from 36f0c26 to 500b722 Compare June 30, 2020 08:27
@JamesNK
Copy link
Contributor Author

JamesNK commented Jul 1, 2020

Added to this PR is initializing the initial buffer with IBufferWriter.

Before:

|                                               Method | BytesToWrite | encodedSize |     Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           1 | 39.56 ns | 0.6020 ns | 0.5631 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           2 | 43.09 ns | 0.7069 ns | 0.6612 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           3 | 45.48 ns | 0.2256 ns | 0.1884 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           4 | 49.91 ns | 0.6544 ns | 0.5801 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           5 | 53.81 ns | 0.9889 ns | 0.8766 ns |           - |           - |           - |                   - |

After:

|                                               Method | BytesToWrite | encodedSize |     Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------------------------------------- |------------- |------------ |---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           1 | 33.21 ns | 0.6996 ns | 0.6544 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           2 | 33.31 ns | 0.2451 ns | 0.2047 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           3 | 35.42 ns | 0.7321 ns | 0.7518 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           4 | 38.52 ns | 0.8109 ns | 0.7585 ns |           - |           - |           - |                   - |
| WriteRawVarint32_WriteContextBufferWriter_FirstWrite |        10080 |           5 | 40.58 ns | 0.6540 ns | 0.5798 ns |           - |           - |           - |                   - |

@JamesNK JamesNK force-pushed the jamesnk/writestring-multisegment branch 2 times, most recently from 03150ce to 38c5018 Compare July 7, 2020 09:08
@JamesNK
Copy link
Contributor Author

JamesNK commented Jul 7, 2020

Rebased to fix merge conflict

@jtattermusch Could you please take a look

{
if (state.writeBufferHelper.codedOutputStream?.InternalOutputStream != null)
{
Debug.Assert(sizeHint == 0, "CodedOutputStream does not support sizeHint.");
Copy link
Contributor Author

@JamesNK JamesNK Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We (.NET team) use Debug.Assert to check for things that shouldn't happen, but aren't worth checking at runtime because they aren't caused by external input. I couldn't find any other uses of Debug.Assert outside of test code.

What do you want to do here:

  1. Remove this?
  2. Verify are runtime (check and throw an exception)
  3. Ok to leave as is

// Encoder will keep state of unwritten data.
if (state.stringEncoder == null)
{
state.stringEncoder = Encoding.UTF8.GetEncoder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same problem with caching the stringEncoder as in the other PR. Not seeing stringEncoder.Reset() anywhere seems suspicious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be safer to just call Reset() proactively because it would make the reasoning much easier.
Like this if an exception is thrown, the internal state of encoder will be maintained. I know normally throwing would invalidate the entire write context, so this should be fine in theory, but it's hard to predict all the possible patterns of use so I think relying on this is fragile.
How much more expensive would be to call Reset() proactively each time? (sounds like it should be cheap and if done so there would be absolutely no question if the encoder could be polluted or not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this if an exception is thrown, the internal state of encoder will be maintained.

Is CodedOutputStream suppose to be safe to use after it throws an exception? We could put it in a try/catch and reset on an error, but it doesn't seem useful if other state (buffer content, position, written output, etc) is in invalid states.

How much more expensive would be to call Reset() proactively each time? (sounds like it should be cheap and if done so there would be absolutely no question if the encoder could be polluted or not).

I can test, but the completed result says that the internal buffer has been emptied so Reset would never do anything.

https://docs.microsoft.com/en-us/dotnet/api/system.text.encoder.convert?view=netcore-3.1#System_Text_Encoder_Convert_System_Char__System_Int32_System_Byte__System_Int32_System_Boolean_System_Int32__System_Int32__System_Boolean

The completed output parameter indicates whether all the data in the input buffer was converted and stored in the output buffer.

@JamesNK JamesNK force-pushed the jamesnk/writestring-multisegment branch from bdf2448 to b4b7393 Compare July 9, 2020 04:16
@jtattermusch jtattermusch added the c# label Jul 9, 2020
[Arguments(3)]
[Arguments(4)]
[Arguments(5)]
public void WriteRawVarint32_WriteContextBufferWriter_FirstWrite(int encodedSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this benchmark? it looks like this benchmark doesn't logically belong to this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it, looks like this is related to the WriteBufferHelper.Initialize change below. Ideally would be a separate PR, but since it's already here, let's leave it.

@@ -382,6 +401,23 @@ public void WriteString_WriteContext(int encodedSize)
ctx.CheckNoSpaceLeft();
}

[Benchmark]
[ArgumentsSource(nameof(StringEncodedSizes))]
public void WriteString_WriteContextBufferWriter_LimitBufferSize(int encodedSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : _WriteContextBufferWriter_ -> can be just _BufferWriter_ as "WriteContext" is redundant information here?
Also below.

@@ -71,7 +72,7 @@ public static void Initialize(IBufferWriter<byte> bufferWriter, out WriteBufferH
{
instance.bufferWriter = bufferWriter;
instance.codedOutputStream = null;
buffer = default; // TODO: initialize the initial buffer so that the first write is not via slowpath.
buffer = bufferWriter.GetSpan();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this also means that just initializing a writeContext from IBufferWriter will have some non-zero cost (e.g. potentially allocating a buffer). Not sure if there are any scenarios where users create a write context and then decide not to write anything (e.g. because of an error, or because they are actually writing an empty message) but such scenarios would get more expensive by doing this.
Of course under normal circumstances for non-empty messages, this is a good optimization. So we can do it, but let's be aware of the extra overhead for empty writes.

Btw Ideally this change would belong to a separate PR (together with the corresponding benchmark).

buffer[state.position + i] = (byte)value[i];
}
state.position += length;
// String doesn't fit in refreshed buffer. Write across multiple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unfinished sentence in the comment?

if (length == value.Length) // Must be all ASCII...
// String doesn't fit in the remaining buffer.
// Refreshing the buffer could free up enough space to write string.
WriteBufferHelper.RefreshBuffer(ref buffer, ref state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this optimization is pretty speculative and I think maintaining the current invariant of always writing all the way to the end of the current buffer is better for now.
We've just made a big change to how the serialization works internally and the old code has always filled the current buffer to its end so I think it's better be a bit conservative, wait for things to settle down a little bit and only then change invariants things like this. I'm also not convinced that this is always going to be a performance benefit.

Looks like this optimization is pretty much independent of all the other logic in the PR, so I'd like to consider it separately (and also benchmark it separately).

@@ -47,6 +47,7 @@ internal static class WritingPrimitives
{
// "Local" copy of Encoding.UTF8, for efficiency. (Yes, it makes a difference.)
internal static readonly Encoding Utf8Encoding = Encoding.UTF8;
private const int MaximumBytesPerUtf8Char = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right, but it would be good to put some evidence for this in a comment. (e.g. link to the spec or e.g. a similar constant in UtfEncoding.GetMaxByteCount's implementation)

// Encoder will keep state of unwritten data.
if (state.stringEncoder == null)
{
state.stringEncoder = Encoding.UTF8.GetEncoder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be safer to just call Reset() proactively because it would make the reasoning much easier.
Like this if an exception is thrown, the internal state of encoder will be maintained. I know normally throwing would invalidate the entire write context, so this should be fine in theory, but it's hard to predict all the possible patterns of use so I think relying on this is fragile.
How much more expensive would be to call Reset() proactively each time? (sounds like it should be cheap and if done so there would be absolutely no question if the encoder could be polluted or not).

// Refresh the buffer with a minimum size of the maximum unicode char byte size.
// A minimum buffer size is required because at least one unicode character must
// be written when Encoder.Convert is called.
WriteBufferHelper.RefreshBuffer(ref buffer, ref state, sizeHint: MaximumBytesPerUtf8Char);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of introducing the need for having a sizeHint in refreshbuffer for just this specific corner case.

It sounds like having less than MaximumBytesPerUtf8Char bytes left would be a pretty rare condition, so perhaps if that happens, we could write those bytes into a stackalloc'd buffer and then write the resulting bytes into the destination using WriteRawBytes(span)?

@deannagarcia
Copy link
Member

I'm going to close this request given that there was no response to the last round of comments, but feel free to open again and respond!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants