Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: add span_based_serialization design #6874

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
114 changes: 114 additions & 0 deletions csharp/span_based_serialization.md
@@ -0,0 +1,114 @@
# More efficient Protobuf C# serialization/deserialization API

## Background
Currently serialization of protobuf C# messages is done (much like java) through instances of CodedInputStream and CodedOutputStream.
These APIs work well, but there is some inherent overhead due to the fact that they are basically `byte[]` based. The way .NET gRPC uses these types involves allocating an array the size of the message, which for larger messages means allocations to the [large object heap](https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap#loh-performance-implications).
Recently a set of new types has been added to the .NET Framework (Span, ReadOnlySequence, IBufferWriter)
and these new types allow defining new serialization/deserialization APIs, ones that would be much more efficient especially from
the perspective of memory management.

Both gRPC C# implementations (Grpc.Core and grpc-dotnet) already contain functionality that allows exposing request/response payloads
in terms of the new .NET Span APIs in a way that requires no copying of binary payloads and basically allows deserialization directly
from the buffer segments that were received from the network (and serializing directly into the buffer that is going to be sent out
over the network).

## Objective
The goals for this effort:
- keep the existing CodedInputStream and CodedOutputStream (no plans to remove them in the future), serialization/deserialization speed of these APIs should stay unaffected.
- all the changes made need to be backward compatible
- introduce a set of new public APIs for serialization/deserialization that utilize the Span type
- allow future incremental performance optimizations and make them as easy as possible
- ensure there are microbenchmark that allow comparing the two implementations
- as support for the new serialization/deserialization APIs requires generating code that is not compatible with some older .NET frameworks, figure out a schema under which messages with "new serialization API support" and without it can coexist (without causing user friction).
- the effort to maintain the Protobuf C# project should remain approximately the same (should not be negatively affected by introduction of new serialization/deserialization APIs).

Non-goals
- over-optimize the initial version of new serializer/deserializer (we should rather focus on enabling future improvements)
- remove or deprecate the existing serialization/deserialization API

## The Proposal

Key elements:
- Add `CodedInputReader` and `CodedOutputWriter` types (both are `ref structs` and thus can hold an instance of `Span`). These two types implement the low-level wire format semantics in terms of the `Span` type (as opposed to `byte[]` used by CodedInputStream and CodedOutputStream).
- Introduce `IBufferMessage` interface which inherits from `IMessage` and marks protobuf messages that support the new serialization/deserialization API.
- Introduce a new codegen option `--use_buffer_serialization` to enable/disable generating code required by CodedInputReader/CodedInputWriter (the `MergeFrom(ref CodedInputReader)` and `WriteTo(ref CodedOutputWriter output)` methods). This option will default to disabled. gRPC frameworks will likely provide the ability to explicitly control it, e.g. `<Protobuf Include="..\Shared\benchmark_service.proto" Serialization="Buffer" />`, and/or detect the version of .NET being targeted and automatically enable the option.
- Generated code that uses the new IBufferMessage/CodedInputReader/CodedOutputWriter types will be surrounded by `#if !PROTOBUF_DISABLE_BUFFER_SERIALIZATION` defines. That will allow potentially incompatible generated code to be excluded by a project to support multi-targeting scenarios.

jtattermusch marked this conversation as resolved.
Show resolved Hide resolved
A prototype implementation is https://github.com/protocolbuffers/protobuf/pull/5888

## Efficiency improvements

`MergeFrom(ref CodedInputReader)` allows parsing protobuf messages from a `ReadOnlySequence<byte>`, which offers these benefits over the existing `MergeFrom(ref CodedInputStream)` and/or `MergeFrom(byte[] input)`.

- The input buffer does not have to be continguous, but can be formed from several segment/slices. That's a format that naturally matches the format in which the data are received from the network card (= we can parse directly from the memory slices we received from network)

- Allows parsing directly from both managed and native memory buffer in a safe manner with a single API (this is beneficial for Grpc.Core implementation that uses a native layer, so we don't have to copy buffers into managed memory first).

- In grpc-dotnet, the requests can be exposed as `ReadOnlySequence<byte>` in a very efficient manner.

- Buffer segments can be reused once read so that there are no extra memory allocations.

- Access to segments of read only sequence via `ReadOnlySpan<>` is heavily optimized in recent .NET Core releases.

`WriteTo(ref CodedOutputWriter output)` allows serializing protobuf messages to `IBufferWriter<byte>` which has these benefits:

- The output doesn't have to be written to a single continguous chunk of memory, but into smaller buffer segments, which
avoids allocating large blocks of memory, resulting in slow LOH GC collections.

- Allows serializing to both managed and native memory in a safe manner with the same API.

- Buffer segments can be reused once read so that there are no extra memory allocations.

- Allows avoiding any additional copies of data when serializing (in both Grpc.Core and grpc-dotnet).

## Platform support

- the new serialization API will be supported for projects that support at least `netstandard2.0` and allow use of C# 7.2. On newer .NET frameworks, there might be (and likely will be) further performance benefits.

- the new serialization API will be supported on `net45`, but it's mostly for convenience and to minimize the differences between different builds of the library. `netstandard2.0` and higher are the primary targets.

- the new serialization API won't be supported on netstandard1.0 (as it doesn't support the `System.Memory` package which contains several required types). Users will still be able to use the original serialization APIs as always.

## Backwards compatibility

API changes to the runtime library are purely additive.

By default (without `--use_buffer_serialization` options), the generated code will stay the same and won't depend on any of the newly introduced types, so existing users that don't opt-in won't be affected by the new APIs at all.

When `--use_buffer_serialization` is used when generating the code, the code requiring the newly introduces types will
be protected by `PROTOBUF_DISABLE_BUFFER_SERIALIZATION` if-define, so that it can be disabled (needs to be manually set in a project) when needed.

## Concern: Code duplication and maintainability

While the logical structure of the (de)serialization code for both old and new approaches is very similar, due to the different nature of the object holding the parsing state (`class` vs `ref struct`), the parsing primitives need to be defined twice (e.g. ReadRawVarint64 exist in two copies, one under CodedInputStream and one user CodedInputReader) and that adds some extra overhead when:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haberman according to you, how much of a concern is this? Is there a precedent for this within protobuf (having 2 sets of code for serialization/parsing in one language)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this as a transitional state when Gerben was rewriting the parser, but that was just temporary. Java is in a state where there are two very different models for generated code (one table-based), but I'm not sure how much extra runtime come

Looking at the CL, I do worry that this is an awful lot of new code. If I filter out all the generated code, this CL is still +7,000 lines of code, which is worrying.

$ git diff --stat master ':!csharp/src/Google.Protobuf.Test.TestProtos' ':!csharp/src/Google.Protobuf.Benchmarks/WrapperBenchmarkMessages.cs' ':!csharp/src/Google.Protobuf/Reflection/Descriptor.cs' ':!csharp/src/Google.Protobuf/WellKnownTypes' ':!csharp/src/Google.Protobuf.Benchmarks/Benchmarks.cs' ':!csharp/src/Google.Protobuf.Benchmarks/BenchmarkMessage1Proto3.cs' ':!csharp/src/AddressBook/Addressbook.cs'
 .gitignore                                                                                 |    3 +
 Makefile.am                                                                                |   15 +-
 conformance/Makefile.am                                                                    |    1 +
 csharp/generate_protos.sh                                                                  |   11 +-
 csharp/src/Google.Protobuf.Benchmarks/Google.Protobuf.Benchmarks.csproj                    |   10 +-
 csharp/src/Google.Protobuf.Benchmarks/GoogleMessageBenchmark.cs                            |  114 ++++++++++
 csharp/src/Google.Protobuf.Benchmarks/Program.cs                                           |    3 +-
 csharp/src/Google.Protobuf.Benchmarks/SerializationBenchmark.cs                            |  118 +++++++++-
 csharp/src/Google.Protobuf.Benchmarks/SerializationConfig.cs                               |   16 +-
 csharp/src/Google.Protobuf.Benchmarks/WrapperBenchmark.cs                                  |   47 +++-
 csharp/src/Google.Protobuf.Conformance/Conformance.cs                                      |  276 ++++++++++++++++++++++-
 csharp/src/Google.Protobuf.Conformance/Google.Protobuf.Conformance.csproj                  |    5 +
 csharp/src/Google.Protobuf.Conformance/Program.cs                                          |   89 ++++++--
 csharp/src/Google.Protobuf.Test/Buffers/ArrayBufferWriter.cs                               |  215 ++++++++++++++++++
 csharp/src/Google.Protobuf.Test/Buffers/MaxSizeHintBufferWriter.cs                         |   76 +++++++
 csharp/src/Google.Protobuf.Test/Buffers/ReadOnlySequenceFactory.cs                         |  130 +++++++++++
 csharp/src/Google.Protobuf.Test/{CodedInputStreamExtensions.cs => CodedInputExtensions.cs} |   18 +-
 csharp/src/Google.Protobuf.Test/CodedInputReaderTest.cs                                    |  419 ++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/CodedInputStreamTest.cs                                    |  425 ++++++++++------------------------
 csharp/src/Google.Protobuf.Test/CodedInputTestBase.cs                                      |  279 +++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/CodedOutputStreamTest.cs                                   |  398 ++++++++++++++------------------
 csharp/src/Google.Protobuf.Test/CodedOutputTestBase.cs                                     |  294 ++++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/CodedOutputWriterTest.cs                                   |  314 ++++++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/Collections/RepeatedFieldTest.cs                           |   55 ++++-
 csharp/src/Google.Protobuf.Test/ExtensionSetTest.cs                                        |   14 +-
 csharp/src/Google.Protobuf.Test/GeneratedMessageTest.Proto2.cs                             |   20 +-
 csharp/src/Google.Protobuf.Test/GeneratedMessageTest.cs                                    |  206 +++++++++++++----
 csharp/src/Google.Protobuf.Test/Google.Protobuf.Test.csproj                                |    4 +
 csharp/src/Google.Protobuf.Test/JsonParserTest.cs                                          |    4 +
 csharp/src/Google.Protobuf.Test/JsonTokenizerTest.cs                                       |    4 +
 csharp/src/Google.Protobuf.Test/MessageParsingHelpers.cs                                   |  145 ++++++++++++
 csharp/src/Google.Protobuf.Test/UnknownFieldSetTest.cs                                     |   24 +-
 csharp/src/Google.Protobuf.Test/WellKnownTypes/WrappersTest.cs                             |  150 ++++++------
 csharp/src/Google.Protobuf.Test/testprotos.pb                                              |  Bin 330948 -> 327492 bytes
 csharp/src/Google.Protobuf/ByteString.cs                                                   |    7 +-
 csharp/src/Google.Protobuf/CodedInputReader.cs                                             | 1339 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf/CodedOutputWriter.cs                                            |  702 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf/Collections/MapField.cs                                         |  121 +++++++++-
 csharp/src/Google.Protobuf/Collections/RepeatedField.cs                                    |   87 ++++++-
 csharp/src/Google.Protobuf/ExtensionRegistry.cs                                            |    4 +-
 csharp/src/Google.Protobuf/ExtensionSet.cs                                                 |   48 +++-
 csharp/src/Google.Protobuf/ExtensionValue.cs                                               |   57 ++++-
 csharp/src/Google.Protobuf/FieldCodec.cs                                                   |  480 ++++++++++++++++++++++++++++++++++++---
 csharp/src/Google.Protobuf/Google.Protobuf.csproj                                          |   21 +-
 csharp/src/Google.Protobuf/IBufferMessage.cs                                               |   57 +++++
 csharp/src/Google.Protobuf/IMessage.cs                                                     |    2 +-
 csharp/src/Google.Protobuf/MessageExtensions.cs                                            |   25 ++
 csharp/src/Google.Protobuf/MessageParser.cs                                                |   81 +++++++
 csharp/src/Google.Protobuf/SequenceReader.cs                                               |  417 ++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf/UnknownField.cs                                                 |   53 +++++
 csharp/src/Google.Protobuf/UnknownFieldSet.cs                                              |  105 ++++++++-
 src/google/protobuf/compiler/csharp/csharp_bootstrap_unittest.cc                           |    4 +-
 src/google/protobuf/compiler/csharp/csharp_field_base.cc                                   |    2 +-
 src/google/protobuf/compiler/csharp/csharp_field_base.h                                    |    2 +
 src/google/protobuf/compiler/csharp/csharp_generator.cc                                    |    2 +
 src/google/protobuf/compiler/csharp/csharp_map_field.cc                                    |   13 ++
 src/google/protobuf/compiler/csharp/csharp_map_field.h                                     |    2 +
 src/google/protobuf/compiler/csharp/csharp_message.cc                                      |  159 ++++++++++---
 src/google/protobuf/compiler/csharp/csharp_message.h                                       |    3 +
 src/google/protobuf/compiler/csharp/csharp_message_field.cc                                |   10 +
 src/google/protobuf/compiler/csharp/csharp_message_field.h                                 |    2 +
 src/google/protobuf/compiler/csharp/csharp_options.h                                       |    7 +-
 src/google/protobuf/compiler/csharp/csharp_primitive_field.cc                              |   10 +
 src/google/protobuf/compiler/csharp/csharp_primitive_field.h                               |    2 +
 src/google/protobuf/compiler/csharp/csharp_repeated_enum_field.cc                          |   13 ++
 src/google/protobuf/compiler/csharp/csharp_repeated_enum_field.h                           |    2 +
 src/google/protobuf/compiler/csharp/csharp_repeated_message_field.cc                       |   13 ++
 src/google/protobuf/compiler/csharp/csharp_repeated_message_field.h                        |    2 +
 src/google/protobuf/compiler/csharp/csharp_repeated_primitive_field.cc                     |   13 ++
 src/google/protobuf/compiler/csharp/csharp_repeated_primitive_field.h                      |    2 +
 src/google/protobuf/compiler/csharp/csharp_wrapper_field.cc                                |   32 +++
 src/google/protobuf/compiler/csharp/csharp_wrapper_field.h                                 |    4 +
 72 files changed, 6986 insertions(+), 821 deletions(-)

The doubled generated code seems like a potential concern for code size also. I guess C# probably isn't used much in any kind of mobile (phone) deployment, but probably some people eventually have code size concerns?

A few specific questions:

  1. Once Span is available everywhere, could the old code simply become a client of the new code?
  2. Is it possible that some of the code can be shared by using stateless functions where all input and output is represented as explicit parameters?
  3. Is it necessary to expose the new reader/writer types to the user? Ideally the user could take advantage of the new code without needing to construct a CodedInputReader for example, but could just directly pass the ReadOnlySequence<byte> or whatever that they want to parse from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this as a transitional state when Gerben was rewriting the parser, but that was just temporary. Java is in a state where there are two very different models for generated code (one table-based), but I'm not sure how much extra runtime come

Looking at the CL, I do worry that this is an awful lot of new code. If I filter out all the generated code, this CL is still +7,000 lines of code, which is worrying.

Yes, it's a big PR. Here's some more insights:

  • A lot of the code changes is tests - we've added a lot of new tests and where possible, we modified the tests to test both old and the new parser at the same time (so some line changes in the tests are just moving tests around - e.g. from CodedInputStreamTest to CodedInputTestBase where it can be used to ). Many of the changes in the tests are mechanic - just making sure that both serializer/parser implementations get tested

  • The changes in the codegen plugin (src/google/protobuf/compiler/csharp) are very mechanic and barely have any logic

  • The actual new logic is basically two files - CodedInputReader and CodedOutputWriter ("new" version of CodedInputStream and CodedOuputStream) and adding support for those in things like Maps, RepeatedFields etc. Altogether this is ~2500 lines but CodedInputReader and CodedOuputWriter is modeled after CodedInputStream and CodedOutputStream and the high-level logic stays the same. It's the low-level read from buffer operations and parsing the primitives that's different.

$ git diff --stat master ':!csharp/src/Google.Protobuf.Test.TestProtos' ':!csharp/src/Google.Protobuf.Benchmarks/WrapperBenchmarkMessages.cs' ':!csharp/src/Google.Protobuf/Reflection/Descriptor.cs' ':!csharp/src/Google.Protobuf/WellKnownTypes' ':!csharp/src/Google.Protobuf.Benchmarks/Benchmarks.cs' ':!csharp/src/Google.Protobuf.Benchmarks/BenchmarkMessage1Proto3.cs' ':!csharp/src/AddressBook/Addressbook.cs'
 .gitignore                                                                                 |    3 +
 Makefile.am                                                                                |   15 +-
 conformance/Makefile.am                                                                    |    1 +
 csharp/generate_protos.sh                                                                  |   11 +-
 csharp/src/Google.Protobuf.Benchmarks/Google.Protobuf.Benchmarks.csproj                    |   10 +-
 csharp/src/Google.Protobuf.Benchmarks/GoogleMessageBenchmark.cs                            |  114 ++++++++++
 csharp/src/Google.Protobuf.Benchmarks/Program.cs                                           |    3 +-
 csharp/src/Google.Protobuf.Benchmarks/SerializationBenchmark.cs                            |  118 +++++++++-
 csharp/src/Google.Protobuf.Benchmarks/SerializationConfig.cs                               |   16 +-
 csharp/src/Google.Protobuf.Benchmarks/WrapperBenchmark.cs                                  |   47 +++-
 csharp/src/Google.Protobuf.Conformance/Conformance.cs                                      |  276 ++++++++++++++++++++++-
 csharp/src/Google.Protobuf.Conformance/Google.Protobuf.Conformance.csproj                  |    5 +
 csharp/src/Google.Protobuf.Conformance/Program.cs                                          |   89 ++++++--
 csharp/src/Google.Protobuf.Test/Buffers/ArrayBufferWriter.cs                               |  215 ++++++++++++++++++
 csharp/src/Google.Protobuf.Test/Buffers/MaxSizeHintBufferWriter.cs                         |   76 +++++++
 csharp/src/Google.Protobuf.Test/Buffers/ReadOnlySequenceFactory.cs                         |  130 +++++++++++
 csharp/src/Google.Protobuf.Test/{CodedInputStreamExtensions.cs => CodedInputExtensions.cs} |   18 +-
 csharp/src/Google.Protobuf.Test/CodedInputReaderTest.cs                                    |  419 ++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/CodedInputStreamTest.cs                                    |  425 ++++++++++------------------------
 csharp/src/Google.Protobuf.Test/CodedInputTestBase.cs                                      |  279 +++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/CodedOutputStreamTest.cs                                   |  398 ++++++++++++++------------------
 csharp/src/Google.Protobuf.Test/CodedOutputTestBase.cs                                     |  294 ++++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/CodedOutputWriterTest.cs                                   |  314 ++++++++++++++++++++++++++
 csharp/src/Google.Protobuf.Test/Collections/RepeatedFieldTest.cs                           |   55 ++++-
 csharp/src/Google.Protobuf.Test/ExtensionSetTest.cs                                        |   14 +-
 csharp/src/Google.Protobuf.Test/GeneratedMessageTest.Proto2.cs                             |   20 +-
 csharp/src/Google.Protobuf.Test/GeneratedMessageTest.cs                                    |  206 +++++++++++++----
 csharp/src/Google.Protobuf.Test/Google.Protobuf.Test.csproj                                |    4 +
 csharp/src/Google.Protobuf.Test/JsonParserTest.cs                                          |    4 +
 csharp/src/Google.Protobuf.Test/JsonTokenizerTest.cs                                       |    4 +
 csharp/src/Google.Protobuf.Test/MessageParsingHelpers.cs                                   |  145 ++++++++++++
 csharp/src/Google.Protobuf.Test/UnknownFieldSetTest.cs                                     |   24 +-
 csharp/src/Google.Protobuf.Test/WellKnownTypes/WrappersTest.cs                             |  150 ++++++------
 csharp/src/Google.Protobuf.Test/testprotos.pb                                              |  Bin 330948 -> 327492 bytes
 csharp/src/Google.Protobuf/ByteString.cs                                                   |    7 +-
 csharp/src/Google.Protobuf/CodedInputReader.cs                                             | 1339 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf/CodedOutputWriter.cs                                            |  702 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf/Collections/MapField.cs                                         |  121 +++++++++-
 csharp/src/Google.Protobuf/Collections/RepeatedField.cs                                    |   87 ++++++-
 csharp/src/Google.Protobuf/ExtensionRegistry.cs                                            |    4 +-
 csharp/src/Google.Protobuf/ExtensionSet.cs                                                 |   48 +++-
 csharp/src/Google.Protobuf/ExtensionValue.cs                                               |   57 ++++-
 csharp/src/Google.Protobuf/FieldCodec.cs                                                   |  480 ++++++++++++++++++++++++++++++++++++---
 csharp/src/Google.Protobuf/Google.Protobuf.csproj                                          |   21 +-
 csharp/src/Google.Protobuf/IBufferMessage.cs                                               |   57 +++++
 csharp/src/Google.Protobuf/IMessage.cs                                                     |    2 +-
 csharp/src/Google.Protobuf/MessageExtensions.cs                                            |   25 ++
 csharp/src/Google.Protobuf/MessageParser.cs                                                |   81 +++++++
 csharp/src/Google.Protobuf/SequenceReader.cs                                               |  417 ++++++++++++++++++++++++++++++++++
 csharp/src/Google.Protobuf/UnknownField.cs                                                 |   53 +++++
 csharp/src/Google.Protobuf/UnknownFieldSet.cs                                              |  105 ++++++++-
 src/google/protobuf/compiler/csharp/csharp_bootstrap_unittest.cc                           |    4 +-
 src/google/protobuf/compiler/csharp/csharp_field_base.cc                                   |    2 +-
 src/google/protobuf/compiler/csharp/csharp_field_base.h                                    |    2 +
 src/google/protobuf/compiler/csharp/csharp_generator.cc                                    |    2 +
 src/google/protobuf/compiler/csharp/csharp_map_field.cc                                    |   13 ++
 src/google/protobuf/compiler/csharp/csharp_map_field.h                                     |    2 +
 src/google/protobuf/compiler/csharp/csharp_message.cc                                      |  159 ++++++++++---
 src/google/protobuf/compiler/csharp/csharp_message.h                                       |    3 +
 src/google/protobuf/compiler/csharp/csharp_message_field.cc                                |   10 +
 src/google/protobuf/compiler/csharp/csharp_message_field.h                                 |    2 +
 src/google/protobuf/compiler/csharp/csharp_options.h                                       |    7 +-
 src/google/protobuf/compiler/csharp/csharp_primitive_field.cc                              |   10 +
 src/google/protobuf/compiler/csharp/csharp_primitive_field.h                               |    2 +
 src/google/protobuf/compiler/csharp/csharp_repeated_enum_field.cc                          |   13 ++
 src/google/protobuf/compiler/csharp/csharp_repeated_enum_field.h                           |    2 +
 src/google/protobuf/compiler/csharp/csharp_repeated_message_field.cc                       |   13 ++
 src/google/protobuf/compiler/csharp/csharp_repeated_message_field.h                        |    2 +
 src/google/protobuf/compiler/csharp/csharp_repeated_primitive_field.cc                     |   13 ++
 src/google/protobuf/compiler/csharp/csharp_repeated_primitive_field.h                      |    2 +
 src/google/protobuf/compiler/csharp/csharp_wrapper_field.cc                                |   32 +++
 src/google/protobuf/compiler/csharp/csharp_wrapper_field.h                                 |    4 +
 72 files changed, 6986 insertions(+), 821 deletions(-)

The doubled generated code seems like a potential concern for code size also. I guess C# probably isn't used much in any kind of mobile (phone) deployment, but probably some people eventually have code size concerns?

A few specific questions:

  1. Once Span is available everywhere, could the old code simply become a client of the new code?

The problem is that Span<> is a ref struct (a struct that is always passed as a reference - a new concept in C#) and as such it can only live on the stack (which also acts safety measure if you're e.g. accessing native memory). So you cannot have regular classes such CodedInputStream hold an instance of Span - that makes it difficult to retrofit the "old" parser to use the CodedInputReader logic internally (currently not possible without degrading the parser's performance).
At some point in the distant future, we could recommend everyone moving to CodedInputReader and CodedOuputWriter, make CodedInputStream and CodedOuputStream "legacy" and make them use the new parsing logic at the expense of worse performance, but still maintaining API compatibility.

  1. Is it possible that some of the code can be shared by using stateless functions where all input and output is represented as explicit parameters?

We've spent a lot of time investigating this and did as much de-duplication as seemed practical.

  1. Is it necessary to expose the new reader/writer types to the user? Ideally the user could take advantage of the new code without needing to construct a CodedInputReader for example, but could just directly pass the ReadOnlySequence<byte> or whatever that they want to parse from.

The CodedInputReader and CodedOutputWriter types need to be public as they are referenced by the generated code (and they hold the intermediate parsing/serialization state - same idea as CodedInputStream/CodedOuputStream).
Most users won't use these types directly, and would instead call e.g. SomeMessage.Parser.ParseFrom(ReadOnlySequence)


- optimizing the (de)serialization code (two locations need to be updated)
- fixing bugs (some bug might need to be fixed in two locations)
- writing tests (in some cases, tests need to be duplicated, because behavior with both Stream/Buffer serialization need to be tested)

## Integration with generated gRPC stubs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JunTaoLuo @JamesNK we discussed during our sync-up


To use the new (de)serialization API from gRPC, gRPC C# will need to modify the codegen plugin to generate
grpc stubs that utilize the new API.

TODO: add the design for changes to grpc_csharp_plugin codegen plugin.

TODO: what is going to be the flag to select the new serialization API in gRPC stubs?

TODO: add example snippet of the generated marshaller code (current state: https://github.com/grpc/grpc/blob/b07d7f14857d09f05b58ccd0e5f15a6992618f3a/src/csharp/Grpc.Examples/MathGrpc.cs#L30)

TODO: is there a way to autodetect that IBufferMessage serialization API is available and use it automatically (and fallback if it's not available?)

## Other considerations

TODO: design for Grpc.Tools and whether the span-based parsing will be enabled by default

TODO: how will the old generated code coexist with the span-enabled generated code? how will it work across library boundaries (e.g. a client library contains code generated without the support for fast parsing and we want to use that client library).
jtattermusch marked this conversation as resolved.
Show resolved Hide resolved

## References

This proposal is based on previous work / discussions:
- https://github.com/grpc/grpc-dotnet/issues/30
- https://github.com/protocolbuffers/protobuf/issues/3431
- https://github.com/protocolbuffers/protobuf/pull/4988
- https://github.com/jtattermusch/protobuf/pull/8