Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Span-based C# parsing logic #7351

Merged

Conversation

jtattermusch
Copy link
Contributor

@jtattermusch jtattermusch commented Apr 2, 2020

A big refactor of C# parsing logic to allow parsing from ReadOnlySequence (which would allow parsing directly from the data as received on the wire without requiring to copy the buffers and thus reducing the GC pressure a lot).

Main changes:

  • refactor out the parsing logic to helper classes (to avoid any duplication of the parsing logic)
  • store all the parsing state into a struct (currently called ParserInternalState)
  • introduce a ParseContext ref struct that has public methods, but they are only meant to be invoked by the generated code
  • generate a InternalMergeFrom(ref ParseContext ctx) method, which has the actual parsing code (and is only supposed to be called by the generated code, never directly by the users.
  • allow easy and fast conversion from CodedInputStream into a ParseContext (because the parsing state is encapsulated in the ParserInternalState, so bringing the state between the two only requires copying a struct.
  • the original IMessage.MergeFrom(CodedInputStream cis) methods redirects to InternalMergeFrom(ref ParseContext). This transition has some (very small = copying a struct) overhead, but is basically only done once in the entire parsing tree, so that seem acceptable.
  • all the public APIs that take CodedInputStream still behave exactly the same.
  • compatibility with old generated code: if one uses old generated code against a newer runtime, everything will still work, but there might be some performance penalty (see benchmarks). Regenerating the code will speed things up.
  • the public API increase is kept as small as possible. Basically the only new method we provide is Parse.ParseFrom(ReadOnlySequence byte) as entry point for the more efficient parsing. There are a few other additions (parsing maps, repeated fields etc.), but they are only meant to be invoked by the generated code.
  • this PR only addresses memory-efficient parsing. Memory efficient serialization can be added in a followup PR.

This PR drops netstandard1.0 support (which doesn't support the Span type we need) and adds netstandard1.1 instead. The impact on supported platforms should be minimal (almost everyone supports netstandard2.0 these days anyway).

Basically this PR a new and simpler take on #5888 (which is much bigger, introduces much more code duplication and leads to bigger public API increase).

@jtattermusch
Copy link
Contributor Author

CC @JamesNK @JunTaoLuo

this is still WIP and I'll probably do some rebasing, but all this seems to be the minimal PR that adds ReadOnlySequence parsing.

@JamesNK
Copy link
Contributor

JamesNK commented Apr 11, 2020

Changes to fix running on .NET Framework:

JamesNK@5d868ad

csharp/src/Google.Protobuf/ParseContext.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/ParseContext.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/ParseContext.cs Show resolved Hide resolved
/// Internal implementation of merging data from given parse context into this message.
/// Users should never invoke this method directly.
/// </summary>
void MergeFrom_Internal(ref ParseContext ctx);
Copy link
Contributor

@JamesNK JamesNK Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of _Internal suffix.

I'd rather it just be MergeFrom, have this comment warning that this overload of MergeFrom shouldn't be used, and code generation should explicitly implement the interface method.

https://hackernoon.com/hiding-members-via-explicit-interface-implementation-in-c-2c5u3oc1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your plans for message writing? WriteContext and an equivalent method here?

I wonder if IBufferMessage is still the right name for this interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MergeFromInternal and InternalMergeFrom also works.
Explicitly implementing the interface in generated code sounds like a good idea - I'll try that.

For message writing I think we can use a similar approach (which will hopefully require way fewer code changes than parsing). I think serialization could go in as a follow-up PR to make this PR smaller (we can make sure we won't release the intermediate version)

Do you have suggestions for how IBufferMessage should be renamed?

Copy link
Contributor

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from looking through code with test coverage. Need to have some tests with legacy message generation to hit backwards compatibility paths.

csharp/src/Google.Protobuf/SegmentedBufferHelper.cs Outdated Show resolved Hide resolved
}
}

state.totalBytesRetired += state.bufferSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI no tests reach here

csharp/src/Google.Protobuf/ParseContext.cs Show resolved Hide resolved
/// </summary>
internal bool DiscardUnknownFields {
get { return state.DiscardUnknownFields; }
set { state.DiscardUnknownFields = value; }
Copy link
Contributor

@JamesNK JamesNK Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this set ever used? It is, just not used in tests

internal ExtensionRegistry ExtensionRegistry
{
get { return state.ExtensionRegistry; }
set { state.ExtensionRegistry = value; }
Copy link
Contributor

@JamesNK JamesNK Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this set ever used? It is, just not used in tests

csharp/src/Google.Protobuf/ParsingPrimitivesMessages.cs Outdated Show resolved Hide resolved
{
return 0F;
}
int finalBufferPos = state.totalBytesRetired + state.bufferPos + length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI no tests reach here

{
return 0D;
}
int finalBufferPos = state.totalBytesRetired + state.bufferPos + length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI no tests reach here

csharp/src/Google.Protobuf/ParseContext.cs Show resolved Hide resolved
csharp/src/Google.Protobuf/FieldCodec.cs Outdated Show resolved Hide resolved
@jtattermusch
Copy link
Contributor Author

@gerben-s sorry, the commentary on benchmark results was buried in the comment thread:
#7351 (comment)
#7351 (comment)

In short, the results seem to be pretty acceptable, especially because for C# we haven't really been too stringent about C# performance measurements until now (many of the benchmarks have only been added very recently) and we've demonstrated that being able to parse from ReadOnlySequence means a big improvement in terms of avoiding buffer copies and reducing GC pressure (so slight regression for some benchmarks would be trade off worth taking).

@JamesNK
Copy link
Contributor

JamesNK commented Apr 27, 2020

Note that one benefit not shown by the benchmark results is the cost of creating the array when using CodedInputStream. The input stream benchmarks are reading from an already initialized array. In the real world, e.g. gRPC, a framework will need to allocate an array and copy data to it before reading it with CodedInputStream.

Reading from a sequence (the newly added feature) allows parsing a third-party buffer of data, e.g. the request stream. That avoids overhead of allocations, copying and garbage collection.

Copy link
Contributor

@gerben-s gerben-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the best person for c# review on code/style. However the high level structure seems very familiar and good to me. If the benchmarks are good as you say I think this is a great improvement

@jtattermusch
Copy link
Contributor Author

@gerben-s Thanks for the review! James has already reviewed the code from the C# perspective. I'll see if @jskeet has any more comments and if not, I'll consider this ready to merge (upon green tests of course).

@jtattermusch
Copy link
Contributor Author

@jskeet do you have any more comments beyond what's been said above? If you want to take some more time to review this, feel free to - there's no rush, just let me know.

@jtattermusch
Copy link
Contributor Author

@jskeet please let me know if you're planning to review this or not.

@jtattermusch
Copy link
Contributor Author

Test failures seem completely unrelated to C#.

@jtattermusch
Copy link
Contributor Author

Thanks everyone for the reviews and for all the work to get things to this state.

@jtattermusch
Copy link
Contributor Author

FTR @jskeet has discovered a problem with this PR.
Even though our TestProtos project is set to <LangVersion>3</LangVersion> to check if ref struct ParseContext is fine to use in the generated code, the compilation still seems to fail if we actually use an old compiler:

 it doesn't build with a real C# 5 compiler. <LangVersion>5</LangVersion> doesn't pick it up, but fortunately the C# 5 compiler (the last one before Roslyn) is still part of Windows, so I could test it reasonably simply:

> c:\Windows\Microsoft.NET\Framework\v4.0.30319\csc /r:..\Google.Protobuf\bin\debug\net45\google.protobuf.dll *.cs /main:Google.Protobuf.Examples.AddressBook.Program

Errors: 
Addressbook.cs(448,14): error CS0619: 'Google.Protobuf.ParseContext' is obsolete: 'Types with embedded references are not supported in this version of your compiler.'
Addressbook.cs(263,10): error CS0619: 'Google.Protobuf.ParseContext' is obsolete: 'Types with embedded references are not supported in this version of your compiler.'
Addressbook.cs(586,10): error CS0619: 'Google.Protobuf.ParseContext' is obsolete: 'Types with embedded references are not supported in this version of your compiler.'

(The "obsolete" message is unfortunate, but it's an artifact of the way Span/ParseContext and other ref struct are handled by the compiler. That's nothing we've done wrong.)

We have some ideas for how to fix this (it will require an #if) but we're still not sure it it's going to work out (or we will need to revert).

jtattermusch added a commit that referenced this pull request Jun 29, 2020
New Span-based serialization logic (followup for #7351)
bithium pushed a commit to bithium/protobuf that referenced this pull request Sep 4, 2023
…harp_new_parsing

New Span-based C# parsing logic
bithium pushed a commit to bithium/protobuf that referenced this pull request Sep 4, 2023
…ffer_serialization

New Span-based serialization logic (followup for protocolbuffers#7351)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants