DO NOT MERGE: add span_based_serialization design #6874
@haberman according to you, how much of a concern is this? Is there a precedent for this within protobuf (having 2 sets of code for serialization/parsing in one language)?
We had this as a transitional state when Gerben was rewriting the parser, but that was just temporary. Java is in a state where there are two very different models for generated code (one table-based), but I'm not sure how much extra runtime come
Looking at the CL, I do worry that this is an awful lot of new code. If I filter out all the generated code, this CL is still +7,000 lines of code, which is worrying.
The doubled generated code seems like a potential concern for code size also. I guess C# probably isn't used much in any kind of mobile (phone) deployment, but probably some people eventually have code size concerns?
A few specific questions:
Span is available everywhere, could the old code simply become a client of the new code?
[...] CodedInputReader for example, but could just directly pass the ReadOnlySequence<byte> or whatever that they want to parse from.
Yes, it's a big PR. Here are some more insights:
A lot of the code changes are tests - we've added a lot of new tests and, where possible, modified existing tests to exercise both the old and the new parser at the same time (so some line changes in the tests are just moving tests around - e.g. from CodedInputStreamTest to CodedInputTestBase, where both implementations can use them). Many of the changes in the tests are mechanical - just making sure that both serializer/parser implementations get tested.
The changes in the codegen plugin (src/google/protobuf/compiler/csharp) are very mechanical and contain barely any logic.
The actual new logic is basically two files - CodedInputReader and CodedOutputWriter (the "new" versions of CodedInputStream and CodedOutputStream) - plus adding support for those in things like Maps, RepeatedFields etc. Altogether this is ~2500 lines, but CodedInputReader and CodedOutputWriter are modeled after CodedInputStream and CodedOutputStream and the high-level logic stays the same. It's the low-level read-from-buffer operations and the parsing of primitives that are different.
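To make "parsing the primitives" from a buffer concrete, here is a minimal sketch of decoding a protobuf base-128 varint directly from a ReadOnlySpan<byte>. It is not the code from this PR; the class and method names (VarintSketch, ReadVarint64) are hypothetical and chosen only for illustration.

```csharp
using System;

public static class VarintSketch
{
    // Hypothetical helper: decodes a base-128 varint from the start of the span.
    // Each byte carries 7 payload bits; the high bit signals that more bytes follow.
    public static ulong ReadVarint64(ReadOnlySpan<byte> buffer, out int bytesRead)
    {
        ulong result = 0;
        // A 64-bit varint occupies at most 10 bytes.
        for (int i = 0; i < buffer.Length && i < 10; i++)
        {
            byte b = buffer[i];
            result |= (ulong)(b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0)
            {
                bytesRead = i + 1;
                return result;
            }
        }
        throw new InvalidOperationException("Truncated or malformed varint.");
    }

    public static void Main()
    {
        // 0xAC 0x02 is the two-byte varint encoding of 300.
        byte[] encoded = { 0xAC, 0x02 };
        ulong value = ReadVarint64(encoded, out int consumed);
        Console.WriteLine($"{value} ({consumed} bytes)"); // prints "300 (2 bytes)"
    }
}
```

Because the method takes a span, the same code can read from an array, a slice of a larger buffer, or stack/native memory without copying.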
The problem is that Span<> is a ref struct (a struct type that is restricted to the stack - a relatively new concept in C#), so it can never be stored on the heap (which also acts as a safety measure if you're e.g. accessing native memory). That means regular classes such as CodedInputStream cannot hold an instance of Span, which makes it difficult to retrofit the "old" parser to use the CodedInputReader logic internally (currently not possible without degrading the parser's performance).
At some point in the distant future, we could recommend that everyone move to CodedInputReader and CodedOutputWriter, declare CodedInputStream and CodedOutputStream "legacy", and make them use the new parsing logic at the expense of worse performance, while still maintaining API compatibility.
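The ref struct restriction can be illustrated with a small sketch. The two reader types below (HeapReader, StackReader) are hypothetical stand-ins, not the types from this PR: a class has to keep an array and re-create a span on every read, while a ref struct can hold the span itself but can then only be used from the stack.

```csharp
using System;
using System.Buffers.Binary;

// A class lives on the heap, so it cannot declare a span field:
//     class Broken { ReadOnlySpan<byte> buffer; }   // does not compile
// It can hold an array instead and create a span for each read.
public sealed class HeapReader
{
    private readonly byte[] buffer;
    private int position;

    public HeapReader(byte[] buffer) => this.buffer = buffer;

    public uint ReadFixed32()
    {
        // The span is re-created from the array on every call.
        uint value = BinaryPrimitives.ReadUInt32LittleEndian(buffer.AsSpan(position, 4));
        position += 4;
        return value;
    }
}

// A ref struct may hold a span, but is itself restricted to the stack:
// it cannot be boxed or stored in a field of a class.
public ref struct StackReader
{
    private ReadOnlySpan<byte> buffer;

    public StackReader(ReadOnlySpan<byte> buffer) => this.buffer = buffer;

    public uint ReadFixed32()
    {
        uint value = BinaryPrimitives.ReadUInt32LittleEndian(buffer);
        buffer = buffer.Slice(4); // advance past the consumed bytes
        return value;
    }
}

public static class RefStructDemo
{
    public static void Main()
    {
        byte[] data = { 0x2A, 0, 0, 0, 0x01, 0, 0, 0 };

        var heap = new HeapReader(data);
        Console.WriteLine(heap.ReadFixed32());  // 42
        Console.WriteLine(heap.ReadFixed32());  // 1

        var stack = new StackReader(data);
        Console.WriteLine(stack.ReadFixed32()); // 42
        Console.WriteLine(stack.ReadFixed32()); // 1
    }
}
```

Both readers return the same values; the difference is only in where the parsing state is allowed to live, which is what makes sharing a single implementation between a class-based and a span-based parser awkward.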
We've spent a lot of time investigating this and did as much de-duplication as seemed practical.
The CodedInputReader and CodedOutputWriter types need to be public as they are referenced by the generated code (and they hold the intermediate parsing/serialization state - same idea as CodedInputStream/CodedOutputStream).
Most users won't use these types directly, and would instead call e.g. SomeMessage.Parser.ParseFrom(ReadOnlySequence).