Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overcome the ref struct limitation for pre-roslyn compilers by introducing GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE for generated code #7490

Merged

Conversation

jtattermusch
Copy link
Contributor

Followup for #7351.

Attempts to fix the issue mentioned in #7351 (comment)

Introduces the GOOGLE_PROTOBUF_COMPATIBILITY_MODE define which allows to avoid referencing the ParseContext ref struct in the generated code.
The consequences are basically:

  • if the user doesn't have a roslyn-generation compiler, they can still build the generated code (after setting GOOGLE_PROTOBUF_COMPATIBILITY_MODE the generated code ends up being pretty much identical as pre New Span-based C# parsing logic #7351). Another option for the users on older compilers is to simply not regenerate their generated code.
  • there is some performance overhead (because CodedInputStream's parsing methods are now considered legacy). The overhead is removed by getting a newer compiler and recompiling.
  • Folks with older compiler won't be able to use new ParseFrom(ReadOnlySequence<byte> data), because that one requires that the messages implement IBufferMessage ( = support parsing from ParseContext).
  • One important realization is that if the generated code is shipped in a pre-compiled form (e.g. in a nuget package), users with an old compiler will still be able to use the pre-compiled generated code even without the compatibility_mode enabled, as long as they don't touch the APIs utilizing ref struct directly (which they generally don't need to do). E.g. someone with an old compiler can still use the ParseFrom(ReadOnlySequence<byte> data) method as long as the generated code was compiled using a newer compiler in advance. That is a good news for the client libraries.

The differences compared to the old approach #5888:

  • we still avoid duplicating the parsing logic in the runtime
  • the Span based parsing is now supported by default, users with legacy compilers will need to explicitly set the GOOGLE_PROTOBUF_COMPATIBILITY_MODE define to be able to compile the regenerated code.

@jtattermusch
Copy link
Contributor Author

CC @haberman @jskeet @JamesNK

@jtattermusch
Copy link
Contributor Author

For now I've manually tested the generated code with GOOGLE_PROTOBUF_COMPATIBILITY_MODE enabled and everything seems to be working (except the ParseFrom(ReadOnlySequence<byte> data) API of course), but still need to figure out some way to test it automatically.

@jskeet
Copy link
Contributor

jskeet commented May 12, 2020

Looking at the code, I think we can do better in terms of avoiding duplication, by being sneaky with a sort of "bait and switch" approach. Basically we're including the same merging body, but in one method or another based on the compatibility. We can do that while only providing that merging code once.

Current (pre-PR) code:

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
      input.ReadRawMessage(this);
    }

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {

    MERGING CODE HERE

    }

Code in this PR:

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
    #if !GOOGLE_PROTOBUF_COMPATIBILITY_MODE
      input.ReadRawMessage(this);
    #else
        MERGING CODE HERE
    #endif
    }

    #if !GOOGLE_PROTOBUF_COMPATIBILITY_MODE
    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {

        MERGING CODE HERE

    }
    #endif

How about instead, we do:

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
    #if !GOOGLE_PROTOBUF_COMPATIBILITY_MODE
      input.ReadRawMessage(this);
    }
    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {
    #endif        

       MERGING CODE HERE
    }

In other words:

  • We always start the MergeFrom method
  • In "regular" mode, we provide a method body, finish that method, then start InternalMergeFrom
  • Then there's the merging code
  • Then we finish whichever method is actually present at this point

This assumes that the merging code is identical between compatibility and non-compatibility mode - that assumption may not be accurate.

@jskeet
Copy link
Contributor

jskeet commented May 12, 2020

Slightly less ugly alternative (a little duplication, but probably clearer):

    #if GOOGLE_PROTOBUF_COMPATIBILITY_MODE
    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
    #else
    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
      input.ReadRawMessage(this);
    }

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {
    #endif        

       MERGING CODE HERE
    }

@jtattermusch
Copy link
Contributor Author

Looking at the code, I think we can do better in terms of avoiding duplication, by being sneaky with a sort of "bait and switch" approach. Basically we're including the same merging body, but in one method or another based on the compatibility. We can do that while only providing that merging code once.

Current (pre-PR) code:

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
      input.ReadRawMessage(this);
    }

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {

    MERGING CODE HERE

    }

Code in this PR:

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
    #if !GOOGLE_PROTOBUF_COMPATIBILITY_MODE
      input.ReadRawMessage(this);
    #else
        MERGING CODE HERE
    #endif
    }

    #if !GOOGLE_PROTOBUF_COMPATIBILITY_MODE
    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {

        MERGING CODE HERE

    }
    #endif

How about instead, we do:

    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    public void MergeFrom(pb::CodedInputStream input) {
    #if !GOOGLE_PROTOBUF_COMPATIBILITY_MODE
      input.ReadRawMessage(this);
    }
    [global::System.Diagnostics.DebuggerNonUserCodeAttribute]
    void pb::IBufferMessage.InternalMergeFrom(ref pb::ParseContext input) {
    #endif        

       MERGING CODE HERE
    }

In other words:

  • We always start the MergeFrom method
  • In "regular" mode, we provide a method body, finish that method, then start InternalMergeFrom
  • Then there's the merging code
  • Then we finish whichever method is actually present at this point

This assumes that the merging code is identical between compatibility and non-compatibility mode - that assumption may not be accurate.

That's a good point, unfortunately the merging code is very similar but now quite the same. For repeated field, map field, wrappers and for unknown fields and extensions, we need to pass ref input instead of input when using ParseContext.
The good news in terms of compilation speed is that we are always only including one copy of parsing logic (the legacy OR the new version, never both). Assuming the compiler is good at skipping the unimportant #if branches, the compilation speed (and the resulting code size) should remain almost the same. Only the size of the source .cs file is going to be larger.

@jskeet
Copy link
Contributor

jskeet commented May 12, 2020

Rats, that's what I get for not looking at a large enough sample. Thanks for the rapid evaluation :)

@jtattermusch
Copy link
Contributor Author

Alternatively, the define could be named GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE to make its name more specific.

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we should consider various options for the #define symbol name.

But generally, I'm in favour of this. It's a bit of a pity that those with older compilers will need to take action in order to compiler again (we need really good documentation on this!) but it's probably the best balance.

In terms of testing, we should probably be able to add something to the Windows build to use the C# 5 compiler in c:\Windows\Microsoft.NET\Framework\v4.0.30319 to try to build Google.Protobuf.Test.TestProtos. If it can build that, it should be able to build any generated code.

}

void MapFieldGenerator::GenerateParsingCode(io::Printer* printer, bool use_parse_context) {
if (use_parse_context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly use a conditional operator here instead?

printer->Print(
  variables_,
  use_parse_context
  ? "$name$_.AddEntriesFrom(ref input, _map_$name$_codec);\n"
  : "$name$_.AddEntriesFrom(input, _map_$name$_codec);\n");

That has the benefit that the two versions are on consecutive lines, making the difference obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There are a few others, too.)

@JamesNK
Copy link
Contributor

JamesNK commented May 13, 2020

This seems like a good solution. The cost to anyone working with modern .NET Core is bigger generated code files, but the unused implementation will be removed via #ifdef during compile so there will be no runtime impact.

@haberman
Copy link
Member

Some really nice problem solving here!

My main feedback is:

  • Is it possible to have tests for this?
  • How will we document this for users?

@jskeet
Copy link
Contributor

jskeet commented May 16, 2020

I believe we should be able to write a test for this, only on Windows. I'm happy to try to do that as a separate PR when this is merged. Basically it would try to build the test proto project using the old native C# compiler.

For documentation: I think a combination of release notes and https://developers.google.com/protocol-buffers/docs/csharptutorial is likely to be enough.

@jtattermusch jtattermusch changed the title DO NOT MERGE: Try to overcome the ref struct limitation for pre-roslyn compilers. Overcome the ref struct limitation for pre-roslyn compilers by introducing GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE for generated code May 27, 2020
@jtattermusch
Copy link
Contributor Author

Ok, I believe I addressed all the feedback and this is now ready for final review.

  • the define is not called GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE
  • I added a nunit test that calls an old C# compiler and makes sure it can build the generated code in compatibility mode.

{
using (var process = new Process())
{
process.StartInfo.FileName = "c:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\csc.exe"; // Old C# 5 compiler from .NET framework
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard coded path is very brittle. Windows could be installed to a different path or drive.

Should look this up from an environmental variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/6660512/how-to-get-csc-exe-path

I think this would work if the test only runs with the net45 target:

using System.Runtime.InteropServices;
var frameworkPath = RuntimeEnvironment.GetRuntimeDirectory();
var cscPath = Path.Combine(frameworkPath, "csc.exe");

Alternatively the %WINDIR% environment variable would be an improvement. To be more exact and lookup the exact path would probably require accessing Windows registry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated the test to use the WINDIR env. It's not 100% but seems good enough for our purposes.

@jtattermusch
Copy link
Contributor Author

@jskeet @haberman is there any more feedback? I'd like to merge soon to fix the problem, I have some followup PRs in the queue.

@jskeet
Copy link
Contributor

jskeet commented May 29, 2020

I was fine with it last time I looked; I don't have time to take another careful look at the moment I'm afraid - but if @haberman is happy, that's fine for me.

@haberman
Copy link
Member

This looks great. Could you possibly add something to CHANGES.txt in the "Unreleased Changes" section? I've started doing this so that there is less work to do at release time.

@jtattermusch
Copy link
Contributor Author

@haberman
I updated CHANGES.txt with this change and #7351. I think we can merge this PR now?

@haberman haberman merged commit 7cb5597 into protocolbuffers:master Jun 2, 2020
guendto added a commit to guendto/jomiel-client-demos that referenced this pull request Sep 12, 2020
…MODE

"... make generated code compatible with old C# compilers (pre-roslyn
compilers from .NET framework and old versions of mono) that do not
support ref structs."

See also:
  protocolbuffers/protobuf#7490
  https://github.com/protocolbuffers/protobuf/releases/tag/v3.13.0
@vuqtran88
Copy link

Out of curiosity, how do you enable GOOGLE_PROTOBUF_COMPATIBILITY_MODE ? did you enable it in the .proto file ?

@jtattermusch
Copy link
Contributor Author

GOOGLE_PROTOBUF_COMPATIBILITY_MODE

It's a compile define. You can enable it in your .csproj files. E.g. a similar example here:

<DefineConstants>$(DefineConstants);GOOGLE_PROTOBUF_SUPPORT_FAST_STRING</DefineConstants>

Btw, the correct name of the define is GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE - see history of this PR.

@lirivrod
Copy link

lirivrod commented Feb 1, 2023

Hello was able to successfully compile my generated code using #define GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE
on top of the script however I'm not able to parse data using the Parse.ParseFrom method I know this is already stated here however I wanted to check with you if there are any workarounds or any other method that could be used when using GOOGLE_PROTOBUF_REFSTRUCT_COMPATIBILITY_MODE

@jskeet
Copy link
Contributor

jskeet commented Feb 1, 2023

@lirivrod: Please file a new issue with more details - that's more appropriate than using a merged PR. The expectation isn't that you define the symbol in individual files though, but instead do it for the whole project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants