Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ArgumentException in C# when any proto file with extension is imported multiple times #6954

Closed
wants to merge 2 commits into from

Conversation

owent
Copy link
Contributor

@owent owent commented Nov 27, 2019

Fix ArgumentException: An item with the same key has already been added. Key: Google.Protobuf.ObjectIntPair when using Descriptor with proto file with any proto file with extensions is imported multiple times(imported directly or imported by dependencies proto files).

I think this PR is related to #6936 #6938 . The latest release (by now) v3.11.0 also has this problem.

Here is a simple sample to reproduction this problem:

test-a.proto

syntax = "proto3";

import "test-ext.proto";
import "test-b.proto";

message TestMessageA {
    option (msg_description_1) = "description of TestMessageA";
    
    string id      = 1;
    TestMessageB b = 2;
}

test-b.proto

syntax = "proto3";

import "test-ext.proto";

message TestMessageB {
    option (msg_description_1) = "description of TestMessageB";
    
    string id = 1;
}

test-ext.proto

syntax = "proto3";

import "google/protobuf/descriptor.proto";

extend google.protobuf.MessageOptions {
    string msg_description_1         = 1001; // description 1
    // other options 2000 to max;
}

Generate csharp code with protoc -I./extensions ./extensions/test-a.proto ./extensions/test-b.proto ./extensions/test-ext.proto ./extensions/google/protobuf/descriptor.proto --csharp_out=.

Ant then run these codes below:

using System;

namespace DotnetCoreConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            // protoc -I./extensions ./extensions/test-a.proto ./extensions/test-b.proto ./extensions/test-ext.proto ./extensions/google/protobuf/descriptor.proto --csharp_out=.
            var desc = TestAReflection.Descriptor.FindTypeByName<Google.Protobuf.Reflection.MessageDescriptor>("TestMessageA");
            Console.WriteLine(desc.FullName);
        }
    }
}

We will got a exception like this:

Unhandled exception. System.TypeInitializationException: The type initializer for 'TestAReflection' threw an exception.
 ---> System.ArgumentException: An item with the same key has already been added. Key: Google.Protobuf.ObjectIntPair`1[System.Type]
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at Google.Protobuf.ExtensionRegistry.Add(Extension extension) in D:\prebuilt\protobuf\protobuf-3.11.0\csharp\src\Google.Protobuf\ExtensionRegistry.cs:line 83
   at Google.Protobuf.ExtensionRegistry.AddRange(IEnumerable`1 extensions) in D:\prebuilt\protobuf\protobuf-3.11.0\csharp\src\Google.Protobuf\ExtensionRegistry.cs:line 92
   at Google.Protobuf.Reflection.FileDescriptor.AddAllExtensions(FileDescriptor[] dependencies, GeneratedClrTypeInfo generatedInfo, ExtensionRegistry registry) in D:\prebuilt\protobuf\protobuf-3.11.0\csharp\src\Google.Protobuf\Reflection\FileDescriptor.cs:line 451
   at Google.Protobuf.Reflection.FileDescriptor.FromGeneratedCode(Byte[] descriptorData, FileDescriptor[] dependencies, GeneratedClrTypeInfo generatedCodeInfo) in D:\prebuilt\protobuf\protobuf-3.11.0\csharp\src\Google.Protobuf\Reflection\FileDescriptor.cs:line 425
   at TestAReflection..cctor() in D:\workspace\test\DotnetCoreConsoleApp2\TestA.cs:line 29
   --- End of inner exception stack trace ---
   at TestAReflection.get_Descriptor() in D:\workspace\test\DotnetCoreConsoleApp2\TestA.cs:line 18
   at DotnetCoreConsoleApp2.Program.Main(String[] args) in D:\workspace\test\DotnetCoreConsoleApp2\Program.cs:line 10

We find the problem is in the call Google.Protobuf.Reflection.FileDescriptor.AddAllExtensions(...) when generate extensions for test-a.proto . msg_description_1 is added twice from test-a.proto/test-ext.proto/descriptor.proto and test-a.proto/test-b.proto/test-ext.proto/descriptor.proto .

If test-a.proto import test-b.proto and test-c.proto when both test-b.proto and test-c.proto import test-ext.proto with extensions in it. The same problem happens.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@owent
Copy link
Contributor Author

owent commented Nov 27, 2019

@googlebot I signed it!

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@jtattermusch
Copy link
Contributor

jtattermusch commented Dec 3, 2019

@owt5008137 how is this fix different than #6938 (it looks like it's fixing exactly the same thing)?

@jtattermusch jtattermusch self-requested a review December 3, 2019 13:41
@owent
Copy link
Contributor Author

owent commented Dec 4, 2019

Yes these two PR fix the same problem.
#6938 removed duplicated extensions after analysis all depended files. While this PR remove all
the duplicated depended files first and then pick all extension in them.

For example, when build unittest_with_extension_1_proto3.proto of the new unit test, #6938 will
analysis unittest_extension 3 times (from unittest_with_extension_1_proto3.proto/unittest_import_extension_proto3.proto , unittest_with_extension_1_proto3.proto/unittest_with_extension_2_proto3.proto/unittest_import_extension_proto3.proto and unittest_with_extension_1_proto3.proto/unittest_with_extension_3_proto3.proto/unittest_import_extension_proto3.proto) and get the extension set [unittest_extension, unittest_extension, unittest_extension], and then call Distinct() to get the final depened extension set [unittest_extension].

This PR will remove the duplicated depened file first (unittest_with_extension_1_proto3.proto) and get the extension [unittest_extension] only once.

Finally, I think it's better to use a HashSet to analysis depended files than just call Enumerable.Distinct. With HashSet we can skip any FileDescriptor when they are already built before.

@jtattermusch

@jtattermusch
Copy link
Contributor

#6938 has been merged. If you still want to proceed with this PR, please rebase on top of freshest master and try to align your stuff with what's been done there.

@owent owent force-pushed the master branch 2 times, most recently from 46e5f17 to 42e7d9f Compare December 6, 2019 02:22
@owent
Copy link
Contributor Author

owent commented Dec 6, 2019

#6938 has been merged. If you still want to proceed with this PR, please rebase on top of freshest master and try to align your stuff with what's been done there.

I have rebase onto the newest master branch with #6938 and remove all redundant codes and tests now and just keep the code to remove repeated depended files. Could you review it? @jtattermusch

…ded. Key: Google.Protobuf.ObjectIntPair' when using Descriptor with proto file with any proto file with extensions is imported multiple times(imported directly or imported by dependencies proto files).

Distinct all depended FileDescriptor for extensions
…eep the code to remove repeated depended files.
@MartinKosicky
Copy link

Would be great if someone merged this finally...

return BuildDependedFileDescriptors(dependencies).SelectMany(GetAllDependedExtensions).Distinct(ExtensionRegistry.ExtensionComparer.Instance).Concat(GetAllGeneratedExtensions(generatedInfo));
}

private static IEnumerable<FileDescriptor> BuildDependedFileDescriptors(IEnumerable<FileDescriptor> dependencies)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear what this method does. Does it get all the unique dependencies recursively?

@jtattermusch
Copy link
Contributor

@owt5008137 @MartinKosicky I might be wrong, but it looks like that the
unittest_issue6936_{a,b,c}.proto files added in https://github.com/protocolbuffers/protobuf/pull/6938/files
are testing for exactly the same scenario as described in this issue (message extension imported multiple times), at least the example .proto files are looking the same.
So I'm not sure if there's still something remaining to fix.

This PR would be easier to comprehend if there was a unit test added that would demonstrate the erroneous and fix behavior.
it is possible to modify unittest_issue6936_{a,b,c}.proto to test for any remaining scenarios?

@owent
Copy link
Contributor Author

owent commented Jun 4, 2020

@owt5008137 @MartinKosicky I might be wrong, but it looks like that the
unittest_issue6936_{a,b,c}.proto files added in https://github.com/protocolbuffers/protobuf/pull/6938/files
are testing for exactly the same scenario as described in this issue (message extension imported multiple times), at least the example .proto files are looking the same.
So I'm not sure if there's still something remaining to fix.

This PR would be easier to comprehend if there was a unit test added that would demonstrate the erroneous and fix behavior.
it is possible to modify unittest_issue6936_{a,b,c}.proto to test for any remaining scenarios?

This RP solve the same problem as #6938 .After #6938 is merged, I rebase it from master again, modify the codes, remove all the duplicated proto files and unit test codes and only left the the optimization of avoiding to build the any dependency multiple times when it's depended by more than one proto files. The origin codes and files of unit tests have the simiular codes to #6938 , so it can use the proto files of #6938 .

@jtattermusch
Copy link
Contributor

@owt5008137 @MartinKosicky I might be wrong, but it looks like that the
unittest_issue6936_{a,b,c}.proto files added in https://github.com/protocolbuffers/protobuf/pull/6938/files
are testing for exactly the same scenario as described in this issue (message extension imported multiple times), at least the example .proto files are looking the same.
So I'm not sure if there's still something remaining to fix.
This PR would be easier to comprehend if there was a unit test added that would demonstrate the erroneous and fix behavior.
it is possible to modify unittest_issue6936_{a,b,c}.proto to test for any remaining scenarios?

This RP solve the same problem as #6938 .After #6938 is merged, I rebase it from master again, modify the codes, remove all the duplicated proto files and unit test codes and only left the the optimization of avoiding to build the any dependency multiple times when it's depended by more than one proto files. The origin codes and files of unit tests have the simiular codes to #6938 , so it can use the proto files of #6938 .

Ok, so you're basically saying the #6938 is already solved and this PR doesn't fix anything that isn't already fixed. It's only an optimization of building the descriptors.
If that's the case I think we can close this PR because I haven't seen evidence that this is a case that needs optimizing (it would need to be backed by some data).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants