Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade gRPC to support Apple Silicon #13

Merged
merged 3 commits into from Jul 8, 2022
Merged

Conversation

pan3793
Copy link
Member

@pan3793 pan3793 commented Jul 4, 2022

What changes were proposed in this pull request?

Upgrade gRPC to support Apple Silicon.

Why are the changes needed?

To support build on Apple M1 chips.

Run mvn clean install on master branch

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Remote Shuffle Service Project Parent POM 0.6.0-snapshot:
[INFO]
[INFO] Remote Shuffle Service Project Parent POM .......... SUCCESS [  2.106 s]
[INFO] Remote Shuffle Service Protocols ................... FAILURE [  2.082 s]
[INFO] Remote Shuffle Service Common ...................... SKIPPED
[INFO] Coordinator ........................................ SKIPPED
[INFO] rss-internal-client ................................ SKIPPED
[INFO] Shuffle Storage .................................... SKIPPED
[INFO] Shuffle Server ..................................... SKIPPED
[INFO] rss-client ......................................... SKIPPED
[INFO] rss-integration-common-test ........................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  4.365 s
[INFO] Finished at: 2022-07-05T10:20:00+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile (default) on project rss-proto: Unable to resolve artifact: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:osx-aarch_64:3.12.0
[ERROR]
[ERROR]   Try downloading the file manually from the project website.
[ERROR]
[ERROR]   Then, install it using the command:
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.12.0 -Dclassifier=osx-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR]
[ERROR]   Alternatively, if you host your own repository you can deploy the file there:
[ERROR]       mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.12.0 -Dclassifier=osx-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR]   Path to dependency:
[ERROR]   	1) org.apache.uniffle:rss-proto:jar:0.6.0-snapshot
[ERROR]   	2) com.google.protobuf:protoc:exe:osx-aarch_64:3.12.0
[ERROR]
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR]
[ERROR] for artifact:
[ERROR]   org.apache.uniffle:rss-proto:jar:0.6.0-snapshot
[ERROR]
[ERROR] from the specified remote repositories:
[ERROR]   aliyun-maven-central (https://maven.aliyun.com/nexus/content/groups/public/, releases=true, snapshots=false),
[ERROR]   apache-repo (https://repository.apache.org/content/repositories/releases, releases=true, snapshots=false),
[ERROR]   aliyun-maven-apache-snapshots (https://maven.aliyun.com/repository/apache-snapshots/, releases=false, snapshots=true)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :rss-proto

Does this PR introduce any user-facing change?

Yes, gRPC, protobuf related dependencies are updated, and developers who use Apple M1 chips can build now.

How was this patch tested?

➜  apache-uniffle git:(proto) java -version
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Zulu 8.62.0.19-CA-macos-aarch64) (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Zulu 8.62.0.19-CA-macos-aarch64) (build 25.332-b09, mixed mode)
mvn clean install -DskipTests
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Remote Shuffle Service Project Parent POM 0.6.0-snapshot:
[INFO]
[INFO] Remote Shuffle Service Project Parent POM .......... SUCCESS [  1.476 s]
[INFO] Remote Shuffle Service Protocols ................... SUCCESS [  3.966 s]
[INFO] Remote Shuffle Service Common ...................... SUCCESS [  0.451 s]
[INFO] Coordinator ........................................ SUCCESS [  3.589 s]
[INFO] rss-internal-client ................................ SUCCESS [  0.718 s]
[INFO] Shuffle Storage .................................... SUCCESS [  2.137 s]
[INFO] Shuffle Server ..................................... SUCCESS [  7.720 s]
[INFO] rss-client ......................................... SUCCESS [  2.015 s]
[INFO] rss-integration-common-test ........................ SUCCESS [  2.582 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  24.889 s
[INFO] Finished at: 2022-07-04T17:05:26+08:00
[INFO] ------------------------------------------------------------------------

@kaijchen
Copy link
Contributor

kaijchen commented Jul 4, 2022

Hi @pan3793, thanks for the patch.

According to our experience, some tests will fail on Apple Silicon AArch64 JDK.
Please run maven clean package without -DskipTests to check.

@pan3793
Copy link
Member Author

pan3793 commented Jul 4, 2022

Run mvn clean install, and everything looks OK. But I don't test compute engine related modules.
Spark claims to support Apple Silicon since 3.3.0. Hadoop does not support yet.

[INFO] Reactor Summary for Remote Shuffle Service Project Parent POM 0.6.0-snapshot:
[INFO]
[INFO] Remote Shuffle Service Project Parent POM .......... SUCCESS [  1.204 s]
[INFO] Remote Shuffle Service Protocols ................... SUCCESS [  5.140 s]
[INFO] Remote Shuffle Service Common ...................... SUCCESS [  7.320 s]
[INFO] Coordinator ........................................ SUCCESS [01:06 min]
[INFO] rss-internal-client ................................ SUCCESS [  0.728 s]
[INFO] Shuffle Storage .................................... SUCCESS [ 22.125 s]
[INFO] Shuffle Server ..................................... SUCCESS [01:35 min]
[INFO] rss-client ......................................... SUCCESS [ 21.195 s]
[INFO] rss-integration-common-test ........................ SUCCESS [06:49 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  10:29 min
[INFO] Finished at: 2022-07-04T18:01:11+08:00
[INFO] ------------------------------------------------------------------------

@frankliee
Copy link
Contributor

frankliee commented Jul 4, 2022

Thank for your contribution, but upgrading gRPC is not very good, because client and server in product environment are not easy to be upgraded simultaneously.
In fact, Uniffle can run on Apple M1 by using x86_64 JVM with Rosetta2

@frankliee
Copy link
Contributor

And can you run mvn package -Pspark2 and mvn package -Pmr

@pan3793
Copy link
Member Author

pan3793 commented Jul 4, 2022

but upgrading gRPC is not very good, because client and server in product environment are not easy to be upgraded simultaneously

I don't see a causal relationship between them. If you want a seamless upgrade, you should build a compatibility test to make sure your client and server can work w/ each other in different versions, rather than pin a library version and never do upgrading. Actually, in my experience, the larger the upgrade version span, the greater the risk. The upgrading has a clear request to add native support for Apple Silicon, even w/o urgent requests, keeping the library up-to-date is a good practice.

Uniffle can run on Apple M1 by using x86_64 JVM with Rosetta2

In our testing, building Spark using Rosetta2 costs 2x time more than native. Native is always the first choice.

And can you run mvn package -Pspark2 and mvn package -Pmr

Yes, I'm trying. But I think it's not a blocker for merging this one. The CI shows it does not break the current functionalities.

@colinmjj
Copy link

colinmjj commented Jul 4, 2022

Constants.SHUFFLE_SERVER_VERSION should be updated to avoid version mismatch

@pan3793
Copy link
Member Author

pan3793 commented Jul 4, 2022

mvn package -Pspark2 passed

[INFO] Reactor Summary for Remote Shuffle Service Project Parent POM 0.6.0-snapshot:
[INFO]
[INFO] Remote Shuffle Service Project Parent POM .......... SUCCESS [  1.104 s]
[INFO] Remote Shuffle Service Protocols ................... SUCCESS [  3.248 s]
[INFO] Remote Shuffle Service Common ...................... SUCCESS [  2.861 s]
[INFO] Coordinator ........................................ SUCCESS [01:06 min]
[INFO] rss-internal-client ................................ SUCCESS [  0.190 s]
[INFO] Shuffle Storage .................................... SUCCESS [ 20.286 s]
[INFO] Shuffle Server ..................................... SUCCESS [01:30 min]
[INFO] rss-client ......................................... SUCCESS [ 20.063 s]
[INFO] rss-integration-common-test ........................ SUCCESS [06:42 min]
[INFO] rss-client-spark-common ............................ SUCCESS [ 39.579 s]
[INFO] rss-client-spark2 .................................. SUCCESS [ 28.866 s]
[INFO] rss-integration-spark-common-test .................. SUCCESS [06:02 min]
[INFO] rss-integration-spark2-test ........................ SUCCESS [  9.976 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  17:29 min
[INFO] Finished at: 2022-07-04T20:06:32+08:00
[INFO] ------------------------------------------------------------------------

@pan3793
Copy link
Member Author

pan3793 commented Jul 4, 2022

Constants.SHUFFLE_SERVER_VERSION should be updated to avoid version mismatch

The gRPC library upgrading should not break the protocol, and checked https://github.com/grpc/grpc-java/releases, I don't see any breaking change.

@frankliee
Copy link
Contributor

frankliee commented Jul 4, 2022

mvn package -Pspark2 passed

[INFO] Reactor Summary for Remote Shuffle Service Project Parent POM 0.6.0-snapshot:
[INFO]
[INFO] Remote Shuffle Service Project Parent POM .......... SUCCESS [  1.104 s]
[INFO] Remote Shuffle Service Protocols ................... SUCCESS [  3.248 s]
[INFO] Remote Shuffle Service Common ...................... SUCCESS [  2.861 s]
[INFO] Coordinator ........................................ SUCCESS [01:06 min]
[INFO] rss-internal-client ................................ SUCCESS [  0.190 s]
[INFO] Shuffle Storage .................................... SUCCESS [ 20.286 s]
[INFO] Shuffle Server ..................................... SUCCESS [01:30 min]
[INFO] rss-client ......................................... SUCCESS [ 20.063 s]
[INFO] rss-integration-common-test ........................ SUCCESS [06:42 min]
[INFO] rss-client-spark-common ............................ SUCCESS [ 39.579 s]
[INFO] rss-client-spark2 .................................. SUCCESS [ 28.866 s]
[INFO] rss-integration-spark-common-test .................. SUCCESS [06:02 min]
[INFO] rss-integration-spark2-test ........................ SUCCESS [  9.976 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  17:29 min
[INFO] Finished at: 2022-07-04T20:06:32+08:00
[INFO] ------------------------------------------------------------------------

What about mvn package -Pmr

@pan3793
Copy link
Member Author

pan3793 commented Jul 4, 2022

What about mvn package -Pmr

Running, will update once completed

@pan3793
Copy link
Member Author

pan3793 commented Jul 4, 2022

mvn package -Pmr passed

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Remote Shuffle Service Project Parent POM 0.6.0-snapshot:
[INFO]
[INFO] Remote Shuffle Service Project Parent POM .......... SUCCESS [  1.029 s]
[INFO] Remote Shuffle Service Protocols ................... SUCCESS [  3.097 s]
[INFO] Remote Shuffle Service Common ...................... SUCCESS [  2.993 s]
[INFO] Coordinator ........................................ SUCCESS [01:05 min]
[INFO] rss-internal-client ................................ SUCCESS [  0.144 s]
[INFO] Shuffle Storage .................................... SUCCESS [ 22.415 s]
[INFO] Shuffle Server ..................................... SUCCESS [01:29 min]
[INFO] rss-client ......................................... SUCCESS [ 18.890 s]
[INFO] rss-integration-common-test ........................ SUCCESS [06:45 min]
[INFO] rss-client-mr ...................................... SUCCESS [ 16.360 s]
[INFO] rss-integration-mr-test ............................ SUCCESS [03:11 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13:36 min
[INFO] Finished at: 2022-07-04T20:24:58+08:00
[INFO] ------------------------------------------------------------------------

<guava.version>30.0-jre</guava.version>
<grpc.version>1.47.0</grpc.version>
<gson.version>2.9.0</gson.version>
<guava.version>31.0.1-jre</guava.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we also bump guava and gson version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w/o upgrading gson and guava, it will violate rules defined by maven-enforcer-plugin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc and protobuf should work well w/ the previous version, the dependency upgrading is trivial

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to make the changes minimal to avoid unexpected issues.

Copy link
Member Author

@pan3793 pan3793 Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should remove requireUpperBoundDeps rule of maven-enforcer-plugin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimal change was tried in Tencent/Firestorm#185, but there were some issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found and fixed some issues in Spark3 IT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to make the changes minimal to avoid unexpected issues.

I think it's actually the minimal change to meet requireUpperBoundDeps requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found and fixed some issues in Spark3 IT

Thanks for digging out this issue, I remember the same issue was encountered during previous attempts.

@codecov-commenter
Copy link

codecov-commenter commented Jul 4, 2022

Codecov Report

Merging #13 (9a0414d) into master (37beb1f) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master      #13   +/-   ##
=========================================
  Coverage     56.83%   56.83%           
  Complexity     1204     1204           
=========================================
  Files           152      152           
  Lines          8401     8401           
  Branches        813      813           
=========================================
  Hits           4775     4775           
  Misses         3368     3368           
  Partials        258      258           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37beb1f...9a0414d. Read the comment docs.

@LuciferYang
Copy link
Contributor

@pan3793 Could you show the difference between and after this pr in description?

@pan3793
Copy link
Member Author

pan3793 commented Jul 5, 2022

@pan3793 Could you show the difference between and after this pr in description?

updated

@LuciferYang
Copy link
Contributor

LuciferYang commented Jul 5, 2022

@pan3793

Hm... without this pr, I test the following commands in my Apple Silicon, all commands seem to have succeeded

mvn clean install -Dos.detected.classifier=osx-x86_64
mvn package -Pspark2 -Dos.detected.classifier=osx-x86_64
mvn package -Pmr -Dos.detected.classifier=osx-x86_64

I use native jdk. Will this way be slower than current pr? Is this acceptable?

@pan3793
Copy link
Member Author

pan3793 commented Jul 5, 2022

I think it's a workaround, you are using Rosetta2-based protoc to generate gRPC & protobuf source codes, but use native javac to compile

@kaijchen
Copy link
Contributor

kaijchen commented Jul 5, 2022

I think it's a workaround, you are using Rosetta2-based protoc to generate gRPC & protobuf source codes, but use native javac to compile

You are still using Rosetta2-based protoc even with this patch. See protocolbuffers/protobuf#8557.
However, it's easier for new contributors to build this project without extra maven property.

@@ -60,8 +61,7 @@
<picocli.version>4.5.2</picocli.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<prometheus.simpleclient.version>0.9.0</prometheus.simpleclient.version>
<protobuf.version>3.12.0</protobuf.version>
<protoc.version>3.12.0</protoc.version>
<protobuf.version>3.19.2</protobuf.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems native M1 support was introduced in v3.20.1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, it is from v3.17.3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3.17.3 is still Rosetta, there is only protoc-3.17.3-osx-x86_64.zip.

In 3.20.1, there is protoc-3.20.1-osx-aarch_64.zip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check the maven repo instead of GitHub release

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link, seems they both requires Rosetta 2.

$ file protoc-3.17.3-osx-aarch_64.exe protoc-3.20.1-osx-aarch_64.exe 
protoc-3.17.3-osx-aarch_64.exe: Mach-O 64-bit executable x86_64
protoc-3.20.1-osx-aarch_64.exe: Mach-O 64-bit executable x86_64

Copy link
Member Author

@pan3793 pan3793 Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a funny thing, they just copy and rename the file... both file has the same md5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version is right

➜  apache-uniffle git:(master) file ~/Downloads/protoc-3.21.2-osx-aarch_64.exe
/Users/chengpan/Downloads/protoc-3.21.2-osx-aarch_64.exe: Mach-O 64-bit executable arm64

@colinmjj
Copy link

colinmjj commented Jul 6, 2022

@pan3793 I think it's ok to upgrade the version of gRPC, and do you have check the shaded configuration for client-mr, client-spark2, client-spark3, is there additional package should be shaded with this PR?

@frankliee
Copy link
Contributor

I have tested the compatibility of gRPC client/server.

  1. Higher server with low client is ok.
  2. Low server with higher client is not ok.

@pan3793
@colinmjj

@colinmjj
Copy link

colinmjj commented Jul 6, 2022

2. w server with higher client is not o

@frankliee I think it can be documented and suggest user to have the same version of client/server.

@colinmjj
Copy link

colinmjj commented Jul 6, 2022

All checks have passed, I prefer +1 for this PR.

@jerryshao
Copy link
Contributor

I have tested the compatibility of gRPC client/server.

  1. Higher server with low client is ok.
  2. Low server with higher client is not ok.

@pan3793

I think we should document this to avoid unexpected issues.

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

I have tested the compatibility of gRPC client/server.

  1. Higher server with low client is ok.
  2. Low server with higher client is not ok.

@pan3793 @colinmjj

Would you please share your testing code? I test the official hello-world demo, the high version client works well w/ low version server.
Test steps: https://gist.github.com/pan3793/815c20e1fb725c48cfe3838f9651d101

@kaijchen
Copy link
Contributor

kaijchen commented Jul 6, 2022

According to gRPC Versioning Guide, there is only backward compatibility.

@frankliee
Copy link
Contributor

frankliee commented Jul 6, 2022

I have tested the compatibility of gRPC client/server.

  1. Higher server with low client is ok.
  2. Low server with higher client is not ok.

@pan3793 @colinmjj

Would you please share your testing code? I test the official hello-world demo, the high version client works well w/ low version server. Test steps: https://gist.github.com/pan3793/815c20e1fb725c48cfe3838f9651d101

I directly test uniffle client with uniffle server.
To reproduce it, you could just replace rss-client jar (v1.33 and v1.47).

See MRIntegrationTestBase
File file = new File(parentPath, "client-mr/target/shaded");

@LuciferYang
Copy link
Contributor

This pr is fine to me and agree with @jerryshao if there is really a compatibility issue

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

@frankliee I reproduce the failure of MRIntegrationTestBase, I'm sorry I'm not familiar w/ MapReduce. I checked the integration-test/mr/target/surefire-reports/org.apache.uniffle.test.SecondarySortTest-output.txt and don't find the root cause. Would you please share more about it?

@frankliee
Copy link
Contributor

frankliee commented Jul 6, 2022

@frankliee I reproduce the failure of MRIntegrationTestBase, I'm sorry I'm not family w/ MapReduce. I checked the integration-test/mr/target/surefire-reports/org.apache.uniffle.test.SecondarySortTest-output.txt and don't find the root cause. Would you please share more about it?

Both MR and Spark leverage a unified gRPC client. You can also manually debug it by starting a uniffle server (v1.37) with uniffle client(v1.43). Testing hello world client-server examples cannot cover all APIs used by uniffle.

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

Where is the "client jar" configuration in Spark IT like File file = new File(parentPath, "client-mr/target/shaded"); in MR IT?

@frankliee
Copy link
Contributor

Where is the "client jar" configuration in Spark IT like File file = new File(parentPath, "client-mr/target/shaded"); in MR IT?

Spark IT does not have this.

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

Spark IT does not have this.

Both MR and Spark leverage a unified gRPC client. You can also manually debug it with by starting a uniffle server (v1.37) with uniffle client(v1.43).
Testing hello world client-server examples cannot cover all APIs used by uniffle.

Then it's not easy for me, a newbie to the project, to build ITs that cover all gRPC APIs used by uniffle to verify the cross-version client/server compatibility and find the broken root cause.

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

I have tested the compatibility of gRPC client/server.

  1. Higher server with low client is ok.
  2. Low server with higher client is not ok.

@pan3793 @colinmjj

Would you please share your testing code? I test the official hello-world demo, the high version client works well w/ low version server. Test steps: https://gist.github.com/pan3793/815c20e1fb725c48cfe3838f9651d101

I directly test uniffle client with uniffle server. To reproduce it, you could just replace rss-client jar (v1.33 and v1.47).

See MRIntegrationTestBase File file = new File(parentPath, "client-mr/target/shaded");

I try this way again, and the test passed. Please attach your failure stack traces.

@frankliee
Copy link
Contributor

frankliee commented Jul 6, 2022

Spark IT does not have this.

Both MR and Spark leverage a unified gRPC client. You can also manually debug it with by starting a uniffle server (v1.37) with uniffle client(v1.43).
Testing hello world client-server examples cannot cover all APIs used by uniffle.

Then it's not easy for me, a newbie to the project, to build ITs that cover all gRPC APIs used by uniffle to verify the cross-version client/server compatibility and find the broken root cause.

I know this is not easy because getting MR stack is difficult in local.
I am also +1 for this PR, but it is better to notify the client/server should be the same version in the document.

The reproduction is complicated 1) compile this PR to get a new MR client jar, 2) save this jar to another place, 3) switch to the master branch and recompile, 4) replace the jar with the previous one.

@frankliee frankliee closed this Jul 6, 2022
@frankliee frankliee reopened this Jul 6, 2022
@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

I know this is not easy because getting MR stack is difficult in local.

What I mean "not easy“ is to build a cross-version Spark IT that covers all gRPC APIs used by RSS.

The reproduction is complicated ... 3) switch to the master branch and recompile ...

It's clear to me when you said "See MRIntegrationTestBase File file = new File(parentPath, "client-mr/target/shaded");". But for step 3, it's better to switch back to ec00ad9, which is the commit I cut the branch out, the master has gone ahead.

but it is better to notify the client/server should be the same version in the document.

I would prefer to dig out the root cause before documenting the conclusion. My first round of cross-version testing failed but the following rounds were successful. And the first-round failure is because I put the new MR client jar to a temp folder, if there are some temp files deleted by other processes during testing, the test will fail w/

java.io.IOException: /home/chengpan/.docker/run/docker-cli-api.sock: No such file or directory
        at org.apache.uniffle.test.WordCountTest.wordCountTest(WordCountTest.java:66)

And if I put the client-mr jar build w/ new gRPC version to an immutable folder, then IT passed.

I finally found and checked the MR tasks log, and do not see any related stacks.

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

According to gRPC Versioning Guide, there is only backward compatibility.

The document says:

Most importantly, a major version increase must not break wire compatibility with other gRPC implementations so that existing gRPC libraries remain fully interoperable.

Backward compatibility can be broken by a minor release if the API affected by the change was marked as EXPERIMENTAL upon its introduction.

I don't see uniffle use gRPC EXPERIMENTAL api.

That means that no backward incompatible protocol changes are allowed: old clients must continue interoperating correctly with new servers and new servers with old clients.

Then I think there is no compatibility issue that will be introduced by upgrading gRPC deps.

@pan3793
Copy link
Member Author

pan3793 commented Jul 6, 2022

and do you have check the shaded configuration for client-mr, client-spark2, client-spark3, is there additional package should be shaded with this PR?

@colinmjj I check the client-mr, I don't think the upgrading introduces any additional package that should be shaded. But I think we have issues with the master branch. e.g. netty and lz4 have native libraries in the jar, the relocation does not consider the native libraries at all.

@frankliee
Copy link
Contributor

I know this is not easy because getting MR stack is difficult in local.

What I mean "not easy“ is to build a cross-version Spark IT that covers all gRPC APIs used by RSS.

The reproduction is complicated ... 3) switch to the master branch and recompile ...

It's clear to me when you said "See MRIntegrationTestBase File file = new File(parentPath, "client-mr/target/shaded");". But for step 3, it's better to switch back to ec00ad9, which is the commit I cut the branch out, the master has gone ahead.

but it is better to notify the client/server should be the same version in the document.

I would prefer to dig out the root cause before documenting the conclusion. My first round of cross-version testing failed but the following rounds were successful. And the first-round failure is because I put the new MR client jar to a temp folder, if there are some temp files deleted by other processes during testing, the test will fail w/

java.io.IOException: /home/chengpan/.docker/run/docker-cli-api.sock: No such file or directory
        at org.apache.uniffle.test.WordCountTest.wordCountTest(WordCountTest.java:66)

And if I put the client-mr jar build w/ new gRPC version to an immutable folder, then IT passed.

I finally found and checked the MR tasks log, and do not see any related stacks.

The compatibility is ok. I am sorry, and I found the wrong jar is copied...

@pan3793
Copy link
Member Author

pan3793 commented Jul 7, 2022

@frankliee thanks for the confirmation. The latest thing is should we wait for gRPC 1.48 which bundles protoc 3.21.1, or get this in first? Because @kaijchen points that protoc-3.17.3 still requires Rosetta 2 (but does not bother developers).
I would prefer to get in this one first to unblock developers who using M1.

@kaijchen
Copy link
Contributor

kaijchen commented Jul 7, 2022

I would prefer to get in this one first to unblock developers who using M1.

+1, thanks for the investigation.

Copy link
Contributor

@jerryshao jerryshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side.

@colinmjj colinmjj merged commit f7c65d4 into apache:master Jul 8, 2022
@colinmjj
Copy link

colinmjj commented Jul 8, 2022

@pan3793 thanks for the contribution

@pan3793
Copy link
Member Author

pan3793 commented Jul 8, 2022

Thanks everyone for reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants