Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40507: [C++][ORC] Upgrade ORC to 2.0.0 #40508

Merged
merged 5 commits into from
Mar 28, 2024
Merged

Conversation

wgtmac
Copy link
Member

@wgtmac wgtmac commented Mar 13, 2024

Rationale for this change

This PR aims to upgrade to a new major version of Apache ORC: https://orc.apache.org/news/2024/03/08/ORC-2.0.0/

What changes are included in this PR?

This PR upgrades ORC dependency from 1.9.2 to 2.0.0.

Are these changes tested?

Pass the CIs.

Are there any user-facing changes?

No.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 13, 2024

The Apache ORC java library has dropped Java 8 support from v2.0.0. However, I don't think this is an issue here as the orc-core java dependency is only used in the test scope. Am I correct? @lidavidm @jduo

@wgtmac
Copy link
Member Author

wgtmac commented Mar 13, 2024

OK, I'm wrong:

[INFO] Arrow Performance Benchmarks ....................... SKIPPED
[INFO] Arrow Java C Data Interface ........................ SUCCESS [ 57.354 s]
[INFO] Arrow Orc Adapter .................................. FAILURE [ 35.491 s]
[INFO] Arrow Gandiva ...................................... SUCCESS [01:26 min]
[INFO] Arrow Java Dataset ................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  06:03 min (Wall Clock)
[INFO] Finished at: 2024-03-13T15:23:50Z
[INFO] ------------------------------------------------------------------------
Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.12.1:testCompile (default-testCompile) on project arrow-orc: Compilation failure
Error:  /arrow/java/adapter/orc/src/test/java/org/apache/arrow/adapter/orc/OrcReaderTest.java:[40,21] error: cannot access OrcFile
Error:    bad class file: /root/.m2/repository/org/apache/orc/orc-core/2.0.0/orc-core-2.0.0.jar(/org/apache/orc/OrcFile.class)
Error:      class file has wrong version 61.0, should be 53.0
Error:      Please remove or make sure it appears in the correct subdirectory of the classpath.
Error:  -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please read the following articles:
Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :arrow-orc

@lidavidm
Copy link
Member

Hmm, that is test scope, and the other builds pass. So I guess that pipeline uses Java 8 and we could maybe exclude the file?

@lidavidm
Copy link
Member

That said, I think what we've done is to hold off on package updates requiring Java 8 for now, since we're still supporting Java 8...We were trying to drop Java 8 support but some people spoke out against. Maybe we need to revisit that.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 14, 2024

As orc-core is only used in the test scope, perhaps we can leave it to stay at 1.9.x which is the last major version with java 8 support.

@lidavidm
Copy link
Member

I added this to the list at #38051

@lidavidm
Copy link
Member

I think what's going to have to happen is that if someone really needs Java 8 support, they are going to have to maintain the fork or otherwise come up with a strategy, because eventually these CVE fixes pile up and we won't be able to defer the issue any longer

@wgtmac
Copy link
Member Author

wgtmac commented Mar 15, 2024

@github-actions crossbow submit wheel-windows-*

This comment was marked as outdated.

@lidavidm
Copy link
Member

I think we're going to revisit Java 8 in the next 3-6 months so we can include this PR in that effort.

@pitrou
Copy link
Member

pitrou commented Mar 21, 2024

You can probably split the Java and C++ parts of this PR.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wgtmac Can you please remove the Java change so that this PR focusses on the C++ version bump?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 27, 2024
java/adapter/orc/pom.xml Outdated Show resolved Hide resolved
@wgtmac
Copy link
Member Author

wgtmac commented Mar 28, 2024

Just reverted the java module change. Thanks for the reminder! @pitrou

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

@github-actions crossbow submit -g python -g wheel

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

@github-actions crossbow submit -g cpp

This comment was marked as outdated.

This comment was marked as outdated.

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

So it seems this breaks the macOS wheel builds for Python 3.8, 3.9 and 3.10 (but not 3.11 and 3.12).
The error is related to a protobuf symbol, and ORC uses protobuf:

ImportError: dlopen(/Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/test-arm64-env/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so, 0x0002): symbol not found in flat namespace '__ZN6google8protobuf2io19ZeroCopyInputStream8ReadCordEPN4absl12lts_202401164CordEi'

The symbol is google::protobuf::io::ZeroCopyInputStream::ReadCord(absl::lts_20240116::Cord*, int).

@wgtmac Is ORC 2.0 building its own protobuf from scratch? Answer: it should not, we're passing PROTOBUF_HOME to ORC's CMake.

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

Hmm, weirdly, the 3.10 build succeeded when restarted.

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

@github-actions crossbow submit wheel-macos-*-arm64

This comment was marked as outdated.

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

Okay, I think I've found it:

2024-03-28T13:34:25.3459520Z -- PROTOBUF_HOME: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release
2024-03-28T13:34:25.3460120Z -- Could NOT find Protobuf (missing: Protobuf_DIR)
2024-03-28T13:34:25.3460830Z -- Found the Protobuf headers: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release/include
2024-03-28T13:34:25.3461910Z -- Found the Protobuf library: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release/lib/libprotobuf.a
2024-03-28T13:34:25.3463000Z -- Found the Protoc library: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release/lib/libprotoc.a
2024-03-28T13:34:25.3464060Z -- Found the Protoc executable: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx/tools/protobuf/protoc
2024-03-28T13:34:25.3465220Z -- Found the Protobuf static library: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release/lib/libprotobuf.a
2024-03-28T13:34:25.3466370Z -- Found the Protoc static library: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release/lib/libprotoc.a
2024-03-28T13:33:59.5638710Z -- PROTOBUF_HOME: /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/vcpkg/installed/arm64-osx-static-release
2024-03-28T13:33:59.5643750Z -- Found the Protobuf headers: /opt/homebrew/include
2024-03-28T13:33:59.5644210Z -- Found the Protobuf library: protobuf::libprotobuf
2024-03-28T13:33:59.5644590Z -- Found the Protoc library: protobuf::libprotoc
2024-03-28T13:33:59.5644960Z -- Found the Protoc executable: protobuf::protoc

Note /opt/homebrew/include.

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

@github-actions crossbow submit wheel-macos-*-arm64

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

Yuck, this PR is based on an old version of the git repo...

This comment was marked as outdated.

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

@github-actions crossbow submit wheel-macos-*-arm64

This comment was marked as outdated.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 28, 2024

So it seems this breaks the macOS wheel builds for Python 3.8, 3.9 and 3.10 (but not 3.11 and 3.12). The error is related to a protobuf symbol, and ORC uses protobuf:

ImportError: dlopen(/Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/test-arm64-env/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so, 0x0002): symbol not found in flat namespace '__ZN6google8protobuf2io19ZeroCopyInputStream8ReadCordEPN4absl12lts_202401164CordEi'

The symbol is google::protobuf::io::ZeroCopyInputStream::ReadCord(absl::lts_20240116::Cord*, int).

@wgtmac Is ORC 2.0 building its own protobuf from scratch? Answer: it should not, we're passing PROTOBUF_HOME to ORC's CMake.

Right, ORC should use what we provide here:

"-DLZ4_HOME=${ORC_LZ4_ROOT}"
"-DPROTOBUF_EXECUTABLE=$<TARGET_FILE:${ARROW_PROTOBUF_PROTOC}>"
"-DPROTOBUF_HOME=${ORC_PROTOBUF_ROOT}"
"-DPROTOBUF_INCLUDE_DIR=$<TARGET_PROPERTY:${ARROW_PROTOBUF_LIBPROTOBUF},INTERFACE_INCLUDE_DIRECTORIES>"
"-DPROTOBUF_LIBRARY=$<TARGET_FILE:${ARROW_PROTOBUF_LIBPROTOBUF}>"
"-DPROTOC_LIBRARY=$<TARGET_FILE:${ARROW_PROTOBUF_LIBPROTOC}>"
"-DSNAPPY_HOME=${ORC_SNAPPY_ROOT}"
"-DSNAPPY_INCLUDE_DIR=${ORC_SNAPPY_INCLUDE_DIR}"
"-DZSTD_HOME=${ORC_ZSTD_ROOT}"
"-DZSTD_INCLUDE_DIR=$<TARGET_PROPERTY:${ARROW_ZSTD_LIBZSTD},INTERFACE_INCLUDE_DIRECTORIES>"
"-DZSTD_LIBRARY=$<TARGET_FILE:${ARROW_ZSTD_LIBZSTD}>")

@pitrou
Copy link
Member

pitrou commented Mar 28, 2024

@github-actions crossbow submit -g python -g wheel -g cpp

Copy link

Revision: 48f6387

Submitted crossbow builds: ursacomputing/crossbow @ actions-33cf00f7e6

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest GitHub Actions
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest GitHub Actions
test-cuda-cpp GitHub Actions
test-cuda-python GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 Azure
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-fedora-39-python-3 Azure
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI results look unrelated now, I'll merge

@pitrou pitrou merged commit 4f39e6e into apache:main Mar 28, 2024
33 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Mar 28, 2024
@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Mar 28, 2024
@wgtmac
Copy link
Member Author

wgtmac commented Mar 29, 2024

Thank you for the investigation! @pitrou

Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 4f39e6e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.

tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
### Rationale for this change

This PR aims to upgrade to a new major version of Apache ORC: https://orc.apache.org/news/2024/03/08/ORC-2.0.0/

### What changes are included in this PR?

This PR upgrades ORC dependency from 1.9.2 to 2.0.0.

### Are these changes tested?

Pass the CIs.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40507

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
tolleybot pushed a commit to tmct/arrow that referenced this pull request May 4, 2024
### Rationale for this change

This PR aims to upgrade to a new major version of Apache ORC: https://orc.apache.org/news/2024/03/08/ORC-2.0.0/

### What changes are included in this PR?

This PR upgrades ORC dependency from 1.9.2 to 2.0.0.

### Are these changes tested?

Pass the CIs.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40507

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
rok pushed a commit to tmct/arrow that referenced this pull request May 8, 2024
### Rationale for this change

This PR aims to upgrade to a new major version of Apache ORC: https://orc.apache.org/news/2024/03/08/ORC-2.0.0/

### What changes are included in this PR?

This PR upgrades ORC dependency from 1.9.2 to 2.0.0.

### Are these changes tested?

Pass the CIs.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40507

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
rok pushed a commit to tmct/arrow that referenced this pull request May 8, 2024
### Rationale for this change

This PR aims to upgrade to a new major version of Apache ORC: https://orc.apache.org/news/2024/03/08/ORC-2.0.0/

### What changes are included in this PR?

This PR upgrades ORC dependency from 1.9.2 to 2.0.0.

### Are these changes tested?

Pass the CIs.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40507

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
### Rationale for this change

This PR aims to upgrade to a new major version of Apache ORC: https://orc.apache.org/news/2024/03/08/ORC-2.0.0/

### What changes are included in this PR?

This PR upgrades ORC dependency from 1.9.2 to 2.0.0.

### Are these changes tested?

Pass the CIs.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40507

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants