-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41702: [C++][Parquet] Thrift: generate template method to accelerate reading thrift #41703
GH-41702: [C++][Parquet] Thrift: generate template method to accelerate reading thrift #41703
Conversation
|
@emkornfield @pitrou I've update a patching here. This generated call less virtual functions during deserializing. Would you mind take a look? I'm not so familiar with thrift compiler, maybe more useful tools can help deserializing |
@mapleFU I didn't know this was possible. This looks neat in the principle. Did you try to run some benchmark? |
Run in page index: #41702 (comment) For footer it's more useful since readVirt is called for more times |
I remember there was about 3% speedup reading a sample parquet file. |
Perhaps you can try with the additional benchmarks in #41761 |
On my M1 Pro with Release(O2): After:
Before:
|
On my AMD 3800X: Before:
After:
|
e54e382
to
fde772c
Compare
@github-actions crossbow submit -g cpp -g wheel |
Revision: fde772c Submitted crossbow builds: ursacomputing/crossbow @ actions-bacf49dea9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @mapleFU
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9ba9253. There were 5 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…celerate reading thrift (apache#41703) ### Rationale for this change By default, the Thrift serializer and deserializer call many virtual functions. However, the Thrift C++ compiler has an option to generate template methods that does away with the cost of calling virtual functions. It seems to make the metadata read/write benchmarks around 10% faster. ### What changes are included in this PR? 1. `cpp/build-support/update-thrift.sh`: enable `templates` option to Thirft C++ compilerargument 2. `cpp/src/parquet/thrift_internal.h`: use generated code 3. `cpp/src/generated`: update generated files. ### Are these changes tested? Covered by existing tests. ### Are there any user-facing changes? No. * GitHub Issue: apache#41702 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
By default, the Thrift serializer and deserializer call many virtual functions. However, the Thrift C++ compiler has an option to generate template methods that does away with the cost of calling virtual functions. It seems to make the metadata read/write benchmarks around 10% faster.
What changes are included in this PR?
cpp/build-support/update-thrift.sh
: enabletemplates
option to Thirft C++ compilerargumentcpp/src/parquet/thrift_internal.h
: use generated codecpp/src/generated
: update generated files.Are these changes tested?
Covered by existing tests.
Are there any user-facing changes?
No.