GH-41702: [C++][Parquet] Thrift: generate template method to accelerate reading thrift #41703

Merged

Conversation

mapleFU
Member

@mapleFU mapleFU commented May 17, 2024

Rationale for this change

By default, the Thrift serializer and deserializer call many virtual functions. However, the Thrift C++ compiler has an option to generate template methods that do away with the cost of calling virtual functions. It seems to make the metadata read/write benchmarks around 10% faster.

What changes are included in this PR?

  1. cpp/build-support/update-thrift.sh: pass the templates option to the Thrift C++ compiler
  2. cpp/src/parquet/thrift_internal.h: use the generated template methods
  3. cpp/src/generated: update the generated files
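
The difference between the two dispatch styles can be sketched in a few lines of C++. This is only an illustration under stated assumptions: ProtocolBase, BufferProtocol, SumFieldsVirtual and SumFieldsTemplate are hypothetical names invented here, not Thrift's real classes and not the code generated for Parquet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a protocol interface (NOT Thrift's real API).
struct ProtocolBase {
  virtual ~ProtocolBase() = default;
  virtual std::int32_t ReadI32() = 0;
};

// Hypothetical concrete protocol that reads from an in-memory buffer.
// Marking the class `final` lets the compiler resolve ReadI32() statically
// whenever the static type is known to be BufferProtocol.
class BufferProtocol final : public ProtocolBase {
 public:
  explicit BufferProtocol(std::vector<std::int32_t> values)
      : values_(std::move(values)) {}

  std::int32_t ReadI32() override {
    assert(pos_ < values_.size());
    return values_[pos_++];
  }

 private:
  std::vector<std::int32_t> values_;
  std::size_t pos_ = 0;
};

// Virtual style: only the base type is known, so each field read is an
// indirect call through the vtable.
std::int64_t SumFieldsVirtual(ProtocolBase* protocol, int num_fields) {
  std::int64_t sum = 0;
  for (int i = 0; i < num_fields; ++i) sum += protocol->ReadI32();
  return sum;
}

// Template style: the concrete protocol type is a template parameter, so the
// compiler can bind (and possibly inline) ReadI32() at compile time.
template <typename Protocol>
std::int64_t SumFieldsTemplate(Protocol* protocol, int num_fields) {
  std::int64_t sum = 0;
  for (int i = 0; i < num_fields; ++i) sum += protocol->ReadI32();
  return sum;
}
```

Both functions return the same result for the same input; what changes is whether the call target is known at compile time. In this sketch that only holds when SumFieldsTemplate is instantiated with the concrete BufferProtocol type rather than with ProtocolBase.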

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

@mapleFU mapleFU marked this pull request as ready for review May 17, 2024 06:53
@mapleFU mapleFU requested a review from wgtmac as a code owner May 17, 2024 06:53

⚠️ GitHub issue #41702 has been automatically assigned in GitHub to PR creator.

@mapleFU
Member Author

mapleFU commented May 17, 2024

@emkornfield @pitrou I've uploaded a patch here. The generated code makes fewer virtual function calls during deserialization. Would you mind taking a look?

I'm not very familiar with the Thrift compiler, so there may be other useful tools that could help with deserialization.

@pitrou
Member

pitrou commented May 17, 2024

@mapleFU I didn't know this was possible. This looks neat in principle. Did you try running any benchmarks?

@mapleFU
Member Author

mapleFU commented May 17, 2024

I ran a benchmark on the page index: #41702 (comment)

For the footer it is more useful, since readVirt is called more times there.

@wgtmac
Member

wgtmac commented May 17, 2024

I remember there was about 3% speedup reading a sample parquet file.

@pitrou
Member

pitrou commented May 21, 2024

Perhaps you can try with the additional benchmarks in #41761

@mapleFU
Member Author

mapleFU commented May 21, 2024

On my M1 Pro with a Release (O2) build:

After:

Run on (10 X 24.0711 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 7.98, 10.79, 8.83
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
WriteMetadata/num_columns:1/num_row_groups:1            10248 ns        10198 ns        65596 file_size=459 items_per_second=98.0618k/s
WriteMetadata/num_columns:1/num_row_groups:100         708873 ns       701642 ns         1003 file_size=37.383k items_per_second=1.42523k/s
WriteMetadata/num_columns:1/num_row_groups:1000       7027939 ns      7022677 ns           99 file_size=374.885k items_per_second=142.396/s
WriteMetadata/num_columns:10/num_row_groups:1           78750 ns        78709 ns         8900 file_size=3.762k items_per_second=12.705k/s
WriteMetadata/num_columns:10/num_row_groups:100       6751510 ns      6644838 ns          105 file_size=358.835k items_per_second=150.493/s
WriteMetadata/num_columns:10/num_row_groups:1000     67659713 ns     67142800 ns           10 file_size=3.614M items_per_second=14.8936/s
WriteMetadata/num_columns:100/num_row_groups:1         787280 ns       771871 ns          910 file_size=37.352k items_per_second=1.29555k/s
WriteMetadata/num_columns:100/num_row_groups:100     66632500 ns     66540000 ns           10 file_size=3.61693M items_per_second=15.0286/s
WriteMetadata/num_columns:100/num_row_groups:1000   703455917 ns    699385000 ns            1 file_size=36.2887M items_per_second=1.42983/s
WriteMetadata/num_columns:1000/num_row_groups:1       8089713 ns      8087153 ns           85 file_size=376.655k items_per_second=123.653/s
WriteMetadata/num_columns:1000/num_row_groups:100   705972459 ns    702311000 ns            1 file_size=36.4815M items_per_second=1.42387/s
WriteMetadata/num_columns:10000/num_row_groups:1     82793505 ns     82773750 ns            8 file_size=3.82213M items_per_second=12.0811/s
WriteMetadata/num_columns:10000/num_row_groups:100 7789295000 ns   7492551000 ns            1 file_size=369.089M items_per_second=0.133466/s
ReadMetadata/num_columns:1/num_row_groups:1              3022 ns         3021 ns       229889 file_size=459 items_per_second=330.982k/s
ReadMetadata/num_columns:1/num_row_groups:100           59165 ns        59139 ns        11742 file_size=37.383k items_per_second=16.9092k/s
ReadMetadata/num_columns:1/num_row_groups:1000         587111 ns       586972 ns         1189 file_size=374.885k items_per_second=1.70366k/s
ReadMetadata/num_columns:10/num_row_groups:1            13977 ns        13973 ns        50402 file_size=3.762k items_per_second=71.569k/s
ReadMetadata/num_columns:10/num_row_groups:100         475674 ns       475562 ns         1469 file_size=358.835k items_per_second=2.10278k/s
ReadMetadata/num_columns:10/num_row_groups:1000       4743075 ns      4742237 ns          139 file_size=3.614M items_per_second=210.871/s
ReadMetadata/num_columns:100/num_row_groups:1          119355 ns       119308 ns         5747 file_size=37.352k items_per_second=8.38169k/s
ReadMetadata/num_columns:100/num_row_groups:100       5379931 ns      5378835 ns          133 file_size=3.61693M items_per_second=185.914/s
ReadMetadata/num_columns:100/num_row_groups:1000     58173311 ns     58151000 ns           13 file_size=36.2887M items_per_second=17.1966/s
ReadMetadata/num_columns:1000/num_row_groups:1        1285306 ns      1284195 ns          514 file_size=376.655k items_per_second=778.698/s
ReadMetadata/num_columns:1000/num_row_groups:100     59154014 ns     59110667 ns           12 file_size=36.4815M items_per_second=16.9174/s
ReadMetadata/num_columns:10000/num_row_groups:1      15298734 ns     15288065 ns           46 file_size=3.82213M items_per_second=65.4105/s
ReadMetadata/num_columns:10000/num_row_groups:100   597222875 ns    594531000 ns            1 file_size=369.089M items_per_second=1.682/s

Before:

WriteMetadata/num_columns:1/num_row_groups:1            13997 ns        10952 ns        64411 file_size=459 items_per_second=91.3074k/s
WriteMetadata/num_columns:1/num_row_groups:100        1161928 ns       781421 ns          915 file_size=37.383k items_per_second=1.27972k/s
WriteMetadata/num_columns:1/num_row_groups:1000       9028193 ns      7580868 ns           91 file_size=374.885k items_per_second=131.911/s
WriteMetadata/num_columns:10/num_row_groups:1           87804 ns        81408 ns         8680 file_size=3.762k items_per_second=12.2838k/s
WriteMetadata/num_columns:10/num_row_groups:100       7922727 ns      7032396 ns           96 file_size=358.835k items_per_second=142.199/s
WriteMetadata/num_columns:10/num_row_groups:1000     83557727 ns     72335889 ns            9 file_size=3.614M items_per_second=13.8244/s
WriteMetadata/num_columns:100/num_row_groups:1        1046771 ns       866386 ns          813 file_size=37.352k items_per_second=1.15422k/s
WriteMetadata/num_columns:100/num_row_groups:100     97720995 ns     74290111 ns            9 file_size=3.61693M items_per_second=13.4607/s
WriteMetadata/num_columns:100/num_row_groups:1000  1042585917 ns    773579000 ns            1 file_size=36.2887M items_per_second=1.29269/s
WriteMetadata/num_columns:1000/num_row_groups:1       9320268 ns      8396910 ns           78 file_size=376.655k items_per_second=119.091/s
WriteMetadata/num_columns:1000/num_row_groups:100   789198500 ns    726929000 ns            1 file_size=36.4815M items_per_second=1.37565/s
WriteMetadata/num_columns:10000/num_row_groups:1    105553526 ns     89228125 ns            8 file_size=3.82213M items_per_second=11.2072/s
WriteMetadata/num_columns:10000/num_row_groups:100 9705208125 ns   7941607000 ns            1 file_size=369.089M items_per_second=0.125919/s
ReadMetadata/num_columns:1/num_row_groups:1              3341 ns         3262 ns       215501 file_size=459 items_per_second=306.531k/s
ReadMetadata/num_columns:1/num_row_groups:100           70801 ns        67469 ns        10226 file_size=37.383k items_per_second=14.8215k/s
ReadMetadata/num_columns:1/num_row_groups:1000         697046 ns       661042 ns         1033 file_size=374.885k items_per_second=1.51276k/s
ReadMetadata/num_columns:10/num_row_groups:1            19616 ns        15182 ns        46741 file_size=3.762k items_per_second=65.866k/s
ReadMetadata/num_columns:10/num_row_groups:100         631976 ns       538377 ns         1240 file_size=358.835k items_per_second=1.85743k/s
ReadMetadata/num_columns:10/num_row_groups:1000       5701558 ns      5375484 ns          122 file_size=3.614M items_per_second=186.03/s
ReadMetadata/num_columns:100/num_row_groups:1          137789 ns       128750 ns         5466 file_size=37.352k items_per_second=7.76702k/s
ReadMetadata/num_columns:100/num_row_groups:100       6475114 ns      6090483 ns          118 file_size=3.61693M items_per_second=164.191/s
ReadMetadata/num_columns:100/num_row_groups:1000     64411345 ns     62630000 ns           11 file_size=36.2887M items_per_second=15.9668/s
ReadMetadata/num_columns:1000/num_row_groups:1        1473490 ns      1402757 ns          453 file_size=376.655k items_per_second=712.882/s
ReadMetadata/num_columns:1000/num_row_groups:100     66037220 ns     64025909 ns           11 file_size=36.4815M items_per_second=15.6187/s
ReadMetadata/num_columns:10000/num_row_groups:1      18425749 ns     16564045 ns           44 file_size=3.82213M items_per_second=60.3717/s
ReadMetadata/num_columns:10000/num_row_groups:100   650862958 ns    636789000 ns            1 file_size=369.089M items_per_second=1.57038/s
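
To relate these M1 Pro figures to the "around 10%" estimate in the PR description, the relative improvement can be computed as (before - after) / before. The sketch below is a hypothetical helper, not part of the PR; the four constants are the CPU-time values for the num_columns:100/num_row_groups:100 rows copied from the output above.

```cpp
#include <cassert>

// CPU times (ns) for num_columns:100/num_row_groups:100 on the M1 Pro,
// copied from the benchmark output above.
constexpr double kReadBeforeNs = 6090483.0;
constexpr double kReadAfterNs = 5378835.0;
constexpr double kWriteBeforeNs = 74290111.0;
constexpr double kWriteAfterNs = 66540000.0;

// Hypothetical helper: fraction of the "before" time that was saved.
// A positive result means the "after" run was faster.
double RelativeImprovement(double before_ns, double after_ns) {
  assert(before_ns > 0.0);
  return (before_ns - after_ns) / before_ns;
}
```

For these two rows, RelativeImprovement(kReadBeforeNs, kReadAfterNs) is about 0.117 and RelativeImprovement(kWriteBeforeNs, kWriteAfterNs) is about 0.104, that is, roughly 11.7% less CPU time for reading and 10.4% less for writing. Other rows and the AMD 3800X run give different ratios.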

@mapleFU
Member Author

mapleFU commented May 21, 2024

On my AMD 3800X:

Before:

WriteMetadata/num_columns:1/num_row_groups:1            14869 ns        14869 ns        42700 file_size=459 items_per_second=67.2552k/s
WriteMetadata/num_columns:1/num_row_groups:100        1026862 ns      1026848 ns          689 file_size=37.383k items_per_second=973.854/s
WriteMetadata/num_columns:1/num_row_groups:1000       9657576 ns      9656124 ns           72 file_size=374.885k items_per_second=103.561/s
WriteMetadata/num_columns:10/num_row_groups:1          121405 ns       121406 ns         5869 file_size=3.762k items_per_second=8.23686k/s
WriteMetadata/num_columns:10/num_row_groups:100       9488113 ns      9488130 ns           73 file_size=358.835k items_per_second=105.395/s
WriteMetadata/num_columns:10/num_row_groups:1000     98853564 ns     98852700 ns            7 file_size=3.614M items_per_second=10.1161/s
WriteMetadata/num_columns:100/num_row_groups:1        1142870 ns      1142808 ns          629 file_size=37.352k items_per_second=875.037/s
WriteMetadata/num_columns:100/num_row_groups:100     96569070 ns     96568757 ns            7 file_size=3.61693M items_per_second=10.3553/s
WriteMetadata/num_columns:100/num_row_groups:1000  1017437093 ns   1017435400 ns            1 file_size=36.2887M items_per_second=0.982863/s
WriteMetadata/num_columns:1000/num_row_groups:1      11040304 ns     11040197 ns           65 file_size=376.655k items_per_second=90.5781/s
WriteMetadata/num_columns:1000/num_row_groups:100   995932342 ns    995929600 ns            1 file_size=36.4815M items_per_second=1.00409/s
WriteMetadata/num_columns:10000/num_row_groups:1    114961261 ns    114961450 ns            6 file_size=3.82213M items_per_second=8.69857/s
WriteMetadata/num_columns:10000/num_row_groups:100 1.6961e+10 ns   1.6960e+10 ns            1 file_size=369.089M items_per_second=0.0589634/s
ReadMetadata/num_columns:1/num_row_groups:1              6150 ns         6150 ns        95609 file_size=459 items_per_second=162.615k/s
ReadMetadata/num_columns:1/num_row_groups:100          148555 ns       148554 ns         5156 file_size=37.383k items_per_second=6.73154k/s
ReadMetadata/num_columns:1/num_row_groups:1000        1383664 ns      1383603 ns          549 file_size=374.885k items_per_second=722.751/s
ReadMetadata/num_columns:10/num_row_groups:1            31549 ns        31548 ns        16761 file_size=3.762k items_per_second=31.6973k/s
ReadMetadata/num_columns:10/num_row_groups:100        1329978 ns      1329950 ns          486 file_size=358.835k items_per_second=751.908/s
ReadMetadata/num_columns:10/num_row_groups:1000      15798009 ns     15797961 ns           44 file_size=3.614M items_per_second=63.2993/s
ReadMetadata/num_columns:100/num_row_groups:1          297319 ns       297316 ns         2119 file_size=37.352k items_per_second=3.36343k/s
ReadMetadata/num_columns:100/num_row_groups:100      13742747 ns     13742598 ns           49 file_size=3.61693M items_per_second=72.7664/s
ReadMetadata/num_columns:100/num_row_groups:1000    130178737 ns    130176500 ns            5 file_size=36.2887M items_per_second=7.68188/s
ReadMetadata/num_columns:1000/num_row_groups:1        2862534 ns      2862405 ns          260 file_size=376.655k items_per_second=349.357/s
ReadMetadata/num_columns:1000/num_row_groups:100     79884243 ns     79869014 ns            7 file_size=36.4815M items_per_second=12.5205/s
ReadMetadata/num_columns:10000/num_row_groups:1      18818536 ns     18818281 ns           37 file_size=3.82213M items_per_second=53.1398/s
ReadMetadata/num_columns:10000/num_row_groups:100   788936700 ns    788847500 ns            1 file_size=369.089M items_per_second=1.26767/s

After:

WriteMetadata/num_columns:1/num_row_groups:1            14042 ns        14026 ns        48265 file_size=459 items_per_second=71.2951k/s
WriteMetadata/num_columns:1/num_row_groups:100         982543 ns       982545 ns          693 file_size=37.383k items_per_second=1.01776k/s
WriteMetadata/num_columns:1/num_row_groups:1000       9236559 ns      9234951 ns           75 file_size=374.885k items_per_second=108.284/s
WriteMetadata/num_columns:10/num_row_groups:1          115867 ns       115865 ns         6050 file_size=3.762k items_per_second=8.63075k/s
WriteMetadata/num_columns:10/num_row_groups:100       9106303 ns      9106322 ns           77 file_size=358.835k items_per_second=109.814/s
WriteMetadata/num_columns:10/num_row_groups:1000     95039480 ns     95039886 ns            7 file_size=3.614M items_per_second=10.5219/s
WriteMetadata/num_columns:100/num_row_groups:1        1066471 ns      1066474 ns          648 file_size=37.352k items_per_second=937.67/s
WriteMetadata/num_columns:100/num_row_groups:100     92350381 ns     92350900 ns            8 file_size=3.61693M items_per_second=10.8283/s
WriteMetadata/num_columns:100/num_row_groups:1000   972198408 ns    971689600 ns            1 file_size=36.2887M items_per_second=1.02914/s
WriteMetadata/num_columns:1000/num_row_groups:1      10303438 ns     10302799 ns           68 file_size=376.655k items_per_second=97.061/s
WriteMetadata/num_columns:1000/num_row_groups:100   926151272 ns    926026200 ns            1 file_size=36.4815M items_per_second=1.07988/s
WriteMetadata/num_columns:10000/num_row_groups:1    109520337 ns    109283500 ns            6 file_size=3.82213M items_per_second=9.15051/s
WriteMetadata/num_columns:10000/num_row_groups:100 9607536338 ns   9603598900 ns            1 file_size=369.089M items_per_second=0.104128/s
ReadMetadata/num_columns:1/num_row_groups:1              3776 ns         3737 ns       190309 file_size=459 items_per_second=267.588k/s
ReadMetadata/num_columns:1/num_row_groups:100           76296 ns        76114 ns         9217 file_size=37.383k items_per_second=13.1382k/s
ReadMetadata/num_columns:1/num_row_groups:1000         706469 ns       706463 ns          993 file_size=374.885k items_per_second=1.4155k/s
ReadMetadata/num_columns:10/num_row_groups:1            18738 ns        18738 ns        35672 file_size=3.762k items_per_second=53.3679k/s
ReadMetadata/num_columns:10/num_row_groups:100         590179 ns       590180 ns         1202 file_size=358.835k items_per_second=1.6944k/s
ReadMetadata/num_columns:10/num_row_groups:1000       5821858 ns      5821727 ns          123 file_size=3.614M items_per_second=171.77/s
ReadMetadata/num_columns:100/num_row_groups:1          168284 ns       168284 ns         4074 file_size=37.352k items_per_second=5.94234k/s
ReadMetadata/num_columns:100/num_row_groups:100       5752814 ns      5752800 ns          118 file_size=3.61693M items_per_second=173.828/s
ReadMetadata/num_columns:100/num_row_groups:1000     65674677 ns     65672427 ns           11 file_size=36.2887M items_per_second=15.2271/s
ReadMetadata/num_columns:1000/num_row_groups:1        1574680 ns      1574646 ns          444 file_size=376.655k items_per_second=635.063/s
ReadMetadata/num_columns:1000/num_row_groups:100     65989678 ns     65988873 ns           11 file_size=36.4815M items_per_second=15.1541/s
ReadMetadata/num_columns:10000/num_row_groups:1      16967274 ns     16966876 ns           41 file_size=3.82213M items_per_second=58.9384/s
ReadMetadata/num_columns:10000/num_row_groups:100   652885946 ns    652766800 ns            1 file_size=369.089M items_per_second=1.53194/s

@pitrou pitrou force-pushed the templatize-cpp-parquet-deserialize-footer branch from e54e382 to fde772c Compare May 22, 2024 16:15
@pitrou
Member

pitrou commented May 22, 2024

@github-actions crossbow submit -g cpp -g wheel


Revision: fde772c

Submitted crossbow builds: ursacomputing/crossbow @ actions-bacf49dea9

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

Member

@pitrou pitrou left a comment

+1, thank you @mapleFU

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label May 22, 2024
@pitrou pitrou merged commit 9ba9253 into apache:main May 22, 2024
35 of 36 checks passed
@pitrou pitrou removed the "awaiting committer review" label May 22, 2024
@mapleFU mapleFU deleted the templatize-cpp-parquet-deserialize-footer branch May 22, 2024 17:36

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9ba9253.

There were 5 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…celerate reading thrift (apache#41703)

* GitHub Issue: apache#41702

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>