
Occupancy improvement for Hash table build #15700

Open: wants to merge 17 commits into base branch-24.08


@tgujar (Contributor) commented May 8, 2024

Description

Implements specialized template dispatch for hash joins and mixed semi joins to fix the issue described in #15502.

At a high level, this PR typedefs some types to void, depending on the column types present in the rows, to avoid the high register usage that comparator and hasher operations incur for more involved types (lists, structs, strings, ...). This is done by dynamic dispatch on the CPU side using std::variant + std::visit, dispatching to a specialized template.
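The void-typedef idea can be illustrated with a small host-only sketch. Everything here is hypothetical: `type_id`, `full_map`, `primitive_map`, `dispatch`, and `path_fn` are simplified stand-ins, not the cudf type dispatcher. The point it shows is that when the restricted map sends complex ids to `void`, the functor's complex-type branch is never instantiated for that map, so its code (and, on the GPU, its register cost) does not end up in that specialization.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for cudf::type_id and the id-to-type map (not the cudf API).
enum class type_id { INT32, INT64, STRING, LIST };
struct string_stub {};  // stands in for a "complex" element type
struct list_stub {};

// Full map: every id resolves to a concrete element type.
template <type_id Id> struct full_map;
template <> struct full_map<type_id::INT32> { using type = std::int32_t; };
template <> struct full_map<type_id::INT64> { using type = std::int64_t; };
template <> struct full_map<type_id::STRING> { using type = string_stub; };
template <> struct full_map<type_id::LIST> { using type = list_stub; };

// Restricted map: complex ids are typedef-ed to void. Only valid when the
// caller has already verified that no such columns are present.
template <type_id Id> struct primitive_map { using type = typename full_map<Id>::type; };
template <> struct primitive_map<type_id::STRING> { using type = void; };
template <> struct primitive_map<type_id::LIST> { using type = void; };

// Runtime id -> compile-time type, parameterized on which map to use.
template <template <type_id> class Map, typename F>
auto dispatch(type_id id, F f) {
  switch (id) {
    case type_id::INT32: return f.template operator()<typename Map<type_id::INT32>::type>();
    case type_id::INT64: return f.template operator()<typename Map<type_id::INT64>::type>();
    case type_id::STRING: return f.template operator()<typename Map<type_id::STRING>::type>();
    default: return f.template operator()<typename Map<type_id::LIST>::type>();
  }
}

// Element-operation stand-in: the complex branch is only instantiated when
// the map yields a non-void complex type.
struct path_fn {
  template <typename T>
  std::string operator()() const {
    if constexpr (std::is_void_v<T>) {
      return "unreachable";       // placeholder path for types known to be absent
    } else if constexpr (std::is_arithmetic_v<T>) {
      return "fixed-width path";  // cheap comparison/hash
    } else {
      return "complex path";      // expensive code would live here
    }
  }
};
```

For example, `dispatch<primitive_map>(type_id::STRING, path_fn{})` takes the placeholder branch, while `dispatch<full_map>` with the same id takes the complex one.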

This pattern can later be extended to other joins and to groupby operations. Any operator using the row hasher and row comparator should be able to see an improvement in occupancy for hash table build/probe operations.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented May 8, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 8, 2024
@tgujar (Contributor Author) commented May 8, 2024

I think the approach of specializing the type dispatcher is very cumbersome and will lead to a lot of code duplication. Currently, I have the conditional dispatch working for device_row_hasher, but I am unsure whether there is a better way to implement this. We could introduce a macro here to generate the code. What do you think?

@PointKernel PointKernel added non-breaking Non-breaking change 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function Performance Performance related issue labels May 8, 2024
@PointKernel (Member) commented:

/ok to test


@PointKernel (Member) commented May 14, 2024

@tgujar I've updated the docs to unblock CI. Have you noticed any performance regressions in other use cases? It seems to improve performance for mixed joins, but performance drops significantly in other cases that use the row hasher.

Comment on lines 48 to 50
```cpp
id_to_type<type_id::DECIMAL128>,
id_to_type<type_id::DECIMAL64>,
id_to_type<type_id::DECIMAL32>,
```
Contributor:

I don't think decimal types are complex types. They are just a wrapper around an integer type.

Contributor Author:

The equality operator for decimal types performs scaling, which uses exponentiation:

```cpp
CUDF_HOST_DEVICE inline bool operator==(fixed_point<Rep1, Rad1> const& lhs,
```

I see a reduction in register usage if I comment out the decimal types in #15502. I think we can decide later which types to exclude in the branches.
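Why decimal equality is heavier than a plain integer compare can be sketched with a simplified, hypothetical type. `decimal64_stub` and `ipow10` below are illustrations, not `cudf::fixed_point`; the sketch assumes value = rep * 10^scale and ignores overflow. When the scales differ, one operand has to be rescaled by a power of ten before the representations can be compared, and that exponentiation is the extra work the reviewer comment refers to.

```cpp
#include <cstdint>

// Hypothetical simplified decimal: value = rep * 10^scale (not cudf::fixed_point).
struct decimal64_stub {
  std::int64_t rep;
  int scale;
};

// Integer power of ten; this loop is the "exponentiation" that a plain
// integer comparison does not need.
constexpr std::int64_t ipow10(int e) {
  std::int64_t result = 1;
  for (int i = 0; i < e; ++i) result *= 10;
  return result;
}

// Equality across scales: rescale the operand with the larger scale down to
// the smaller scale, then compare representations. Overflow is ignored here.
constexpr bool operator==(decimal64_stub const& lhs, decimal64_stub const& rhs) {
  if (lhs.scale == rhs.scale) return lhs.rep == rhs.rep;
  if (lhs.scale > rhs.scale) return lhs.rep * ipow10(lhs.scale - rhs.scale) == rhs.rep;
  return lhs.rep == rhs.rep * ipow10(rhs.scale - lhs.scale);
}
```

For instance, `{15, -1}` (1.5) and `{150, -2}` (1.50) compare equal only after the first operand is multiplied by 10.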

@PointKernel (Member) commented:

/ok to test

@PointKernel (Member) commented:

@tgujar Could you take a look at the failing tests?

@PointKernel (Member) commented:

/ok to test


```diff
- * @throw cudf::logic_error if the input tables were preprocessed to transform any nested children
- * columns into integer columns but `PhysicalElementComparator` is not
+ * @throw cudf::logic_error if the input tables were preprocessed to transform any nested
+ * children columns into integer columns but `PhysicalElementComparator` is not
```
Contributor:

It appears that a significant number of changes to this file are due to reformatting comments.
Would it be possible to undo those changes? This particular change is certainly not desirable.

Contributor Author:

Makes sense, will fix. This was caused by the clang-format extension in VS Code.

@davidwendt (Contributor) commented:

This PR needs to be rebased on branch-24.08.

@tgujar (Contributor Author) commented May 30, 2024

Specializing both the comparator and the hasher drops register usage to 54, instead of the expected 46, for the mixed semi join case. I'm investigating why the register pressure differs from simply commenting out the code paths.
The current plan is to avoid using a macro (as mentioned here) and instead do dynamic dispatch on the CPU side using std::variant and std::visit.
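The host-side std::variant + std::visit plan can be sketched as follows. The tag types, `choose_config`, `launch_build_stub`, and `build_hash_table_stub` are hypothetical names standing in for the specialized hasher/comparator configurations and the kernel launch; this is not the code in the PR. The runtime inspection of column types happens once on the CPU, and std::visit then calls the template instantiation for whichever alternative the variant holds, so each configuration is a separate compile-time specialization.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-ins; not the cudf API.
enum class type_id { INT32, INT64, STRING, LIST };

// Tags standing in for the template parameter that selects a type map.
struct primitive_only_tag {};  // complex types typedef-ed to void
struct all_types_tag {};       // full dispatch, including complex types

using dispatch_variant = std::variant<primitive_only_tag, all_types_tag>;

// Stand-in for a kernel launch templated on the selected configuration.
template <typename Config>
std::string launch_build_stub(Config) {
  if constexpr (std::is_same_v<Config, primitive_only_tag>) {
    return "specialized build";
  } else {
    return "generic build";
  }
}

// Runtime decision on the host: fall back to full dispatch if any column
// has a complex type.
dispatch_variant choose_config(std::vector<type_id> const& column_types) {
  for (auto id : column_types) {
    if (id == type_id::STRING || id == type_id::LIST) { return all_types_tag{}; }
  }
  return primitive_only_tag{};
}

std::string build_hash_table_stub(std::vector<type_id> const& column_types) {
  // std::visit invokes the generic lambda with the active alternative, which
  // selects the matching launch_build_stub instantiation.
  return std::visit([](auto config) { return launch_build_stub(config); },
                    choose_config(column_types));
}
```

A design note: compared with a macro, the variant keeps the set of specializations in one type list, and adding a configuration means adding one alternative rather than another generated code block.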

@github-actions github-actions bot added Python Affects Python cuDF API. CMake CMake build issue conda Java Affects Java cuDF API. labels Jun 3, 2024
@tgujar tgujar changed the base branch from branch-24.06 to branch-24.08 June 3, 2024 16:14
```cpp
__device__ bool operator()(size_type const lhs_element_index,
                           size_type const rhs_element_index) const noexcept
{
  // Placeholder for types mapped to void; intended to be unreachable
  return false;
}
```
Contributor Author:

I want to express here that this path is unreachable, since types that are typedef-ed to void in the id-to-type map are known not to be present in the row. However, for reasons I don't yet understand, if I use CUDF_UNREACHABLE(...) instead, register usage increases to 54 rather than the expected 46 in the mixed_semi_join case with integer keys and values.

Member:

A naive question: have you quantified the performance impact by reducing the register usage from 54 to 46?

Contributor Author:

I haven't done this yet. The impact would differ across GPUs and across joins. For example, on a T4, based on the occupancy calculator, I don't expect any change in performance.
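The occupancy argument can be checked with a simplified register-limited calculation. This is a sketch under stated assumptions, not the CUDA occupancy calculator: it assumes a warp size of 32 and a per-warp register allocation unit of 256, and it ignores block size, shared memory, and per-SM block limits. The device parameters used in the usage note (64K registers per SM; at most 32 resident warps per SM on a T4 and 64 on an A100) are assumed values that should be confirmed against the CUDA documentation. `register_limited_warps` is a hypothetical helper.

```cpp
// Simplified upper bound on resident warps per SM imposed by register usage.
// Assumptions: warp size 32, registers allocated per warp in units of 256;
// block size, shared memory and per-SM block limits are ignored.
constexpr int register_limited_warps(int regs_per_thread, int regs_per_sm,
                                     int max_warps_per_sm) {
  constexpr int warp_size  = 32;
  constexpr int alloc_unit = 256;
  // Round the per-warp register demand up to the allocation unit.
  int regs_per_warp =
    (regs_per_thread * warp_size + alloc_unit - 1) / alloc_unit * alloc_unit;
  int warps = regs_per_sm / regs_per_warp;
  // The hardware cap on resident warps may bind before registers do.
  return warps < max_warps_per_sm ? warps : max_warps_per_sm;
}
```

Under this model, 54 registers per thread rounds to 1792 registers per warp (36 warps) and 46 rounds to 1536 (42 warps). With an assumed 32-warp cap, as on a T4, both hit the cap, which is consistent with expecting no change there; with an assumed 64-warp cap, as on an A100, the register limit moves from 36 to 42 warps.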

Member:

Got it. Thanks for the prompt reply. We can proceed with the current approach and revisit this when wrapping up the PR. If it reduces register usage but has no performance impact, I'm inclined to use CUDF_UNREACHABLE for better readability.

@vyasr vyasr removed ci labels Jun 3, 2024
@github-actions github-actions bot removed Python Affects Python cuDF API. CMake CMake build issue Java Affects Java cuDF API. labels Jun 4, 2024
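```cpp
  // ... (preceding lines, including the loop header over the build columns, not shown)
  build_column_types.insert(col.type().id());
}
for (auto col : probe) {
  probe_column_types.insert(col.type().id());
  // ... (remaining lines not shown)
```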
Contributor Author:

Unsure if separation between probe and build column types is required? If I understand correctly, left_equality and right_equality should have the same column types. This function previously did not have a check using CUDF_EXPECTS(cudf::have_same_types ...).

Member:

> Unsure if separation between probe and build column types is required?

Probably not needed. If two tables don't match, build time failures will occur when constructing the two-table comparator.
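If build and probe must have matching column types, one set of type ids is enough to drive the dispatch decision. The sketch below models a table as a vector of type ids; `distinct_types` and `join_column_types` are hypothetical helpers, not cudf functions, and the up-front check only stands in for something like `CUDF_EXPECTS(cudf::have_same_types ...)`.

```cpp
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for cudf::type_id.
enum class type_id { INT32, INT64, STRING };

// Distinct type ids of a table, modeled here as a vector of column type ids.
std::set<type_id> distinct_types(std::vector<type_id> const& table) {
  return std::set<type_id>(table.begin(), table.end());
}

// With matching column types, a single set serves both build and probe.
// The mismatch check here reports the error up front.
std::set<type_id> join_column_types(std::vector<type_id> const& build,
                                    std::vector<type_id> const& probe) {
  if (build != probe) { throw std::invalid_argument("build and probe column types differ"); }
  return distinct_types(build);
}
```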

@tgujar (Contributor Author) commented Jun 6, 2024

I have a question here. Is it preferable that I make the changes to all the join operations in this PR or break them up into different ones?

@PointKernel (Member) commented:

> I have a question here. Is it preferable that I make the changes to all the join operations in this PR or break them up into different ones?

We could just focus on mixed join for this PR. The goal is mainly to evaluate the performance impact and design of the new dispatching method.

@tgujar tgujar marked this pull request as ready for review June 6, 2024 14:56
@tgujar tgujar requested a review from a team as a code owner June 6, 2024 14:56
@tgujar tgujar requested review from mhaseeb123 and vuule June 6, 2024 14:56
@tgujar (Contributor Author) commented Jun 6, 2024

Benchmark results. This PR adds specialized dispatch for both build and probe in hash joins, and only for build in mixed semi/anti joins. Other joins are not modified.

# inner_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 112.437 us |       3.93% | 107.061 us |       4.98% |    -5.376 us |  -4.78% |   FAIL   |
|  I32  |     0      |   100000    |     1000     | 135.776 us |       2.02% | 128.506 us |       1.90% |    -7.270 us |  -5.35% |   FAIL   |
|  I32  |     0      |  10000000   |     1000     |   3.058 ms |       0.46% |   2.167 ms |       0.45% |  -890.462 us | -29.12% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 156.405 us |       2.20% | 145.498 us |       1.21% |   -10.907 us |  -6.97% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   3.631 ms |       0.12% |   2.531 ms |       0.16% | -1100.242 us | -30.30% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.655 ms |       0.06% |   5.073 ms |       0.09% | -1581.481 us | -23.77% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 122.827 us |       1.43% | 124.232 us |       1.44% |     1.405 us |   1.14% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 139.361 us |       1.17% | 137.219 us |       3.06% |    -2.142 us |  -1.54% |   FAIL   |
|  I32  |     1      |  10000000   |     1000     |   1.977 ms |       0.21% |   1.354 ms |       0.34% |  -622.759 us | -31.51% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 144.414 us |       1.21% | 143.193 us |       2.52% |    -1.221 us |  -0.85% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   2.163 ms |       0.13% |   1.473 ms |       0.43% |  -690.769 us | -31.93% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.260 ms |       0.12% |   2.253 ms |       0.24% | -1006.706 us | -30.88% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 114.109 us |       3.34% | 105.741 us |       3.71% |    -8.368 us |  -7.33% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 136.939 us |       2.24% | 131.708 us |       1.55% |    -5.230 us |  -3.82% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   3.146 ms |       0.56% |   2.216 ms |       0.45% |  -929.616 us | -29.55% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 156.054 us |       1.20% | 146.700 us |       2.20% |    -9.354 us |  -5.99% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   3.715 ms |       0.14% |   2.581 ms |       0.17% | -1134.293 us | -30.53% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   6.750 ms |       0.07% |   5.131 ms |       0.08% | -1618.389 us | -23.98% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 123.094 us |       1.38% | 124.900 us |       1.39% |     1.805 us |   1.47% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 141.180 us |       1.30% | 137.238 us |       3.05% |    -3.942 us |  -2.79% |   FAIL   |
|  I64  |     1      |  10000000   |     1000     |   2.019 ms |       0.09% |   1.397 ms |       0.28% |  -622.010 us | -30.81% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 143.681 us |       1.33% | 144.351 us |       1.42% |     0.671 us |   0.47% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   2.219 ms |       0.10% |   1.516 ms |       0.27% |  -703.369 us | -31.69% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   3.333 ms |       0.14% |   2.307 ms |       0.24% | -1025.560 us | -30.77% |   FAIL   |

# left_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 111.987 us |       1.73% | 110.068 us |       3.30% |    -1.919 us |  -1.71% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 138.868 us |       3.10% | 128.825 us |       1.13% |   -10.043 us |  -7.23% |   FAIL   |
|  I32  |     0      |  10000000   |     1000     |   3.241 ms |       0.53% |   2.276 ms |       0.50% |  -965.407 us | -29.79% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 157.198 us |       1.08% | 145.382 us |       1.09% |   -11.816 us |  -7.52% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   3.808 ms |       0.15% |   2.633 ms |       0.17% | -1174.401 us | -30.84% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.859 ms |       0.06% |   5.204 ms |       0.08% | -1655.029 us | -24.13% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 122.560 us |       1.50% | 124.198 us |       1.25% |     1.638 us |   1.34% |   FAIL   |
|  I32  |     1      |   100000    |     1000     | 139.765 us |       1.22% | 139.785 us |       2.12% |     0.020 us |   0.01% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.145 ms |       0.14% |   1.480 ms |       0.16% |  -664.832 us | -31.00% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 144.442 us |       1.29% | 144.435 us |       1.57% |    -0.007 us |  -0.00% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   2.320 ms |       0.19% |   1.596 ms |       0.32% |  -723.805 us | -31.20% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.452 ms |       0.12% |   2.403 ms |       0.18% | -1048.513 us | -30.37% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 112.913 us |       1.46% | 108.610 us |       3.88% |    -4.303 us |  -3.81% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 142.333 us |       2.77% | 131.782 us |       1.10% |   -10.551 us |  -7.41% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   3.339 ms |       0.49% |   2.324 ms |       0.55% | -1014.754 us | -30.39% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 156.852 us |       0.97% | 148.785 us |       2.70% |    -8.066 us |  -5.14% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   3.903 ms |       0.30% |   2.692 ms |       0.11% | -1211.272 us | -31.04% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   6.956 ms |       0.06% |   5.262 ms |       0.09% | -1694.028 us | -24.35% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 122.817 us |       1.29% | 124.738 us |       1.30% |     1.921 us |   1.56% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 141.988 us |       1.36% | 141.088 us |       2.96% |    -0.900 us |  -0.63% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.192 ms |       0.20% |   1.527 ms |       0.21% |  -665.648 us | -30.36% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 146.557 us |       2.33% | 144.878 us |       1.12% |    -1.679 us |  -1.15% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    |   2.383 ms |       0.15% |   1.640 ms |       0.16% |  -743.069 us | -31.19% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   3.524 ms |       0.13% |   2.461 ms |       0.20% | -1063.082 us | -30.17% |   FAIL   |

# full_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 202.495 us |       1.68% | 202.082 us |       2.38% |    -0.413 us |  -0.20% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 190.106 us |       3.26% | 183.246 us |       1.01% |    -6.861 us |  -3.61% |   FAIL   |
|  I32  |     0      |  10000000   |     1000     |   3.942 ms |       0.43% |   2.975 ms |       0.38% |  -967.704 us | -24.55% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 254.050 us |       0.86% | 243.268 us |       0.77% |   -10.781 us |  -4.24% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   3.963 ms |       0.15% |   2.784 ms |       0.18% | -1179.510 us | -29.76% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   7.499 ms |       0.07% |   5.839 ms |       0.08% | -1659.243 us | -22.13% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 211.467 us |       1.07% | 215.023 us |       0.97% |     3.556 us |   1.68% |   FAIL   |
|  I32  |     1      |   100000    |     1000     | 230.887 us |       1.04% | 231.303 us |       1.21% |     0.416 us |   0.18% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.440 ms |       0.14% |   1.741 ms |       0.16% |  -698.441 us | -28.63% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 244.139 us |       1.79% | 241.811 us |       1.26% |    -2.328 us |  -0.95% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   2.564 ms |       0.19% |   1.836 ms |       0.33% |  -728.032 us | -28.40% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.909 ms |       0.11% |   2.859 ms |       0.17% | -1050.267 us | -26.87% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 203.301 us |       1.10% | 199.310 us |       2.21% |    -3.991 us |  -1.96% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 198.917 us |       2.13% | 187.892 us |       0.97% |   -11.025 us |  -5.54% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   3.866 ms |       0.38% |   2.860 ms |       0.44% | -1006.472 us | -26.03% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 254.073 us |       0.94% | 247.261 us |       1.67% |    -6.811 us |  -2.68% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   4.039 ms |       0.13% |   2.833 ms |       0.18% | -1205.339 us | -29.85% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   7.598 ms |       0.13% |   5.899 ms |       0.08% | -1699.405 us | -22.37% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 212.579 us |       1.12% | 215.639 us |       1.05% |     3.059 us |   1.44% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 233.085 us |       1.01% | 233.765 us |       1.77% |     0.680 us |   0.29% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.453 ms |       0.18% |   1.787 ms |       0.21% |  -665.259 us | -27.12% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 241.901 us |       1.29% | 241.377 us |       0.89% |    -0.524 us |  -0.22% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   2.622 ms |       0.15% |   1.878 ms |       0.15% |  -743.917 us | -28.37% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   3.981 ms |       0.13% |   2.919 ms |       0.19% | -1061.823 us | -26.67% |   FAIL   |

# mixed_inner_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 182.485 us |       1.83% | 180.562 us |       3.29% |  -1.923 us |  -1.05% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 209.398 us |       1.35% | 209.591 us |       1.11% |   0.194 us |   0.09% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   4.271 ms |       0.41% |   4.269 ms |       0.35% |  -2.265 us |  -0.05% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 240.362 us |       2.06% | 237.976 us |       1.21% |  -2.386 us |  -0.99% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   5.234 ms |       0.10% |   5.242 ms |       0.12% |   7.659 us |   0.15% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   9.089 ms |       0.04% |   9.072 ms |       0.06% | -17.573 us |  -0.19% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 183.076 us |       2.18% | 188.788 us |       2.69% |   5.712 us |   3.12% |   FAIL   |
|  I32  |     1      |   100000    |     1000     | 209.663 us |       1.11% | 212.000 us |       0.94% |   2.337 us |   1.11% |   FAIL   |
|  I32  |     1      |  10000000   |     1000     |   2.745 ms |       0.14% |   2.731 ms |       0.14% | -13.553 us |  -0.49% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 214.241 us |       1.01% | 217.728 us |       0.99% |   3.487 us |   1.63% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   3.127 ms |       0.11% |   3.123 ms |       0.14% |  -3.526 us |  -0.11% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.287 ms |       0.10% |   4.261 ms |       0.10% | -25.713 us |  -0.60% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 189.755 us |       2.19% | 188.822 us |       2.44% |  -0.933 us |  -0.49% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 228.300 us |       1.95% | 227.726 us |       1.19% |  -0.574 us |  -0.25% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   4.731 ms |       0.38% |   4.711 ms |       0.40% | -20.407 us |  -0.43% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 247.229 us |       0.96% | 247.642 us |       0.92% |   0.412 us |   0.17% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   5.502 ms |       0.10% |   5.508 ms |       0.12% |   5.581 us |   0.10% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   9.276 ms |       0.06% |   9.253 ms |       0.06% | -22.964 us |  -0.25% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 198.411 us |       1.46% | 191.820 us |       4.01% |  -6.591 us |  -3.32% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 213.582 us |       1.40% | 214.831 us |       0.99% |   1.249 us |   0.58% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.819 ms |       0.13% |   2.816 ms |       0.17% |  -3.475 us |  -0.12% |   PASS   |
|  I64  |     1      |   100000    |    100000    | 217.729 us |       1.54% | 218.700 us |       1.03% |   0.971 us |   0.45% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   3.250 ms |       0.10% |   3.264 ms |       0.09% |  13.259 us |   0.41% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.381 ms |       0.09% |   4.350 ms |       0.10% | -31.040 us |  -0.71% |   FAIL   |

# mixed_left_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 181.258 us |       1.46% | 179.720 us |       1.06% |  -1.538 us |  -0.85% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 211.240 us |       2.40% | 212.913 us |       2.27% |   1.673 us |   0.79% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   4.429 ms |       0.37% |   4.430 ms |       0.42% |   0.906 us |   0.02% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 242.579 us |       1.82% | 239.933 us |       1.89% |  -2.646 us |  -1.09% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   5.400 ms |       0.10% |   5.408 ms |       0.10% |   8.566 us |   0.16% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   9.276 ms |       0.04% |   9.257 ms |       0.06% | -18.119 us |  -0.20% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 185.553 us |       2.45% | 185.302 us |       1.57% |  -0.251 us |  -0.14% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 210.679 us |       1.00% | 212.486 us |       0.88% |   1.807 us |   0.86% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.832 ms |       0.11% |   2.819 ms |       0.11% | -13.265 us |  -0.47% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 215.185 us |       0.94% | 217.684 us |       0.96% |   2.499 us |   1.16% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   3.209 ms |       0.12% |   3.205 ms |       0.10% |  -4.749 us |  -0.15% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.376 ms |       0.09% |   4.351 ms |       0.10% | -25.938 us |  -0.59% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 187.852 us |       2.39% | 186.474 us |       2.21% |  -1.378 us |  -0.73% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 231.821 us |       1.94% | 230.232 us |       1.98% |  -1.589 us |  -0.69% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   4.823 ms |       0.39% |   4.800 ms |       0.37% | -22.515 us |  -0.47% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 246.690 us |       0.91% | 247.402 us |       0.89% |   0.713 us |   0.29% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   5.615 ms |       0.08% |   5.621 ms |       0.09% |   5.873 us |   0.10% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   9.422 ms |       0.04% |   9.400 ms |       0.05% | -21.220 us |  -0.23% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 196.076 us |       2.18% | 191.561 us |       3.88% |  -4.515 us |  -2.30% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 214.409 us |       1.26% | 216.247 us |       0.99% |   1.838 us |   0.86% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.909 ms |       0.11% |   2.902 ms |       0.14% |  -6.777 us |  -0.23% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 218.817 us |       1.45% | 220.194 us |       1.08% |   1.376 us |   0.63% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   3.343 ms |       0.15% |   3.357 ms |       0.12% |  14.374 us |   0.43% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.473 ms |       0.10% |   4.444 ms |       0.08% | -28.838 us |  -0.64% |   FAIL   |

# mixed_full_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 277.374 us |       3.50% | 276.258 us |       0.86% |  -1.116 us |  -0.40% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 271.192 us |       1.98% | 271.640 us |       1.73% |   0.448 us |   0.17% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   4.956 ms |       0.39% |   5.098 ms |       0.30% | 141.729 us |   2.86% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 344.457 us |       1.47% | 343.632 us |       1.46% |  -0.825 us |  -0.24% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   5.563 ms |       0.09% |   5.573 ms |       0.10% |   9.202 us |   0.17% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   9.921 ms |       0.05% |   9.900 ms |       0.05% | -20.580 us |  -0.21% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 282.076 us |       2.04% | 281.979 us |       1.52% |  -0.097 us |  -0.03% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 307.431 us |       0.83% | 309.947 us |       0.88% |   2.516 us |   0.82% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   3.108 ms |       0.12% |   3.096 ms |       0.10% | -11.565 us |  -0.37% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 318.134 us |       1.08% | 320.594 us |       0.86% |   2.459 us |   0.77% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   3.456 ms |       0.12% |   3.450 ms |       0.09% |  -6.271 us |  -0.18% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.839 ms |       0.10% |   4.815 ms |       0.09% | -24.349 us |  -0.50% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 285.060 us |       1.55% | 280.950 us |       1.34% |  -4.110 us |  -1.44% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 295.759 us |       1.58% | 292.838 us |       1.62% |  -2.921 us |  -0.99% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   5.353 ms |       0.31% |   5.335 ms |       0.37% | -17.438 us |  -0.33% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 349.404 us |       0.72% | 350.908 us |       0.73% |   1.505 us |   0.43% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   5.772 ms |       0.09% |   5.779 ms |       0.10% |   7.690 us |   0.13% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |  10.063 ms |       0.05% |  10.047 ms |       0.06% | -16.077 us |  -0.16% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 290.786 us |       1.61% | 288.080 us |       2.59% |  -2.705 us |  -0.93% |   PASS   |
|  I64  |     1      |   100000    |     1000     | 311.609 us |       1.04% | 313.071 us |       0.92% |   1.462 us |   0.47% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   3.207 ms |       0.10% |   3.202 ms |       0.16% |  -4.379 us |  -0.14% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 321.441 us |       1.28% | 322.809 us |       0.95% |   1.369 us |   0.43% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   3.587 ms |       0.11% |   3.601 ms |       0.12% |  14.183 us |   0.40% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.935 ms |       0.08% |   4.906 ms |       0.09% | -28.230 us |  -0.57% |   FAIL   |

# mixed_left_semi_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 163.941 us |       1.35% | 164.092 us |       1.08% |     0.151 us |   0.09% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 186.297 us |       1.72% | 187.596 us |       1.02% |     1.299 us |   0.70% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   1.890 ms |       0.21% |   1.888 ms |       0.12% |    -2.544 us |  -0.13% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 216.653 us |       0.97% | 209.246 us |       1.01% |    -7.407 us |  -3.42% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   2.206 ms |       0.12% |   2.187 ms |       0.12% |   -19.241 us |  -0.87% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.849 ms |       0.05% |   5.866 ms |       0.07% |  -983.054 us | -14.35% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 178.305 us |       1.36% | 181.066 us |       1.69% |     2.761 us |   1.55% |   FAIL   |
|  I32  |     1      |   100000    |     1000     | 196.463 us |       2.28% | 199.260 us |       2.01% |     2.796 us |   1.42% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   1.469 ms |       0.29% |   1.456 ms |       0.32% |   -13.259 us |  -0.90% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 223.007 us |       1.11% | 217.832 us |       1.04% |    -5.175 us |  -2.32% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   1.518 ms |       0.18% |   1.500 ms |       0.18% |   -17.546 us |  -1.16% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.611 ms |       0.08% |   3.620 ms |       0.09% |  -991.670 us | -21.50% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 167.839 us |       1.14% | 167.587 us |       1.04% |    -0.252 us |  -0.15% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 190.487 us |       2.07% | 189.719 us |       0.98% |    -0.768 us |  -0.40% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   2.076 ms |       0.21% |   2.055 ms |       0.11% |   -21.682 us |  -1.04% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 224.296 us |       1.85% | 212.986 us |       0.97% |   -11.311 us |  -5.04% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   2.353 ms |       0.16% |   2.329 ms |       0.17% |   -23.743 us |  -1.01% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   7.075 ms |       0.06% |   6.109 ms |       0.06% |  -965.622 us | -13.65% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 186.398 us |       1.55% | 181.063 us |       2.09% |    -5.335 us |  -2.86% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 202.998 us |       1.05% | 198.829 us |       1.95% |    -4.170 us |  -2.05% |   FAIL   |
|  I64  |     1      |  10000000   |     1000     |   1.415 ms |       0.18% |   1.407 ms |       0.29% |    -8.203 us |  -0.58% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 223.472 us |       1.04% | 217.483 us |       0.95% |    -5.989 us |  -2.68% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    |   1.554 ms |       0.12% |   1.549 ms |       0.13% |    -4.402 us |  -0.28% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.705 ms |       0.08% |   3.695 ms |       0.09% | -1010.020 us | -21.47% |   FAIL   |

# mixed_left_anti_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 163.817 us |       1.10% | 164.318 us |       1.00% |     0.501 us |   0.31% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 186.833 us |       1.14% | 187.803 us |       1.32% |     0.970 us |   0.52% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   1.899 ms |       0.14% |   1.895 ms |       0.12% |    -4.678 us |  -0.25% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 216.741 us |       0.94% | 209.305 us |       0.99% |    -7.436 us |  -3.43% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   2.214 ms |       0.13% |   2.194 ms |       0.12% |   -19.464 us |  -0.88% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.857 ms |       0.07% |   5.872 ms |       0.06% |  -984.544 us | -14.36% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 178.700 us |       1.35% | 181.034 us |       3.25% |     2.334 us |   1.31% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 197.369 us |       2.02% | 198.717 us |       1.93% |     1.348 us |   0.68% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   1.480 ms |       0.33% |   1.465 ms |       0.33% |   -15.310 us |  -1.03% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 223.360 us |       1.14% | 217.979 us |       1.48% |    -5.382 us |  -2.41% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   1.526 ms |       0.21% |   1.509 ms |       0.17% |   -16.891 us |  -1.11% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.621 ms |       0.09% |   3.628 ms |       0.09% |  -993.276 us | -21.49% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 167.984 us |       1.27% | 167.400 us |       1.20% |    -0.585 us |  -0.35% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 191.022 us |       2.15% | 190.883 us |       1.14% |    -0.138 us |  -0.07% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   2.083 ms |       0.22% |   2.063 ms |       0.11% |   -20.791 us |  -1.00% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 224.755 us |       1.82% | 212.448 us |       1.05% |   -12.307 us |  -5.48% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   2.360 ms |       0.18% |   2.335 ms |       0.12% |   -24.477 us |  -1.04% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   7.081 ms |       0.05% |   6.118 ms |       0.11% |  -962.945 us | -13.60% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 186.437 us |       1.73% | 181.559 us |       1.92% |    -4.878 us |  -2.62% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 203.248 us |       1.12% | 199.537 us |       2.02% |    -3.711 us |  -1.83% |   FAIL   |
|  I64  |     1      |  10000000   |     1000     |   1.423 ms |       0.19% |   1.417 ms |       0.31% |    -6.537 us |  -0.46% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 223.797 us |       1.10% | 217.751 us |       0.93% |    -6.046 us |  -2.70% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    |   1.562 ms |       0.17% |   1.559 ms |       0.15% |    -3.259 us |  -0.21% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.714 ms |       0.08% |   3.704 ms |       0.11% | -1009.649 us | -21.42% |   FAIL   |

# distinct_inner_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-----------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  76.097 us |       2.56% |  75.916 us |       1.83% | -0.181 us |  -0.24% |   PASS   |
|  I32  |     0      |   100000    |     1000     |  83.753 us |       2.79% |  84.174 us |       1.28% |  0.421 us |   0.50% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   1.066 ms |       0.29% |   1.065 ms |       0.13% | -1.073 us |  -0.10% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 100.645 us |       1.82% |  99.474 us |       2.91% | -1.171 us |  -1.16% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   1.035 ms |       0.18% |   1.033 ms |       0.27% | -1.659 us |  -0.16% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |   3.761 ms |       0.09% |   3.763 ms |       0.09% |  2.090 us |   0.06% |   PASS   |
|  I32  |     1      |    1000     |     1000     |  85.160 us |       2.51% |  86.673 us |       3.48% |  1.513 us |   1.78% |   PASS   |
|  I32  |     1      |   100000    |     1000     |  92.722 us |       1.71% |  93.227 us |       1.50% |  0.505 us |   0.54% |   PASS   |
|  I32  |     1      |  10000000   |     1000     | 530.329 us |       0.28% | 537.004 us |       0.30% |  6.675 us |   1.26% |   FAIL   |
|  I32  |     1      |   100000    |    100000    |  92.981 us |       1.59% |  93.298 us |       1.49% |  0.317 us |   0.34% |   PASS   |
|  I32  |     1      |  10000000   |    100000    | 587.844 us |       0.30% | 589.881 us |       0.25% |  2.037 us |   0.35% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   1.239 ms |       0.27% |   1.238 ms |       0.29% | -1.266 us |  -0.10% |   PASS   |
|  I64  |     0      |    1000     |     1000     |  75.696 us |       1.47% |  75.792 us |       1.38% |  0.096 us |   0.13% |   PASS   |
|  I64  |     0      |   100000    |     1000     |  84.752 us |       1.37% |  85.872 us |       1.24% |  1.120 us |   1.32% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   1.103 ms |       0.30% |   1.104 ms |       0.18% |  0.672 us |   0.06% |   PASS   |
|  I64  |     0      |   100000    |    100000    |  98.002 us |       3.82% |  98.746 us |       3.56% |  0.744 us |   0.76% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   1.059 ms |       0.33% |   1.061 ms |       0.36% |  1.604 us |   0.15% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |   3.789 ms |       0.10% |   3.790 ms |       0.08% |  0.697 us |   0.02% |   PASS   |
|  I64  |     1      |    1000     |     1000     |  84.873 us |       2.24% |  85.373 us |       1.81% |  0.500 us |   0.59% |   PASS   |
|  I64  |     1      |   100000    |     1000     |  93.659 us |       1.96% |  94.275 us |       1.55% |  0.616 us |   0.66% |   PASS   |
|  I64  |     1      |  10000000   |     1000     | 547.495 us |       0.57% | 550.995 us |       0.26% |  3.500 us |   0.64% |   FAIL   |
|  I64  |     1      |   100000    |    100000    |  93.073 us |       1.67% |  93.745 us |       1.55% |  0.671 us |   0.72% |   PASS   |
|  I64  |     1      |  10000000   |    100000    | 598.590 us |       0.65% | 600.007 us |       0.41% |  1.417 us |   0.24% |   PASS   |
|  I64  |     1      |  10000000   |   10000000   |   1.258 ms |       0.28% |   1.259 ms |       0.26% |  1.130 us |   0.09% |   PASS   |

# distinct_left_join

## [0] NVIDIA A100 80GB PCIe

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-----------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  57.247 us |       1.87% |  57.142 us |       1.89% | -0.106 us |  -0.18% |   PASS   |
|  I32  |     0      |   100000    |     1000     |  60.621 us |       1.74% |  60.351 us |       1.62% | -0.270 us |  -0.44% |   PASS   |
|  I32  |     0      |  10000000   |     1000     | 772.056 us |       0.32% | 770.681 us |       0.19% | -1.376 us |  -0.18% |   PASS   |
|  I32  |     0      |   100000    |    100000    |  72.717 us |       1.99% |  72.192 us |       1.33% | -0.526 us |  -0.72% |   PASS   |
|  I32  |     0      |  10000000   |    100000    | 735.295 us |       0.15% | 734.951 us |       0.15% | -0.344 us |  -0.05% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |   3.313 ms |       0.09% |   3.315 ms |       0.08% |  1.946 us |   0.06% |   PASS   |
|  I32  |     1      |    1000     |     1000     |  66.978 us |       1.88% |  67.808 us |       1.77% |  0.829 us |   1.24% |   PASS   |
|  I32  |     1      |   100000    |     1000     |  68.109 us |       1.85% |  68.580 us |       1.78% |  0.471 us |   0.69% |   PASS   |
|  I32  |     1      |  10000000   |     1000     | 322.185 us |       0.34% | 323.048 us |       0.35% |  0.863 us |   0.27% |   PASS   |
|  I32  |     1      |   100000    |    100000    |  70.722 us |       1.80% |  71.232 us |       1.76% |  0.510 us |   0.72% |   PASS   |
|  I32  |     1      |  10000000   |    100000    | 381.917 us |       0.35% | 382.202 us |       0.36% |  0.286 us |   0.07% |   PASS   |
|  I32  |     1      |  10000000   |   10000000   |   1.030 ms |       0.36% |   1.029 ms |       0.23% | -1.579 us |  -0.15% |   PASS   |
|  I64  |     0      |    1000     |     1000     |  55.762 us |       1.78% |  55.070 us |       1.79% | -0.693 us |  -1.24% |   PASS   |
|  I64  |     0      |   100000    |     1000     |  59.585 us |       1.60% |  59.233 us |       1.52% | -0.352 us |  -0.59% |   PASS   |
|  I64  |     0      |  10000000   |     1000     | 794.808 us |       0.16% | 795.858 us |       0.16% |  1.050 us |   0.13% |   PASS   |
|  I64  |     0      |   100000    |    100000    |  73.336 us |       2.07% |  72.852 us |       1.85% | -0.485 us |  -0.66% |   PASS   |
|  I64  |     0      |  10000000   |    100000    | 750.184 us |       0.17% | 749.326 us |       0.20% | -0.858 us |  -0.11% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |   3.333 ms |       0.08% |   3.333 ms |       0.07% |  0.393 us |   0.01% |   PASS   |
|  I64  |     1      |    1000     |     1000     |  66.907 us |       1.83% |  66.763 us |       1.78% | -0.144 us |  -0.22% |   PASS   |
|  I64  |     1      |   100000    |     1000     |  67.905 us |       1.77% |  68.871 us |       1.80% |  0.966 us |   1.42% |   PASS   |
|  I64  |     1      |  10000000   |     1000     | 336.761 us |       0.35% | 336.860 us |       0.41% |  0.099 us |   0.03% |   PASS   |
|  I64  |     1      |   100000    |    100000    |  71.858 us |       1.87% |  72.272 us |       1.79% |  0.414 us |   0.58% |   PASS   |
|  I64  |     1      |  10000000   |    100000    | 395.118 us |       0.39% | 396.264 us |       0.32% |  1.147 us |   0.29% |   PASS   |
|  I64  |     1      |  10000000   |   10000000   |   1.045 ms |       0.21% |   1.046 ms |       0.21% |  0.819 us |   0.08% |   PASS   |

# Summary

- Total Matches: 240
  - Pass    (diff <= min_noise): 99
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  141
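The Summary's Pass/Fail rule can be checked against the table columns. The sketch below is a reader's reconstruction, not nvbench's own compare script. It assumes two things: `min_noise` is the smaller of Ref Noise and Cmp Noise, and %Diff is (Cmp Time - Ref Time) / Ref Time. Under those assumptions it agrees with the rows spot-checked here. For example, the I32, nullable, 1000 x 1000 row of mixed_left_semi_join is FAIL because 1.55% exceeds 1.36%. The function names `percent_diff` and `classify` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>

// %Diff column: relative change of the comparison time vs. the reference time, in percent.
// Hypothetical helper; assumes both times are in the same unit.
double percent_diff(double ref_time, double cmp_time)
{
  assert(ref_time > 0.0 && "reference time must be positive");
  return (cmp_time - ref_time) / ref_time * 100.0;
}

// Status column (assumed rule): PASS when |%Diff| is within the smaller of the two
// noise percentages, otherwise FAIL. FAIL means "distinguishable from noise",
// not "regressed": a large speedup is also reported as FAIL.
std::string classify(double ref_time, double ref_noise_pct, double cmp_time, double cmp_noise_pct)
{
  double const min_noise = std::min(ref_noise_pct, cmp_noise_pct);
  double const diff      = std::abs(percent_diff(ref_time, cmp_time));
  return diff <= min_noise ? "PASS" : "FAIL";
}
```

Because FAIL only means the change exceeds the noise floor, most FAIL rows in the mixed_left_semi_join and mixed_left_anti_join tables are speedups. The largest is about 21% for nullable keys at 10000000 x 10000000.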

@PointKernel
Member

/ok to test

@tgujar
Contributor Author

tgujar commented Jun 7, 2024

I think that, as the PR currently stands, it should have the breaking label: the type of `murmur_device_row_hasher` has to be modified in spark-rapids-jni.
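The PR description says the row hasher is specialized per set of column types, with host-side `std::variant` + `std::visit` dispatch into a specialized template. That explains why a downstream user such as spark-rapids-jni is affected. The toy C++ sketch below shows the pattern. It assumes, hypothetically, that the specialization is expressed as an extra template parameter. `toy_row_hasher`, `flat_types_tag`, `nested_types_tag`, `make_hasher`, `build_with`, and `dispatch_build` are invented names, not libcudf's API.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical type-set tags (not libcudf types).
struct flat_types_tag {};    // every key column is fixed-width
struct nested_types_tag {};  // some key column is a string, list, or struct

// Toy hasher specialized on the type set. With `if constexpr`, the discarded branch
// is not instantiated, so the flat specialization never contains the nested-type path
// (the PR attributes high register usage to such paths).
template <typename TypeSet>
struct toy_row_hasher {
  std::uint32_t operator()(std::int64_t key) const
  {
    if constexpr (std::is_same_v<TypeSet, flat_types_tag>) {
      // Cheap multiplicative hash for fixed-width keys.
      return static_cast<std::uint32_t>(static_cast<std::uint64_t>(key) * 2654435761ull);
    } else {
      // Stand-in for a more expensive nested-type path.
      return static_cast<std::uint32_t>(std::hash<std::string>{}(std::to_string(key)));
    }
  }
};

using hasher_variant =
  std::variant<toy_row_hasher<flat_types_tag>, toy_row_hasher<nested_types_tag>>;

// Host-side choice based on the key columns' types.
hasher_variant make_hasher(bool has_nested_columns)
{
  if (has_nested_columns) { return toy_row_hasher<nested_types_tag>{}; }
  return toy_row_hasher<flat_types_tag>{};
}

// Stand-in for a kernel launch templated on the concrete hasher type.
template <typename Hasher>
std::uint32_t build_with(Hasher const& hasher, std::int64_t key)
{
  return hasher(key);
}

// std::visit turns the runtime choice into a call to the matching instantiation.
std::uint32_t dispatch_build(bool has_nested_columns, std::int64_t key)
{
  return std::visit([key](auto const& hasher) { return build_with(hasher, key); },
                    make_hasher(has_nested_columns));
}
```

In this toy, the hasher is now a family of types rather than a single type. Downstream code that spelled the old concrete type must therefore change. For example, `using my_hasher = toy_row_hasher;` no longer compiles, because a template argument is required. That kind of source incompatibility is what the breaking label signals.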

Labels: `3 - Ready for Review` (Ready for review by team), `improvement` (Improvement / enhancement to an existing function), `libcudf` (Affects libcudf (C++/CUDA) code), `non-breaking` (Non-breaking change), `Performance` (Performance related issue)
Project status: In Progress

5 participants