[CH] Support non-equal hash join #5668

Open
3 tasks
lgbo-ustc opened this issue May 9, 2024 · 3 comments · May be fixed by #5735
Labels
enhancement New feature or request

Comments

@lgbo-ustc
Contributor

lgbo-ustc commented May 9, 2024

Description

At present, unsupported joins fall back; the decision is made in CHJoinValidateUtil::shouldFallback. After ClickHouse/ClickHouse#60920, we can support more joins:

  • For the join types we support, convert Spark SMJ (sort merge join) to hash join.
  • Support non-equal joins in the backend.
  • For inner joins, remove post filters.
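To illustrate the last item: for an inner join, checking the non-equal condition while probing the hash table gives the same rows as an equi hash join followed by a post filter, so the separate filter can be dropped. The following is a minimal Python sketch of that equivalence, not the Gluten or ClickHouse implementation; all function names and the sample rows are hypothetical.

```python
from collections import defaultdict

def build_hash_table(rows, key):
    """Group build-side rows by the equi-join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table

def inner_join_then_filter(left, right, key, pred):
    """Equi hash join followed by a separate post filter."""
    table = build_hash_table(right, key)
    joined = [(l, r) for l in left for r in table.get(l[key], [])]
    return [(l, r) for (l, r) in joined if pred(l, r)]

def inner_join_mixed(left, right, key, pred):
    """Hash join that checks the non-equal condition while probing."""
    table = build_hash_table(right, key)
    return [(l, r) for l in left for r in table.get(l[key], []) if pred(l, r)]

# Made-up rows (hypothetical), not real TPC-H data.
t1 = [{"k": 1, "v": 5}, {"k": 2, "v": 1}]
t2 = [{"k": 1, "v": 3}, {"k": 1, "v": 9}, {"k": 2, "v": 4}]
ge = lambda l, r: l["v"] >= r["v"]
```

Both functions return the same pairs for any input, which is why the post filter is redundant for inner joins once the join itself evaluates the mixed condition.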
@lgbo-ustc lgbo-ustc added the enhancement New feature or request label May 9, 2024
@lgbo-ustc
Contributor Author

lgbo-ustc commented May 9, 2024

An example

select t1.n_nationkey, t2.n_nationkey
from tpch_pq.nation as t1
left join tpch_pq.nation as t2
  on t1.n_nationkey = t2.n_nationkey
  and t1.n_regionkey >= t2.n_regionkey;

t1.n_regionkey >= t2.n_regionkey contains columns from t1 and t2.
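
Because the condition mixes columns from both sides, it has to be evaluated inside the join for a left join: a left row whose key matches but whose non-equal condition fails must still be emitted once, padded with NULL, rather than dropped by a later filter. A minimal Python sketch of that behavior follows; the function name and the sample rows are hypothetical, and this is not the backend implementation.

```python
def left_join_mixed(left, right, key, pred):
    """Left outer hash join with an extra non-equal condition.

    The equi key selects candidate rows from the hash table; `pred`
    is checked per candidate. A left row with no surviving candidate
    is emitted once with None standing in for the NULL right side.
    """
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)

    out = []
    for l in left:
        matches = [r for r in table.get(l[key], []) if pred(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

# Made-up rows (hypothetical), not the real tpch_pq.nation contents.
t1 = [{"n_nationkey": 0, "n_regionkey": 0},
      {"n_nationkey": 1, "n_regionkey": 1}]
t2 = [{"n_nationkey": 0, "n_regionkey": 2},
      {"n_nationkey": 1, "n_regionkey": 1}]

result = left_join_mixed(
    t1, t2, "n_nationkey",
    lambda l, r: l["n_regionkey"] >= r["n_regionkey"],
)
keys = [(l["n_nationkey"], r["n_nationkey"] if r else None) for l, r in result]
```

With these rows, nation 0 fails the `>=` check and is kept with a NULL right side, while nation 1 matches normally.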

Spark plan

CHNativeColumnarToRow
+- ^(10) ProjectExecTransformer [n_nationkey#25L, n_nationkey#29L]
   +- ^(10) InputIteratorTransformer[n_nationkey#25L, n_regionkey#27L, n_nationkey#29L, n_regionkey#31L]
      +- ^(10) InputAdapter
         +- ^(10) RowToCHNativeColumnar
            +- *(1) SortMergeJoin [n_nationkey#25L], [n_nationkey#29L], LeftOuter, (n_regionkey#27L >= n_regionkey#31L)
               :- CHNativeColumnarToRow
               :  +- ^(7) SortExecTransformer [n_nationkey#25L ASC NULLS FIRST], false, 0
               :     +- ^(7) InputIteratorTransformer[n_nationkey#25L, n_regionkey#27L]
               :        +- ^(7) InputAdapter
               :           +- ^(7) ColumnarExchange hashpartitioning(n_nationkey#25L, 5), ENSURE_REQUIREMENTS, [plan_id=318], [id=#318], [OUTPUT] List(n_nationkey:LongType, n_regionkey:LongType)
               :              +- ^(6) NativeFileScan parquet tpch_pq.nation[n_nationkey#25L,n_regionkey#27L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tpch_pq_data/nat..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<n_nationkey:bigint,n_regionkey:bigint>
               +- CHNativeColumnarToRow
                  +- ^(9) SortExecTransformer [n_nationkey#29L ASC NULLS FIRST], false, 0
                     +- ^(9) InputIteratorTransformer[n_nationkey#29L, n_regionkey#31L]
                        +- ^(9) InputAdapter
                           +- ^(9) ColumnarExchange hashpartitioning(n_nationkey#29L, 5), ENSURE_REQUIREMENTS, [plan_id=325], [id=#325], [OUTPUT] List(n_nationkey:LongType, n_regionkey:LongType)
                              +- ^(8) FilterExecTransformer (isnotnull(n_nationkey#29L) AND isnotnull(n_regionkey#31L))
                                 +- ^(8) NativeFileScan parquet tpch_pq.nation[n_nationkey#29L,n_regionkey#31L] Batched: true, DataFilters: [isnotnull(n_nationkey#29L), isnotnull(n_regionkey#31L)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tpch_pq_data/nat..., PartitionFilters: [], PushedFilters: [IsNotNull(n_nationkey), IsNotNull(n_regionkey)], ReadSchema: struct<n_nationkey:bigint,n_regionkey:bigint>

@baibaichen
Contributor

It may be related to #1986

@baibaichen
Contributor

baibaichen commented May 9, 2024

An example

select t1.n_nationkey, t2.n_nationkey
from tpch_pq.nation as t1
left join tpch_pq.nation as t2
  on t1.n_nationkey = t2.n_nationkey
  and t1.n_regionkey >= t2.n_regionkey;

t1.n_regionkey >= t2.n_regionkey contains columns from t1 and t2.

@lgbo-ustc

Let CHJoinValidateUtil::shouldFallback always return false, and try TPCH 21 at

TPCH 21 has != in the ON clause

+- ^(169) RowToCHNativeColumnar
   +- *(1) BroadcastHashJoin [l_orderkey#9760L], [l_orderkey#9807L], LeftAnti, BuildRight, NOT (l_suppkey#9809L = l_suppkey#9762L), false
      :- *(1) BroadcastHashJoin [l_orderkey#9760L], [l_orderkey#9790L], LeftSemi, BuildRight, NOT (l_suppkey#9792L = l_suppkey#9762L), false

Note

These are LeftSemi and LeftAnti joins.
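
For these two join types, the non-equal condition decides whether a left row counts as matched at all, so it cannot be applied as a post filter. A minimal Python sketch of the intended semantics follows; the function name and sample rows are hypothetical, and NULL handling is deliberately ignored.

```python
def semi_or_anti_join(left, right, key, pred, anti=False):
    """Left semi / left anti hash join with an extra non-equal condition.

    A left row is "matched" if at least one right row has the same key
    AND satisfies `pred`. Semi keeps matched rows, anti keeps the rest.
    NULL semantics are not modeled in this sketch.
    """
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)

    out = []
    for l in left:
        matched = any(pred(l, r) for r in table.get(l[key], []))
        if matched != anti:
            out.append(l)
    return out

# Made-up lineitem-like rows (hypothetical): (l_orderkey, l_suppkey).
l1 = [{"l_orderkey": 1, "l_suppkey": 10},
      {"l_orderkey": 2, "l_suppkey": 20},
      {"l_orderkey": 3, "l_suppkey": 30}]
l2 = [{"l_orderkey": 1, "l_suppkey": 10},
      {"l_orderkey": 1, "l_suppkey": 11},
      {"l_orderkey": 2, "l_suppkey": 20}]

other_supplier = lambda l, r: l["l_suppkey"] != r["l_suppkey"]
semi = semi_or_anti_join(l1, l2, "l_orderkey", other_supplier)
anti = semi_or_anti_join(l1, l2, "l_orderkey", other_supplier, anti=True)
```

Here only order 1 has a row from a different supplier, so it is the only row kept by the semi join; orders 2 and 3 are kept by the anti join.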

@baibaichen baibaichen changed the title [CH] Support converting sort merge join into hash join [CH] Support non-equal hash join May 9, 2024