FusionInsightHD 6518, spark2.3.2, carbon-2.0.0: skewedJoin adaptive execution has no effect. #4182

kongxianghe1234 · 2021-07-24T04:01:37Z

spark.sql.adaptive.enabled=true
spark.sql.adaptive.skewedJoin.enabled=true
spark.sql.adaptive.skewedPartitionMaxSplits=5
spark.sql.adaptive.skewedPartitionRowCountThreshold=10000000
spark.sql.adaptive.skewedPartitionSizeThreshold=67108864
spark.sql.adaptive.skewedPartitionFactor=5

With the settings above, skewed join handling still has no effect in Spark2x JDBC.

The query is t1 left join t2 on t1.id = t2.id. Column id in t1 has one hot key, for example 0000-00-00, with 100,000 records, and t2 has the same key in column id, also with 100,000 records. The join therefore generates 100,000 * 100,000 = 10 billion records for that key, all handled by a single reducer.
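To make the arithmetic concrete, here is a rough sketch in plain Python (no Spark needed) of how many output rows a left join produces per key. The function name `left_join_rows_per_key` and the second key `2021-07-24` are made up for illustration; only the hot key and its 100,000 counts come from the case above.

```python
def left_join_rows_per_key(left_counts, right_counts):
    """Output rows per key for a left join, given rows-per-key on each side."""
    # Each left row pairs with every matching right row. A left row with
    # no match still yields one output row (right side filled with NULLs).
    return {
        key: n * max(right_counts.get(key, 0), 1)
        for key, n in left_counts.items()
    }


# Rows per key in t1 and t2. "2021-07-24" is a hypothetical normal key.
t1_counts = {"0000-00-00": 100_000, "2021-07-24": 3}
t2_counts = {"0000-00-00": 100_000, "2021-07-24": 2}

rows = left_join_rows_per_key(t1_counts, t2_counts)
hot_key = max(rows, key=rows.get)
print(hot_key, rows[hot_key])
```

Since a shuffle join sends all rows with the same key to the same partition, the whole product for the hot key ends up in one task unless the skewed partition is actually split.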

The carbon solution does not help here either, please check it. -- call hw.

kongxianghe1234 · 2021-07-24T04:23:21Z

I also added "spark.shuffle.statistics.verbose=true", but the skewed join handling still has no effect.

study-day · 2021-07-26T01:30:53Z

Hi kongxianghe, we have found a similar problem. If two tables are joined without de-duplicating the join keys first, the join is very time-consuming, and Spark only uses a few executors.

didiaode18 · 2021-07-29T10:28:51Z

+1
