-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix] fix stale histogram lead to unexpected stats #45614
Conversation
Signed-off-by: packy92 <wangchao@starrocks.com>
return Optional.empty(); | ||
} | ||
|
||
return Optional.of(new Histogram(bucketList, estimatedMCV)); | ||
} | ||
|
||
public static ColumnStatistic estimateColumnStatisticsWithHistogram(ColumnStatistic columnStatistic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
There is a potential NullPointerException
or unhandled empty Optional
being returned from updateHistWithLessThan
and updateHistWithGreaterThan
, which are used without checking whether the Optional
contains a value before calling .get()
method. This might lead to NoSuchElementException
if the Optional is empty.
You can modify the code like this:
Inside both estimateColumnLessThanConstant
and estimateColumnGreaterThanConstant
methods, replace the lines where Histogram estimatedHistogram = hist.get();
is used with the following:
if (!hist.isPresent()) {
// Handle the case where histogram is not present, for example:
return statistics; // Returning unchanged statistics as we cannot estimate
}
Histogram estimatedHistogram = hist.get();
This ensures that you handle the case where hist
might be empty properly, rather than directly calling .get()
on an Optional
that could potentially be empty, avoiding NoSuchElementException
.
Quality Gate passedIssues Measures |
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 31 / 34 (91.18%) file detail
|
@Mergifyio backport branch-3.3 |
@Mergifyio backport branch-3.2 |
@Mergifyio backport branch-3.1 |
@Mergifyio backport branch-3.0 |
@Mergifyio backport branch-2.5 |
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
Signed-off-by: packy92 <wangchao@starrocks.com> (cherry picked from commit 1e5626e)
Signed-off-by: packy92 <wangchao@starrocks.com> (cherry picked from commit 1e5626e) # Conflicts: # fe/fe-core/src/test/resources/sql/tpch-histogram-cost/q19.sql
Signed-off-by: packy92 <wangchao@starrocks.com> (cherry picked from commit 1e5626e) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/Histogram.java # fe/fe-core/src/test/resources/sql/tpch-histogram-cost/q19.sql
Signed-off-by: packy92 <wangchao@starrocks.com> (cherry picked from commit 1e5626e) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/BinaryPredicateStatisticCalculator.java # fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/Histogram.java # fe/fe-core/src/test/resources/sql/tpch-histogram-cost/q19.sql
Signed-off-by: packy92 <wangchao@starrocks.com> (cherry picked from commit 1e5626e) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/BinaryPredicateStatisticCalculator.java # fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/Histogram.java # fe/fe-core/src/test/resources/sql/tpch-histogram-cost/q19.sql
… (#45642) Co-authored-by: packy92 <110370499+packy92@users.noreply.github.com>
… (#45643) Signed-off-by: packy92 <wangchao@starrocks.com> Co-authored-by: packy92 <110370499+packy92@users.noreply.github.com> Co-authored-by: packy92 <wangchao@starrocks.com>
… (#45644) Signed-off-by: packy92 <wangchao@starrocks.com> Co-authored-by: packy92 <110370499+packy92@users.noreply.github.com> Co-authored-by: packy92 <wangchao@starrocks.com>
… (#45645) Signed-off-by: packy92 <wangchao@starrocks.com> Co-authored-by: packy92 <110370499+packy92@users.noreply.github.com> Co-authored-by: packy92 <wangchao@starrocks.com>
… (#45646) Signed-off-by: packy92 <wangchao@starrocks.com> Co-authored-by: packy92 <110370499+packy92@users.noreply.github.com> Co-authored-by: packy92 <wangchao@starrocks.com>
Signed-off-by: packy92 <wangchao@starrocks.com>
Why I'm doing:
Column histogram may not updated after the first collection.
If use the stale histogram to estimate row number may have a risk of divide zero exception because the estimated histogram may return 0 row count(a empty bucket and empty mcv).
What I'm doing:
If the predicate range doesn't overlap with the histogram, we use the min/max value instead of histogram to estimate statistics.
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: