-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql): push down ORDER BY advice for DISTINCT queries #4180
base: master
Are you sure you want to change the base?
fix(sql): push down ORDER BY advice for DISTINCT queries #4180
Conversation
assertMemoryLeak(() -> { | ||
ddl("create table tab (x int, y long, ts timestamp) timestamp(ts)"); | ||
|
||
assertPlan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be great to assert result sets, not only execution plans in these tests.
ddl("create table tab (x int, y long, ts timestamp) timestamp(ts)"); | ||
|
||
assertPlan( | ||
"select DISTINCT ts, x, y from tab order by ts DESC", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to add another test for the select DISTINCT x, y from tab order by ts DESC
case. The test should verify that we don't push down ORDER BY advice since the timestamp column is not in the SELECT DISTINCT clause.
Other candidates would be a query with positional numbers in ORDER BY (e.g. select DISTINCT ts, x, y from tab order by 1
) and a SELECT DISTINCT over a sub-query (e.g. select DISTINCT ts, x, y from (tab WHERE x > y) order by ts
or select DISTINCT ts, x, y from (SELECT tab.ts, tab.x, tab2.y FROM tab ASOF JOIN tab2) order by ts
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select DISTINCT x, y from tab order by ts DESC
In fact, this query is invalid; therefore, I initially skipped it (there is a test in OrderByAdviceTest that verifies a similar case). Now, I've added a test to verify that the exact query you suggested is invalid.
select DISTINCT ts, x, y from (tab WHERE x > y) order by ts DESC
This one on the other hand revealed unexpected behaviour:
Sort light
keys: [ts desc]
DistinctTimeSeries
keys: ts,x,y
Async JIT Filter workers: 1
filter: y<x
DataFrame
Row forward scan
Frame forward scan on: tab
I'd expect a backward scan.
The reason why it happens is that here:
this.aliasToColumnNameMap.put(name, name); |
columnNameToAliasMap
field is not updated. Based on the rest of the code, I'd expect that the following fields are always kept in sync:
bottomUpColumnNames
bottomUpColumns
aliasToColumnNameMap
aliasToColumnMap
columnNameToAliasMap
columnAliasIndexes
I'll investigate this in the evening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, this query is invalid
Good point! That query was indeed invalid.
I'd expect a backward scan.
Yes, it should be a backward scan. As for the copyColumnsFrom
method, indeed it could be the case that it doesn't leave the maps in sync. Look forward to your further findings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @piotrrzysko
Any chances that you'll be able to finish this PR any time soon? If not, not a problem at all - I'm just checking to understand if someone from QuestDB team should pick it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Andrei,
Based on my research, in order to finish this PR (specifically this thread), we need to solve this issue: #4197.
Additionally, Vlad mentioned to me that a fuzzy test related to HyperLogLog occasionally fails (I suspect it's related to this comment: #4083 (comment)). My plan was, when I have time, to first address the HLL problem and then return to this PR.
Generally, I'm happy to help, but I'm a bit busy right now, and I won't be able to work on any of these issues in the next two or three weeks. Please, let me know what you think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. No prob with the time aspect: the PR will be kept open as long as needed. Needless to say, we value your contributions a lot.
As for the flaky test, relaxing the check should help.
Fixes #4172.
Pushing down ORDER BY advice allows selecting a scanning order that matches the order direction specified in the ORDER BY clause. Consequently, for the query from the issue, we can avoid unnecessary sorting and utilize a record cursor factory available here:
questdb/core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
Lines 2378 to 2384 in 83cb7c3