Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizer & runner should consider NULL special rules #3925

Open
aceforeverd opened this issue May 11, 2024 · 2 comments
Open

optimizer & runner should consider NULL special rules #3925

aceforeverd opened this issue May 11, 2024 · 2 comments
Assignees
Labels
bug Something isn't working sql SQL standard. definition/revise for a SQL statement/clause etc

Comments

@aceforeverd
Copy link
Collaborator

For a index optimizer, the most common pattern is, for example, SQL SELECT * FROM t1 WHERE gp = {const_val}, is optimized based on FILTER expression gp = {const_val}:

So if index key=gp exists for table t1, that index is utilized, as ** SEEK BY KEY {const_val} >> OUTPUT ALL ROWS`**.

However, if {const_val} is NULL, this optimize rule should not go that approach, since <ANY_VAL> = NULL is NULL.

This also happens for SQL like LAST JOIN t1 LAST JOIN t2 on t1.key = t2.key, where there is row exists in t1 that t1.key = NULL. It never joins.

Optimizer should consider const NULL values, and Runner should consider NULL values as index_key

@aceforeverd aceforeverd added bug Something isn't working sql SQL standard. definition/revise for a SQL statement/clause etc labels May 11, 2024
@aceforeverd aceforeverd self-assigned this May 11, 2024
@aceforeverd
Copy link
Collaborator Author

relate issue #3685

@aceforeverd
Copy link
Collaborator Author

some of the yaml testing cases are wrong:

  • - id: 10
    desc: t1 join t2 join t3,t2的key产出为null
    mode: offline-unsupport
    tags: ["@chendihao", "这个场景离线的预期不正确,需要迪豪看看"]
    inputs:
    -
    columns : ["id int","c1 string","c3 int","c4 bigint","c5 float","c6 double","c7 timestamp","c8 date"]
    indexs: ["index1:c1:c7"]
    rows:
    - [1,"aa",20,30,1.1,2.1,1590738990000,"2020-05-01"]
    - [2,"bb",21,31,1.2,2.2,1590738991000,"2020-05-02"]
    - [3,"cc",22,32,1.3,2.3,1590738992000,"2020-05-03"]
    - [4,"dd",23,33,1.4,2.4,1590738993000,"2020-05-04"]
    - [5,"ee",24,34,1.5,2.5,1590738994000,"2020-05-05"]
    -
    columns: ["id int","c1 string","c3 int","c4 bigint","c5 float","c6 double","c7 timestamp","c8 date"]
    indexs: ["index1:c1:c7","index2:c3:c7"]
    rows:
    - [1,"aa",20,30,1.1,2.1,1606755660000,"2020-05-01"]
    - [2,"aa",21,31,1.1,2.1,1606755720000,"2020-05-01"]
    - [3,"cc",21,32,1.1,2.1,1606755780000,"2020-05-01"]
    - [4,"dd",21,33,1.1,2.1,1606755840000,"2020-05-01"]
    - [5,"ee",24,34,1.2,2.2,1606755660000,"2020-05-02"]
    -
    columns : ["id int","c1 string","c3 int","c4 bigint","c5 float","c6 double","c7 timestamp","c8 date"]
    indexs: ["index1:c1:c7","index2:c3:c7"]
    rows:
    - [1,"aa",20,30,1.1,2.1,1606755600000,"2020-05-01"]
    - [2,"bb",20,31,1.2,2.1,1606759200000,"2020-05-01"]
    - [3,"bb",null,32,1.3,2.1,1606762800000,"2020-05-01"]
    - [4,"bb",21,33,1.4,2.1,1606766400000,"2020-05-01"]
    - [5,"ee",21,34,1.5,2.2,1606766401000,"2020-05-02"]
    sql: |
    select {0}.id,{0}.c1,{0}.c3,{1}.c3,{2}.c4
    from {0}
    last join {1} ORDER BY {1}.c7 on {0}.c1={1}.c1
    last join {2} ORDER BY {2}.c7 on {1}.c3={2}.c3
    ;
    expect:
    order: id
    columns: ["id int","c1 string","c3 int","c3 int","c4 bigint"]
    rows:
    - [1,"aa",20,21,34]
    - [2,"bb",21,null,32]
    - [3,"cc",22,21,34]
    - [4,"dd",23,21,34]
    - [5,"ee",24,24,null]
    - id: 11
    desc: (t1 join t2) join t3
    mode: offline-unsupport
    inputs:
    -
    columns : ["id int","c1 string","c3 int","c4 bigint","c5 float","c6 double","c7 timestamp","c8 date"]
    indexs: ["index1:c1:c7"]
    rows:
    - [1,"aa",20,30,1.1,2.1,1590738990000,"2020-05-01"]
    - [2,"bb",21,31,1.2,2.2,1590738991000,"2020-05-02"]
    - [3,"cc",22,32,1.3,2.3,1590738992000,"2020-05-03"]
    - [4,"dd",23,33,1.4,2.4,1590738993000,"2020-05-04"]
    - [5,"ee",24,34,1.5,2.5,1590738994000,"2020-05-05"]
    -
    columns: ["id int","c1 string","c3 int","c4 bigint","c5 float","c6 double","c7 timestamp","c8 date"]
    indexs: ["index1:c1:c7","index2:c3:c7"]
    rows:
    - [1,"aa",20,30,1.1,2.1,1606755660000,"2020-05-01"]
    - [2,"aa",21,31,1.1,2.1,1606755720000,"2020-05-01"]
    - [3,"cc",21,32,1.1,2.1,1606755780000,"2020-05-01"]
    - [4,"dd",21,33,1.1,2.1,1606755840000,"2020-05-01"]
    - [5,"ee",24,34,1.2,2.2,1606755660000,"2020-05-02"]
    -
    columns : ["id int","c1 string","c3 int","c4 bigint","c5 float","c6 double","c7 timestamp","c8 date"]
    indexs: ["index1:c1:c7","index2:c3:c7"]
    rows:
    - [1,"aa",20,30,1.1,2.1,1606755600000,"2020-05-01"]
    - [2,"bb",20,31,1.2,2.1,1606759200000,"2020-05-01"]
    - [3,"bb",null,32,1.3,2.1,1606762800000,"2020-05-01"]
    - [4,"bb",21,33,1.4,2.1,1606766400000,"2020-05-01"]
    - [5,"ee",21,34,1.5,2.2,1606766401000,"2020-05-02"]
    sql: |
    select
    t1.id,t1.c1,t1.c3,{2}.c4
    from (
    select {0}.id,{0}.c1,{1}.c3
    from {0}
    last join {1} ORDER BY {1}.c7 on {0}.c1={1}.c1
    ) as t1 last join {2} ORDER BY {2}.c7 on t1.c3={2}.c3
    ;
    expect:
    order: id
    columns: ["id int","c1 string","c3 int","c4 bigint"]
    rows:
    - [1,"aa",21,34]
    - [2,"bb",null,32]
    - [3,"cc",21,34]
    - [4,"dd",21,34]
    - [5,"ee",24,null]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working sql SQL standard. definition/revise for a SQL statement/clause etc
Projects
None yet
Development

No branches or pull requests

1 participant