Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(presto): fix DELETE DML statement for presto/trino #3466

Merged
merged 3 commits into from May 15, 2024

Conversation

viplazylmht
Copy link
Contributor

Context

  • Presto/Trino only support DELETE statement with FROM keyword and only allow to delete data from one table per query.
  • Don't allow to have any table alias in the target table which will be deleted.

Documentation

Reproduce

Prerequisite

echo "connector.name=memory" > memory.properties
docker run -p 8080:8080 -it -v ./memory.properties:/opt/presto-server/etc/catalog/memory.properties --name presto prestodb/presto:latest

docker exec -it presto presto-cli

presto> create table memory.default.tbl(x int);
CREATE TABLE
presto> show create table memory.default.tbl;
           Create Table            
-----------------------------------
 CREATE TABLE memory.default.tbl ( 
    "x" integer                    
 )                                 
(1 row)

Test DELETE DML statement

# normal case
presto> explain(type validate) delete from memory.default.tbl where x = 1;
 result 
--------
 true   
(1 row)

# failed cases
presto> explain(type validate) delete memory.default.tbl where x = 1;
Query 20240513_165153_00014_37tzg failed: line 1:31: mismatched input 'memory'. Expecting: 'FROM'

presto> explain(type validate) delete from memory.default.tbl t where t.x = 1;
Query 20240513_163843_00010_37tzg failed: line 1:55: mismatched input 't'. Expecting: '.', 'WHERE', <EOF>

presto> explain(type validate) delete from memory.default.tbl as t where t.x = 1;
Query 20240513_165118_00013_37tzg failed: line 1:55: mismatched input 'as'. Expecting: '.', 'WHERE', <EOF>

...

So, when transpiling, we have to handle the delete_sql to make Presto/Trino happy.

Solution

  • Presto only support DELETE FROM a table, without alias, so we need to remove the unnecessary parts.
  • If the original DELETE statement contains more than one table to be deleted, just pick the first one.
DELETE ...
->
DELETE FROM ...
DELETE FROM tbl as t where t.col_x = 1;
->
DELETE FROM table where col_x =1;
DELETE tbl1 as z, tbl2 as t where z.id = 1;
->
DELETE FROM tble1 where id = 1
DELETE tbl1 as z, tbl2 as t where z.id = t.id;
->
DELETE FROM tbl1 where id = t.id  <<< FAILED CASE

Test

  • Add tests for bigquery dialect. Please note that bigquery can delete only a table at a time, look like presto.
  • Maybe we need more test for mysql (mysql allow delete more than one table each call).

Discussion

Should we raise an error if there is more than one table to be deleted when transpiling to presto/trino dialect?

sqlglot/dialects/presto.py Outdated Show resolved Hide resolved
sqlglot/dialects/presto.py Outdated Show resolved Hide resolved
Comment on lines 648 to 650
for column in expression.find_all(exp.Column):
if column.table == table_alias:
del column.args["table"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if getting rid of the table like this is safe / correct. Eg. what happens if there's a db or catalog? We would probably need to do expression.transform(transforms.unqualify_columns) on the filter child.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know what happens if there's a db or catalog. Can you explain it, please?

I think each column belongs to a table, so a column only have a table_alias (or table name) stored into column.table property. I will correct the table name case.

Edit: Maybe transforms.unqualify_columns work like a charm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know what happens if there's a db or catalog. Can you explain it, please?

Previously we were only popping off the column's table, and I was wondering if this could lead to an inconsistent state where a column that original had a fully-qualified name (catalog, db, table, this) would only get its table removed.

This probably can't happen for the columns of the table we're deleting from, since you'd qualify the columns of that table using only the alias, so there's no catalog/db. But it could happen for columns of other tables, eg. referenced in a subquery inside of the WHERE clause:

DELETE FROM db.t1 AS t1 WHERE NOT t1.c IN (SELECT db.t2.c FROM db.t2 LEFT JOIN db.t3 USING (id));

Btw, the current approach will also unqualify db.t2.c here, which could make the subquery ambiguous, e.g. if c existed in both t2 and t3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I got it.

Btw, the current approach will laso unqualify db.t2.c here, which could make the subquery ambiguous, e.g. if c existed in both t2 and t3.

Is there any function to fully qualify the catalog/db/table to the column? After that we can just remove the alias and thing should work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only used to aid the transpilation, which is best-effort, I think I'd just keep what you have but move the unqualify_columns transformation inside the branch that checks if there is an alias. We don't want to always unqualify columns, right? We only need it if the target table is aliased.

That way, we won't deal with the case I described above when transpiling to presto/trino, but that's ok I guess.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @tobymao can you also take a look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@georgesittas Should we always remove table alias, and always keep the fully qualified column?

Prototype code seems to my first version:

table_alias = table.alias
if table_alias:
    del table.args["alias"]

    for column in expression.find_all(exp.Column):
        if column.table == table_alias and not column.catalog and not column.db:
            del column.args["table"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get this in and do a quick refactor, will share with you in a bit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlglot/dialects/presto.py Outdated Show resolved Hide resolved
sqlglot/dialects/presto.py Outdated Show resolved Hide resolved
sqlglot/dialects/presto.py Show resolved Hide resolved
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
@georgesittas georgesittas merged commit a3ff49e into tobymao:main May 15, 2024
6 checks passed
@georgesittas
Copy link
Collaborator

Thanks a lot for these PRs @viplazylmht 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants