New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aliasing Left Side of Join Breaks Derivation #4299
Comments
Thanks for the issue. Possibly this should raise a proper error message. The fix is that the aliasing is within the prql target:sql.bigquery
from sales_with_profit=sales
derive {profit = sell_price - buy_price}
join side:left details (this.item_id == that.item_id)
select !{details.item_id} WITH table_0 AS (
SELECT
*,
sell_price - buy_price AS profit
FROM
sales AS sales_with_profit
)
SELECT
table_0.*
EXCEPT
(profit),
details.*
EXCEPT
(item_id),
table_0.profit
FROM
table_0
LEFT JOIN details ON table_0.item_id = details.item_id
-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org) Does that make sense? Edit: but looks like there's a different bug — the |
About Raising ErrorIt sounds like the PRQL parser is accepting invalid PRQL, so I think raising an error would be ideal (to prevent silent failure). About SELECT * EXCEPTI'm actually not 100% familiar with the "SELECT * EXCEPT" syntax, so I didn't realize that wasn't valid. In any, case, the PRQL compiler is not indenting it sensibly, I believe PRQL compiler is trying to output this (exact same content, just re-indented for clarity): WITH table_0 AS (
SELECT
*,
sell_price - buy_price AS profit
FROM
sales AS sales_with_profit
)
SELECT
table_0.* EXCEPT (profit),
details.* EXCEPT (item_id),
table_0.profit
FROM
table_0
LEFT JOIN details ON table_0.item_id = details.item_id Are you trying to say it should be something like: WITH table_0 AS (
SELECT
*,
sell_price - buy_price AS profit
FROM
sales AS sales_with_profit
)
SELECT * EXCEPT (table_0.profit, details.item_id),
FROM
table_0
LEFT JOIN details ON table_0.item_id = details.item_id ? About Documentation around Aliasing / Simulating EquijoinsBigQuery is not actually one of my ultimate targets, but rather Spark SQL (which is not yet supported). The "sql.generic" target doesn't support "SELECT * EXCEPT" syntax at all, so I am using sql.bigquery as an arbitrary intermediate target and then using sqlglot to convert to Spark SQL. "sql.generic" silently omits "select !{...}" statements without giving a warning it is doing so. But that's not really relevant to this issue. What is relevant that you don't have support for equijoins (anymore?), so I need to drop the duplicated join columns (either from the left or the right table, depending on if it is left or right join). For dropping from the right table, you can (AFAIK) replace "details" in my example with an arbitrary PRQL pipeline then wrap it in "details = (...)" and it will continue to work, e.g.:
This makes it especially suitable for recursive / compositional code-gen (my use case). For dropping the column from the left side, however, there doesn't seem to be documented an analogous aliasing feature:
What do you sub in for the Just messing around, it appears that If it can be relied upon, it would help a lot to have some examples added to the documentation specifically demonstrating how to do right/left equijoins (removing the redundant columns in those cases). There are other more verbose ways of handling this situation (such as aliasing the join columns in one of the tables before the join, then dropping those aliased columns after), but this adds a fair amount of complexity for the reader. EDIT: It appears that "let ..." or "into" statements might solve my particular code-gen use case (since a |
Sorry, I'm confusing things. The Though are you sure SparkSQL supports The equijoins point is valid; that is a big advantage of using FWIW we're rethinking how name resolution works, and I would love to put At the moment, I think the main way out is to either use Does that answer your question re the except? I'm not sure how much of that comes from my incorrect claim about Feel free to post more details of what you're doing, it sounds quite interesting! |
What I am working on is probably pretty standard: just trying to write some tools to help build unit-testable data pipelines that can run in 3 different languages/dialects: (Py)Spark SQL, Presto/Trino, and Pandas. We have tables with lots of columns (sometimes 100+), so some degree of meta-programming is required just to not lose your sanity. It would also be nice to automate column lineage (which I know is in progress for PRQL). I'm thinking of PRQL as a kind of intermediate language: you have dev tools (which might make extensive use of meta-programming) written in python which generate PRQL, and then PRQL which can be compiled to run in these different environments. PRQL is great for this purpose. Right now, I have an interface which mimics the PySpark API, but generates PRQL instead. PySpark and PRQL are surprisingly similar (e.g. there is a ".filter" method, a ".groupBy.agg()" method, etc.), and they are both pipelined. So right now I could actually go directly to PySpark without much trouble and support all the features needed, but it would be nicer to have everything go through PRQL. Spark SQL does in fact support a syntax to drop columns. As long as you have run SELECT
table_0.`(item_id)?+.+`,
table_1.*
FROM `table_1`
RIGHT JOIN `table_0`
ON `table_1`.`item_id` = `table_0`.`item_id`; This is using a Java-specific regex syntax for the column selector which is a little bit hard to understand, but basically says "Either you start with Whether some combination of PRQL + SQLGlot can actually automatically convert SELECT * EXCEPT to this syntax is yet to be seen (although I don't see any conceptual blockers). PySpark has a "drop(...)" method so PySpark is easier to target. |
Nice, thanks, v interesting to hear what you're working on. |
What happened?
Aliasing the left side of a join magically erases any derivations in that left side.
Without alias (seems to work):
With alias (doesn't work):
With the aliasing, the "profit" derivation completely disappears. That should not happen.
PRQL input
SQL output
Expected SQL output
MVCE confirmation
Anything else?
No response
The text was updated successfully, but these errors were encountered: