Pivot and melt #644

aljazerzen · 2022-06-23T13:42:03Z

One thing SQL lacks is a idiom of converting:

date	status	transaction
2022-01-01	live	10
2022-01-01	test	20
2022-01-02	live	5
2022-01-02	test	30
2022-01-03	live	25

.. into ..

date	live	test
2022-01-01	10	20
2022-01-02	5	30
2022-01-03	25	null

.. and back.

There is many names for this, let's list them here:

pandas
- wide_to_long
- melt
- pivot
MS Excel pivot tables
dplyr/tidyverse
- pivot_longer
- pivot_wider
polars
- melt
- pivot

The problem has some unknowns which are treated differently by the functions above:

what to do with unused columns?
what to do with duplicated values?
how to name the new columns?

Related comment #300 (comment)
Could be implemented as: https://stackoverflow.com/questions/69263964/how-to-pivot-in-postgresql
Would solve: https://old.reddit.com/r/SQL/comments/viumd0/bigquery_how_to_aggregate_data/

Currently I don't have capacity to tackle all these, so I'm just opening a tracking issue.

max-sixty · 2022-06-24T06:19:54Z

Thanks for starting this!

I would split the feature into:

Static column names, known at compile time — we could write this in PRQL without too much trouble
Dynamic column names — this would require much more work where it's not supported by the DB; we'd have to build a query from the result of another query

BQ can do this dynamically now, but only as the last operation of a query: https://towardsdatascience.com/pivot-in-bigquery-4eefde28b3be

dbt can also do this dynamically: https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/pivot.sql, by issuing two queries.

I really think we could be very good at the first. We have to do the work to think through how to build it, but it can be done with some very simple rust or a templating language (not quite s-strings atm tho!).

Zooming out — while in this instance dbt can do both static & dynamic, its macros are often used for things known statically. And while I love dbt, this is a feature born of necessity rather than desire, and the sort of area we can start improving upon.

aljazerzen · 2022-06-25T19:15:54Z

Uff, the dbt pivot syntax seems very cumbersome. We can surely improve that.

I very much agree that "dynamic column names" are much harder.

Am I correct in assuming that pivot operation requires dynamic columns, while melt produces only a static columns (independent of table contents)?

This would mean that we can start with melt (wide to long).

max-sixty · 2022-06-26T04:48:25Z

Am I correct in assuming that pivot operation requires dynamic columns, while melt produces only a static columns (independent of table contents)?

I think:

With melt, the query needs the columns (because it has to know what rotate), but they are always defined statically — there is no extra information at runtime.
With pivot, queries can be static, if you know what data to expect / what columns to create. But without knowing what columns to create, it does need to be dynamic

Is that how you're thinking about it?

snth · 2023-03-12T08:04:44Z

Adding links to our recent discussion on Discord and the DuckDB developments.

Discord discussion: https://discord.com/channels/936728116712316989/945832228011798528/1077845402793017384

DuckDB pivot: duckdb/duckdb#6387

datamike · 2023-06-27T20:50:54Z

I hope this is applicable to the Pivot discussion. I am new to PRQL and I was thinking, "I wonder if I can get it to do something similar to Pivot() ..."

let header_rows = from_text """
col1,col2
U30,U50
"""

let under_thirty = (
  from employees
    filter age < 30
    aggregate [
        ct30 = count
    ]
)

let under_fifty = (
  from employees
    filter (age | in 30..49)
      aggregate [
        ct50 = count
      ]
)

let aggregate_onerow = (
  from under_thirty 
  join side:full under_fifty []

)

from aggregate_onerow`

This compiled to the following SQL (which is a "poor man's pivot").  I could not do anything with the 'header_rows' variable / CTE. 

`WITH under_thirty AS (
  SELECT
    COUNT(*) AS ct30
  FROM
    employees
  WHERE
    age < 30
),
under_fifty AS (
  SELECT
    COUNT(*) AS ct50
  FROM
    employees
  WHERE
    age BETWEEN 30 AND 49
),
aggregate_onerow AS (
  SELECT
    under_thirty.ct30,
    under_fifty.ct50
  FROM
    under_thirty FULL
    JOIN under_fifty ON true
)
SELECT
  ct30,
  ct50
FROM
  aggregate_onerow

-- Generated by PRQL compiler version:0.8.1

era127 · 2023-10-21T12:27:43Z

Currently, there is native functionality in duckdb sql for Pivot as well.

max-sixty · 2024-03-17T17:56:03Z

This was discussed at https://news.ycombinator.com/item?id=39717268. We should ensure that this is at least possible with an s-string.

aljazerzen added the language-design Changes to PRQL-the-language label Jun 23, 2022

aljazerzen mentioned this issue Jan 2, 2023

s-string impact on semantic checks #573

Closed

aljazerzen added the major-feature label Jan 8, 2023

max-sixty added the priority label Mar 17, 2024

max-sixty pinned this issue Mar 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pivot and melt #644

Pivot and melt #644

aljazerzen commented Jun 23, 2022

max-sixty commented Jun 24, 2022

aljazerzen commented Jun 25, 2022

max-sixty commented Jun 26, 2022

snth commented Mar 12, 2023

datamike commented Jun 27, 2023 •

edited by max-sixty

era127 commented Oct 21, 2023

max-sixty commented Mar 17, 2024

Pivot and melt #644

Pivot and melt #644

Comments

aljazerzen commented Jun 23, 2022

max-sixty commented Jun 24, 2022

aljazerzen commented Jun 25, 2022

max-sixty commented Jun 26, 2022

snth commented Mar 12, 2023

datamike commented Jun 27, 2023 • edited by max-sixty

era127 commented Oct 21, 2023

max-sixty commented Mar 17, 2024

datamike commented Jun 27, 2023 •

edited by max-sixty