pl.concat_list() can cause explode() to throw a shape error #16215

Open
2 tasks done
jaschn opened this issue May 14, 2024 · 0 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

jaschn commented May 14, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame(
    {
        "cl1": [[0], [0]],
        "cl2": [[0], [0]],
    }
)

df_row_1 = df[1]  # index 0 works; any other index fails
df_row_1 = df_row_1.select(
    pl.col("cl1"),
    pl.concat_list(pl.col("cl2")),  # without pl.concat_list() it works as well
)

df_row_1.explode(pl.all())  # raises ShapeError

Log output

---------------------------------------------------------------------------
ShapeError                                Traceback (most recent call last)
Cell In[6], line 16
     10 df_row_1 = df[1]
     11 df_row_1 = df_row_1.select(
     12 		pl.col("cl1"),
     13 		pl.concat_list(pl.col("cl2"))
     14 	)
---> 16 df_row_1.explode(pl.all())

File ~/miniconda3/envs/x/lib/python3.12/site-packages/polars/dataframe/frame.py:7193, in DataFrame.explode(self, columns, *more_columns)
   7136 def explode(
   7137     self,
   7138     columns: str | Expr | Sequence[str | Expr],
   7139     *more_columns: str | Expr,
   7140 ) -> DataFrame:
   7141     """
   7142     Explode the dataframe to long format by exploding the given columns.
   7143 
   (...)
   7191     └─────────┴─────────┘
   7192     """
-> 7193     return self.lazy().explode(columns, *more_columns).collect(_eager=True)

File ~/miniconda3/envs/x/lib/python3.12/site-packages/polars/lazyframe/frame.py:1816, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager, **_kwargs)
   1813 # Only for testing purposes atm.
   1814 callback = _kwargs.get("post_opt_callback")
-> 1816 return wrap_df(ldf.collect(callback))

ShapeError: exploded columns must have matching element counts

Issue description

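When a row other than the first is selected by index (for example df[1]) from a DataFrame with list columns, and one of those list columns is then passed through pl.concat_list(), a subsequent explode() over all columns can fail with a ShapeError, even though every column holds exactly one single-element list.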

Expected behavior

import polars as pl

df = pl.DataFrame(
    {
        "cl1": [[0], [0]],
        "cl2": [[0], [0]],
    }
)

# Row 1 without pl.concat_list(): explode() works.
df_row_1 = df[1]
df_row_1 = df_row_1.select(
    pl.col("cl1"),
    pl.col("cl2"),
)

print(df_row_1.explode(pl.all()))

# Row 0 with pl.concat_list(): explode() works as well.
df_row_0 = df[0]
df_row_0 = df_row_0.select(
    pl.col("cl1"),
    pl.concat_list(pl.col("cl2")),
)

print(df_row_0.explode(pl.all()))

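shape: (1, 2)
┌─────┬─────┐
│ cl1 ┆ cl2 │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 0   │
└─────┴─────┘

shape: (1, 2)
┌─────┬─────┐
│ cl1 ┆ cl2 │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 0   │
└─────┴─────┘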

Installed versions

--------Version info---------
Polars:               0.20.25
Index type:           UInt32
Platform:             Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:               3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.3.1
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                2.2.2+cu121
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

jaschn added the bug, needs triage, and python labels on May 14, 2024