Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Series.to_numpy doesn't work for Array types with nulls #14268

Closed
2 tasks done
stinodego opened this issue Feb 5, 2024 · 5 comments · Fixed by #16230
Closed
2 tasks done

Series.to_numpy doesn't work for Array types with nulls #14268

stinodego opened this issue Feb 5, 2024 · 5 comments · Fixed by #16230
Assignees
Labels
A-interop-numpy Area: interoperability with NumPy accepted Ready for implementation bug Something isn't working P-low Priority: low python Related to Python Polars

Comments

@stinodego
Copy link
Member

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

values = [[1, 2], [3, 4], None]
dtype = pl.Array(pl.Int8, 2)
s = pl.Series(values=values, dtype=dtype)

result = s.to_numpy()

Log output

Traceback (most recent call last):
  File "/home/stijn/code/polars/py-polars/repro.py", line 14, in <module>
    print(s.to_numpy())
          ^^^^^^^^^^^^
  File "/home/stijn/code/polars/py-polars/polars/series/series.py", line 4365, in to_numpy
    np_array.shape = (self.len(), self.dtype.width)  # type: ignore[attr-defined]
    ^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 5 into shape (3,2)

Issue description

The problem is that after the explode, the size of the array is not len * width, because the null value is only present once (instead of width times).

Expected behavior

I would expect an ndarray of f32 type with values [[1.0, 2.0], [3.0, 4.0], [nan, nan]].

Installed versions

main

@stinodego stinodego added bug Something isn't working python Related to Python Polars P-low Priority: low labels Feb 5, 2024
@reswqa
Copy link
Collaborator

reswqa commented Feb 5, 2024

Emmm, explode single null into width null values is an unexpected behavior for the user, I think. Things get even weirder when we explode a dataframe that contains array column. We fixed this behavior a while ago, but it broke here 😅

I understand, though, that for Series.to_numpy we have to do this, otherwise it doesn't conform to the ndarray spec. But I kind of don't want to add a parameter to Expr::Explode that only makes sense for the Array type. I'm assuming that this behavior is only expected for to_numpy, which is essentially getting the value of ArrayChunked. How about exporting a method called array_to_values just for it? WDYT?

@stinodego stinodego added the A-interop-numpy Area: interoperability with NumPy label Feb 5, 2024
@stinodego
Copy link
Member Author

I think the current behavior of explode for arrays is OK. We should address this in Series.to_numpy.

Not sure what the right approach is. If we can access the Array values + validity mask directly in Rust, that would be useful so we can address this issue at least. I wouldn't want to expose any new methods in the Python API for this though.

@reswqa
Copy link
Collaborator

reswqa commented Feb 6, 2024

I wouldn't want to expose any new methods in the Python API for this though

Would you be okay with us exposing a method under py-polars/src/functions, in the form of a #[pyfunction]. 🤔

@stinodego
Copy link
Member Author

I think @ritchie46 already had some idea on how to address this, so I'll defer to him on this one.

@reswqa
Copy link
Collaborator

reswqa commented Feb 6, 2024

Oh, that's great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interop-numpy Area: interoperability with NumPy accepted Ready for implementation bug Something isn't working P-low Priority: low python Related to Python Polars
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants