Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust,python): Harden Series.reshape against invalid parameters #16281

Merged
merged 2 commits into from
May 18, 2024

Conversation

datenzauberai
Copy link
Contributor

@datenzauberai datenzauberai commented May 16, 2024

This is a fix for #15543

Improvements:

  • dimensions < -1 will raise an error instead of panicking
  • dimensions will be inferred from placeholders -1 whenever it is possible; (-1, -1) will be inferred to either (0, 0) for an empty series or to (len(), 1) for a non-empty series
  • empty series can only be reshaped to (0, 0) or (0, ) (after inferring)
  • dimensions that do not fulfil rows*cols == len() after inferring placeholders will raise an error

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels May 16, 2024
@datenzauberai datenzauberai marked this pull request as ready for review May 16, 2024 22:28
Copy link

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.81%. Comparing base (11fe9d8) to head (a6e343f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16281      +/-   ##
==========================================
+ Coverage   80.80%   80.81%   +0.01%     
==========================================
  Files        1393     1393              
  Lines      179406   179406              
  Branches     2921     2921              
==========================================
+ Hits       144971   144989      +18     
+ Misses      33932    33914      -18     
  Partials      503      503              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@datenzauberai
Copy link
Contributor Author

datenzauberai commented May 17, 2024

While pl.List gets first exploded and is reshaped according to the spec...

print(pl.DataFrame(pl.Series(name="a", values=[[1, 2, 3, 4]], dtype=pl.List(pl.Int64))).select(pl.col("a").reshape((2,2))))
print(pl.DataFrame(pl.Series(name="a", values=[[1], [2], [3], [4]], dtype=pl.List(pl.Int64))).select(pl.col("a").reshape((2,2))))
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2]    │
│ [3, 4]    │
└───────────┘

pl.Array is treated differently right now more like a scalar datatype...

print(pl.DataFrame(pl.Series(name="a", values=[[1, 2, 3, 4]], dtype=pl.Array(pl.Int64, width=4))).select(pl.col("a").reshape((2, 2))))
ComputeError: cannot reshape len 1 into shape [2, 2]
print(pl.DataFrame(pl.Series(name="a", values=[[1], [2], [3], [4]], dtype=pl.Array(pl.Int64, width=1))).select(pl.col("a").reshape((2, 2))))
┌─────────────────────┐
│ a                   │
│ ---                 │
│ list[array[i64, 1]] │
╞═════════════════════╡
│ [[1], [2]]          │
│ [[3], [4]]          │
└─────────────────────┘

We could explode pl.Array(inner_dtype) as well and then return pl.List(inner_dtype), but one could also argue that not exploding is how it should work and makes sense if one uses pl.Array for struct-like things.

pl.Array would even be a much more natural output type than pl.List for this operation, but it would not be able to handle empty series (no pl.Array(width=0)) and I guess that's too much of a change...

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @datenzauberai

@ritchie46 ritchie46 merged commit fb65baa into pola-rs:main May 18, 2024
32 checks passed
@datenzauberai datenzauberai deleted the robust_reshape branch May 18, 2024 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants