Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Don't allow passing missing data to generalized ufuncs #16198

Merged
merged 5 commits into from
May 16, 2024

Conversation

itamarst
Copy link
Contributor

Fixes #14811

This is a more constrained PR, just addressing one specific issue: passing missing data to generalized ufuncs can silently lead to incorrect results.

Generalized ufuncs are a NumPy feature (https://numpy.org/neps/nep-0020-gufunc-signature-enhancement.html), but the easiest way to implement them is using Numba, which is what the tests do.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels May 13, 2024
Copy link

codecov bot commented May 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.91%. Comparing base (9bfa30c) to head (1f176ce).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16198      +/-   ##
==========================================
- Coverage   80.99%   80.91%   -0.08%     
==========================================
  Files        1392     1394       +2     
  Lines      178930   179574     +644     
  Branches     2904     2913       +9     
==========================================
+ Hits       144925   145310     +385     
- Misses      33502    33759     +257     
- Partials      503      505       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@itamarst itamarst marked this pull request as ready for review May 13, 2024 17:18
Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good to me, just got some quite minor comments

py-polars/tests/unit/interop/numpy/_numba.py Outdated Show resolved Hide resolved
py-polars/tests/unit/interop/numpy/test_ufunc_series.py Outdated Show resolved Hide resolved
py-polars/polars/series/series.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @itamarst !

@itamarst
Copy link
Contributor Author

Thank you @MarcoGorelli. If you have commit access you'll have to merge it too, since I can't.

@MarcoGorelli
Copy link
Collaborator

yup - just leaving it open another day in case anyone has comments, then I'll ship it

outside of generalized ufuncs, there's no other case where missing values are a concern, right?

@itamarst
Copy link
Contributor Author

For normal value-by-value ufuncs, if you have memory that is e.g. an invalid float64 then that could potentially be a problem? You're potentially feeding random garbage to the function for missing data.

@ritchie46 ritchie46 merged commit 007524c into pola-rs:main May 16, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

generalized ufuncs that return arrays are broken in a variety of cases
4 participants