Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Distinguish exact vs. equivalent dtype for C type aliases. #21995

Merged
merged 11 commits into from Sep 7, 2022

Conversation

eirrgang
Copy link
Contributor

For asarray and for the dtype equality operator,
equivalent dtype aliases were considered exact matches.
This change ensures that the returned array has a descriptor
that exactly matches the requested dtype.

Note: Intended behavior of np.dtype('i') == np.dtype('l')
is to test compatibility, not identity. This change does not
affect the behavior of PyArray_EquivTypes(), and the
__eq__ operator for dtype continues to map to
PyArray_EquivTypes().

Fixes #1468.

For `asarray` and for the `dtype` equality operator,
equivalent dtype aliases were considered exact matches.
This change ensures that the returned array has a descriptor
that exactly matches the requested dtype.

Note: Intended behavior of `np.dtype('i') == np.dtype('l')`
is to test compatibility, not identity. This change does not
affect the behavior of `PyArray_EquivTypes()`, and the
`__eq__` operator for `dtype` continues to map to
`PyArray_EquivTypes()`.

Fixes numpy#1468.
Use the correct API call to actually set the base object.
Fix a false mismatch. Separate dtype objects, even if equivalent,
cause distinct array views to be created.
@eirrgang
Copy link
Contributor Author

The updates that were necessary for the tests indicate that this change will surprise some users. Some sort of release note is warranted, and perhaps some sort of update to docs related to the dtype kwarg.

Suggestions for specific docs files to update?

* Improve comments/docs.
* Improve descriptiveness of variable names.
* Add additional test expressions that would not pass without
  this patch.
Shorten some lines.
@eirrgang
Copy link
Contributor Author

The updates that were necessary for the tests indicate that this change will surprise some users. Some sort of release note is warranted, and perhaps some sort of update to docs related to the dtype kwarg.

Suggestions for specific docs files to update?

I couldn't find any docs that were specific enough to require updating, but I added a release notes doc.

@seberg
Copy link
Member

seberg commented Jul 17, 2022

Thanks this is awesome. Those are long docs, if we know a place to put them, we probably should... For the release note, it would be nice to make it much shorter unfortuntely!

EDIT: I (or someone else from us) can take a stab at that though!

@eirrgang
Copy link
Contributor Author

Thanks this is awesome. Those are long docs, if we know a place to put them, we probably should... For the release note, it would be nice to make it much shorter unfortuntely!

EDIT: I (or someone else from us) can take a stab at that though!

Is my proposed edit better? Or is there an alternative proposal?

@mattip mattip merged commit 65c10c1 into numpy:main Sep 7, 2022
@mattip
Copy link
Member

mattip commented Sep 7, 2022

Thanks @eirrgang

Copy link
Contributor Author

@eirrgang eirrgang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I was away for the weekend and just got caught up. Looks good, but I think there is a typo.

@mattip mattip mentioned this pull request Sep 8, 2022
@mattip
Copy link
Member

mattip commented Sep 8, 2022

See #22227

@eirrgang eirrgang deleted the mei-1468 branch September 8, 2022 14:06
@InessaPawson
Copy link
Member

Hi-five on merging your first pull request to NumPy, @eirrgang! We hope you stick around! Your choices aren’t limited to programming – you can review pull requests, help us stay on top of new and old issues, develop educational material, work on our website, add or improve graphic design, create marketing materials, translate website content, write grant proposals, and help with other fundraising initiatives. For more info, check out: https://numpy.org/contribute
Also, consider joining our mailing list. This is a great way to connect with other cool people in our community and be part of important conversations that affect the development of NumPy: https://mail.python.org/mailman/listinfo/numpy-discussion

@rgommers
Copy link
Member

rgommers commented Sep 9, 2022

As reported in gh-22233, there is a serious issue with this PR, likely a memory leak that is causing issues for downstream projects like SciPy and MNE-Python.

I will revert this PR. @eirrgang it would be great if you could resubmit this as a new PR, and then we can merge it again once the problem is understood and addressed. Maybe pytest-leaks or Valgrind will be helpful here in pointing to the root cause.

@rgommers
Copy link
Member

rgommers commented Sep 9, 2022

Ugh, GitHub: Sorry, this pull request couldn’t be reverted automatically. It may have already been reverted, or the content may have changed since it was merged.

PyArray_FLAGS(oparr),
op,
op
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, either I just didn't think of it, or expected that this steals a reference to op for the base. We are missing a DECREF, I will make a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I originally tried PyArray_NewFromDescr with op for *data and nullptr for *obj, and PyArray_SetBaseObject or something that steals the reference for the base. Anyway, I guess we were sloppy when switching to PyArray_NewFromDescrAndBase. Thanks for the quick catch!

seberg added a commit to seberg/numpy that referenced this pull request Sep 9, 2022
The new path to preserve dtypes provided by creating a view got the
reference counting wrong, because it also hit the incref path that
was needed for returning the identity.

This fixes up numpygh-21995

Closes numpygh-22233
@leofang
Copy link
Contributor

leofang commented Sep 12, 2022

It is incorrect to add numpy/array_api/tests/test_asarray.py. The test itself is not of concern to Array API. It should be moved elsewhere.

(This is why we need to set up array_api namespace owners... cc: @seberg @rgommers)

@seberg
Copy link
Member

seberg commented Sep 12, 2022

Yes, sorry, I missed where this test was, it should be elsewhere. I had just thought it looked reasonably thorough. It might also be split up, but I don't care.

In any case, we need to move this test into /core/tests/test_<something> not here. @eirrgang do you want to have a look into that.

@eirrgang
Copy link
Contributor Author

Yes, sorry, I missed where this test was, it should be elsewhere. I had just thought it looked reasonably thorough. It might also be split up, but I don't care.

In any case, we need to move this test into /core/tests/test_<something> not here. @eirrgang do you want to have a look into that.

Sure. I'll make a new PR. As I recall, there wasn't an obvious good place for the test. I'll look again, and try to come up with something reasonable.

@seberg
Copy link
Member

seberg commented Sep 12, 2022

Way too long, but test_multiarray.py with TestCreation might be an option. Otherwise test_array_coercion.py has somewhat related tests so you could just add it in that file. The only real problem here is that it doesn't belong under array_api/tests, but core/tests

seberg pushed a commit that referenced this pull request Sep 13, 2022
As noted at #21995 (comment), the new test from #21995 was placed in a directory intended for the Array API, and unrelated to the change.

* Consolidate test_dtype_identity into an existing test file.

Remove `test_asarray.py`. Create a new `TestAsArray` suite in
`test_array_coercion.py`

* Linting.

Wrap some comments that got too long after function became
a method (with additional indentation).
@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Sep 16, 2022

This appears to have broken some tests in scipy.sparse. Ultimately, the problem comes down to the dtype check requiring that the dtypes are the same object, instead of being equal in the Python sense.

This shows the essence of the problem:

In [27]: a = np.array([10, 20, 30, 40])

In [28]: b = np.asarray(a, dtype=a.dtype.newbyteorder('native'))

In [29]: a is b  # Before this PR, this would be true.
Out[29]: False

In [30]: a.dtype == b.dtype
Out[30]: True

a.dtype.newbyteorder('native') creates a new dtype instance, but the new dtype is equal (i.e. ==) to a.dtype. So the dtypes of a and b are equal, but asarray() still created a view. There are tests in scipy.sparse that expect a is b to be True.

@eirrgang
Copy link
Contributor Author

a.dtype.newbyteorder('native') creates a new dtype instance, but the new dtype is equal to (i.e. ==) a.dtype. So the dtypes of a and b are equal, but asarray() still created a view. There are tests in scipy.sparse that expect a is b to be True.

Yes. My understanding from @seberg is that this is the intended behavior, and it is reflected in the updates to the numpy tests and release notes. Please advise if you find contradictions in the documentation. Some documentation was trimmed, so you might find the commit history of the PR informative. See also https://github.com/numpy/numpy/pull/21995/files#r922726648

@seberg
Copy link
Member

seberg commented Sep 19, 2022

@WarrenWeckesser, yeah, that was intentional but that doesn't mean it was the best thing to do. Please do followup if you think we should consider undoing (parts) of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
8 participants