CLN: Simplify map_infer_mask #58483

Merged
merged 5 commits into from May 10, 2024

lithomas1 (Member) commented Apr 30, 2024

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This is needed for upcoming numpy string dtype support (getting rid of the ndarray[object] annotation).

We might also get a decent speedup from using numpy C API functions (but probably not).

EDIT: Looks like benchmarks aren't significantly changed.
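
For readers unfamiliar with the helper, here is a rough pure-Python sketch of the masked-map idea behind `map_infer_mask`: apply a function to every unmasked element and fill the masked positions with either the original value or a supplied `na_value`. This is only an illustration, not the Cython code in `pandas/_libs/lib.pyx`; the names `map_with_mask` and `_NO_DEFAULT` are hypothetical, and the dtype inference step is left out.

```python
# Hypothetical sentinel standing in for pandas' no_default marker.
_NO_DEFAULT = object()


def map_with_mask(values, func, mask, na_value=_NO_DEFAULT):
    """Apply func to unmasked values; masked slots get na_value or stay as-is.

    Illustrative sketch only: works on plain Python sequences and does no
    dtype inference or conversion.
    """
    result = []
    for value, is_masked in zip(values, mask):
        if is_masked:
            # Keep the input value unless an explicit na_value was given.
            result.append(value if na_value is _NO_DEFAULT else na_value)
        else:
            result.append(func(value))
    return result


lengths = map_with_mask(["a", None, "bc"], len, [False, True, False], na_value=None)
```

Keeping the mask separate from the values means `func` never sees missing entries, which is why callers such as the string `_str_map` methods only need to pick a suitable `na_value`.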

WillAyd (Member) left a comment

looks good generally

pandas/_libs/lib.pyx (review thread resolved)
@lithomas1 lithomas1 marked this pull request as ready for review April 30, 2024 19:48
@lithomas1 lithomas1 added this to the 3.0 milestone Apr 30, 2024
@@ -627,28 +627,25 @@ def _str_map(
    na_value = np.nan
else:
    na_value = False
try:
Member Author

@phofl

If you're free, could you review the changes to the Arrow strings here?

I think what I have here is correct, but I'm not too familiar with the Arrow strings.

Member Author

@mroeschke

Are you able to review this bit instead?

I'd like to get this in soon.

*,
bint convert=True,
object na_value=no_default,
object dtype=np.dtype(object)
Member

why this change? is the new numpy string dtype not an instance of cnp.dtype?

Member Author

I think this is an accidental change that I forgot to roll back.

In general, is there a reason we prefer cnp.dtype?

With cnp.dtype I find it harder to do stuff like kind checks on the dtype object.

Member

> In general, is there a reason we prefer cnp.dtype?

Just the usual "explicit is better than implicit" reasons; if it is object here, then in 6 months I'll have to check whether it can be an ExtensionDtype.
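
To make the trade-off in this thread concrete, here is a small sketch of the kind checks mentioned above, done on the Python-level `np.dtype` object, together with the `isinstance` guard that an untyped `object` argument would need. `describe_dtype` is a hypothetical helper written for illustration; it is not part of pandas or NumPy.

```python
import numpy as np


def describe_dtype(dtype):
    """Classify a dtype-like argument by its NumPy kind code (hypothetical helper)."""
    if not isinstance(dtype, np.dtype):
        # With a bare `object` annotation, anything can arrive here,
        # e.g. a pandas ExtensionDtype or a plain string.
        return "not a numpy dtype"
    if dtype.kind == "O":
        return "object"
    if dtype.kind in "US":
        # "U" is fixed-width unicode, "S" is fixed-width bytes.
        return "fixed-width string"
    return f"other (kind={dtype.kind!r})"


labels = [describe_dtype(np.dtype(object)), describe_dtype(np.dtype("U3"))]
```

Typing the parameter as a NumPy dtype at the signature level would make the first branch unnecessary, which is the "explicit is better than implicit" point raised in the review.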

bint convert=True,
object na_value=no_default,
object dtype=np.dtype(object)
) -> None:
Member

returning None looks inaccurate. am i reading this wrong?

Member Author

sorry i mixed up the annotations from the old _map_infer_mask

@mroeschke mroeschke merged commit 24182c2 into pandas-dev:main May 10, 2024
46 checks passed
@mroeschke
Member

Thanks @lithomas1


4 participants