Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Overflow warning when using Series.diff() on Series with Periods #58447

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 2 additions & 2 deletions doc/source/whatsnew/v3.0.0.rst
Expand Up @@ -416,15 +416,15 @@ MultiIndex

I/O
^^^
- Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping`` elements. (:issue:`57915`)
- Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
- Bug in :meth:`DataFrame.to_dict` raises unnecessary ``UserWarning`` when columns are not unique and ``orient='tight'``. (:issue:`58281`)
- Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)
- Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
- Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)

Period
^^^^^^
-
- Bug in :meth:`Series.diff` showing ``RuntimeWarning`` of overflowing when the Series has :class:`Period` (:issue:`58320`)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I mention this happens on Windows only?

-

Plotting
Expand Down
8 changes: 5 additions & 3 deletions pandas/core/arrays/datetimelike.py
Expand Up @@ -1303,16 +1303,18 @@ def _sub_periodlike(self, other: Period | PeriodArray) -> npt.NDArray[np.object_
self._check_compatible_with(other)

other_i8, o_mask = self._get_i8_values_and_mask(other)
new_i8_data = add_overflowsafe(self.asi8, np.asarray(-other_i8, dtype="i8"))
new_data = np.array([self.freq.base * x for x in new_i8_data])

new_data = add_overflowsafe(
self.asi8, np.asarray(-other_i8, dtype="i8")
).astype(object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a great solution if this always needs to be cast to object

Copy link
Member Author

@Aloqeely Aloqeely Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? The comment says "we return an object-dtype ndarray" so it makes sense to cast it to object.
I thought this is better than creating a new array which will be object dtype (old behavior)

I could add an optional argument dtype to add_overflowsafe, would that be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this seems suboptimal, and that comment seems pretty far detached from where the object conversion happens right? I agree this is a bit suspect; the old code seems to use primitive types so overall this is adding a good deal of overhead.

Do you know where the cast to object is actually occurring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_data = np.array([self.freq.base * x for x in new_i8_data])

Here, self.freq.base is what causes new_data to be object dtype

if o_mask is None:
# i.e. Period scalar
mask = self._isnan
else:
# i.e. PeriodArray
mask = self._isnan | o_mask

new_data[mask] = NaT
new_data[~mask] *= self.freq.base
return new_data

@final
Expand Down
8 changes: 1 addition & 7 deletions pandas/tests/extension/test_period.py
Expand Up @@ -25,8 +25,6 @@
Period,
iNaT,
)
from pandas.compat import is_platform_windows
from pandas.compat.numpy import np_version_gte1p24

from pandas.core.dtypes.dtypes import PeriodDtype

Expand Down Expand Up @@ -104,11 +102,7 @@ def check_reduce(self, ser: pd.Series, op_name: str, skipna: bool):

@pytest.mark.parametrize("periods", [1, -2])
def test_diff(self, data, periods):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole test function can be removed if its just calling super

if is_platform_windows() and np_version_gte1p24:
with tm.assert_produces_warning(RuntimeWarning, check_stacklevel=False):
super().test_diff(data, periods)
else:
super().test_diff(data, periods)
super().test_diff(data, periods)

@pytest.mark.parametrize("na_action", [None, "ignore"])
def test_map(self, data, na_action):
Expand Down