DEPR: rename 'dtype_backend' #58214

Closed
jbrockmendel opened this issue Apr 10, 2024 · 9 comments
Labels
API Design, Bug, Deprecate (Functionality to remove in pandas), Needs Discussion (Requires discussion from core team before further action)

Comments

@jbrockmendel
Member

jbrockmendel commented Apr 10, 2024

This came up in #58141. Discussed briefly at the sprint in August.

I've seen some user confusion [citation needed] stemming from the term "backend" in the "dtype_backend" parameter. It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.

I think we should move away from "backend", renaming the dtype_backend parameter where applicable (with a deprecation cycle where appropriate). Maybe dtype "family"?
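As a minimal sketch of the kind of behavior difference at issue (this example is an illustration, not one taken from the thread), the snippet below compares a default NumPy-backed Series with the same data converted via `dtype_backend="numpy_nullable"`. It assumes pandas 2.0 or later, where `convert_dtypes` accepts `dtype_backend`.

```python
import pandas as pd

# Default NumPy-backed dtype: the missing value is stored as NaN in float64.
numpy_default = pd.Series([1, 2, None])

# Same data converted with dtype_backend="numpy_nullable":
# the values become Int64 and the missing value becomes pd.NA.
nullable = numpy_default.convert_dtypes(dtype_backend="numpy_nullable")

# The same comparison treats the missing value differently.
default_eq = numpy_default == 1  # NaN compares as False; result dtype is bool
nullable_eq = nullable == 1      # pd.NA propagates; result dtype is boolean

print(numpy_default.dtype, nullable.dtype)
print(default_eq.tolist(), nullable_eq.tolist())
```

The missing entry compares as `False` in one case and stays missing in the other, so code that filters on the result can behave differently depending on the option chosen.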

@jbrockmendel added the Bug and Needs Triage labels Apr 10, 2024
@rhshadrach
Member

I like family. I also like flavor, no more or less. Some more alternatives I don't like (but others might): system, set, base, strain.

@rhshadrach added the API Design, Deprecate, and Needs Discussion labels and removed the Needs Triage label Apr 10, 2024
@lithomas1
Member

> I've seen some user confusion [citation needed] stemming from the term "backend" in the "dtype_backend" parameter. It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.

Is there evidence that users would not be confused if it was called e.g. dtype_family?

I feel like this is something that would happen eventually as long as the numpy/arrow dtypes shared names (e.g. "int64" vs "int64[pyarrow]").

@jbrockmendel
Member Author

> Is there evidence that users would not be confused if it was called e.g. dtype_family?

I don't understand the question. We haven't used any other terms... "backend" has connotations of swappability and an invariant frontend that wouldn't apply to other terms.

@lithomas1
Member

I'm asking since renaming a parameter causes a lot of code churn.

For me, personally, it is not clear what a dtype family or flavor is, while dtype backend gives me the understanding that the underlying arrays backing my Series/DataFrame are arrow/numpy/whatever. So, IMO, dtype_backend is clearer than the other terms.

> I've seen some user confusion [citation needed] stemming from the term "backend" in the "dtype_backend" parameter. It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.

I guess the [citation needed] part was what I was asking for in my previous question. If you could dig that up, that'd be really helpful.

@jbrockmendel
Member Author

> I'm asking since renaming a parameter causes a lot of code churn.

Totally reasonable concern. My thought is that ATM this is used relatively little, so is easier to change than it would be after #58141 and related.
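For context on the churn being weighed here, a keyword rename with a deprecation cycle is usually handled by a shim that keeps accepting the old name while warning. The sketch below is illustrative only: `rename_kwarg` and `read_something` are hypothetical, and `dtype_family` is just one of the names floated in this thread, not an adopted pandas parameter.

```python
import functools
import warnings


def rename_kwarg(old, new):
    """Hypothetical helper: accept `old` as a deprecated alias for `new`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass only one of {old!r} and {new!r}")
                warnings.warn(
                    f"{old!r} is deprecated, use {new!r} instead",
                    FutureWarning,
                    stacklevel=2,
                )
                # Forward the value under the new keyword name.
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rename_kwarg("dtype_backend", "dtype_family")
def read_something(path, dtype_family="numpy"):
    # Stand-in for a reader function; it only echoes the chosen value.
    return dtype_family
```

Callers using the old keyword keep working during the cycle but see a `FutureWarning`, which is where the downstream churn comes from.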

> I guess the [citation needed] part was what I was asking for in my previous question. If you could dig that up, that'd be really helpful.

Also fair. I think there was a lot of confusion surfaced in https://www.reddit.com/r/Python/comments/11fio85/we_are_the_developers_behind_pandas_currently/ about what "backend" means. I remember other things on Hacker News that I'm not inclined to dig up. Searching our issues for "backend", I see #53154 has a user expecting identical behavior. I'll update this as I find more of these, as I think "incorrectly expecting identical behavior" is a common complaint.

@mroeschke
Member

> Is there evidence that users would not be confused if it was called e.g. dtype_family?

I also initially agree with @lithomas1's question here. I'm not fully convinced (yet) that renaming a keyword argument would be able to convey "pick a dtype implementation that is not fully equivalent to the other options". I am open to there being a better term though.

@jbrockmendel
Member Author

#58307 is another case of incorrectly expecting identical behavior.

@jorisvandenbossche
Member

> It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.

Personally, I think this is actually the correct impression. It's how I think most users should think about the backends (so in that sense I don't have a problem with the current naming).

I know that in practice this is of course not correct in all cases right now, but it could be what we want it to be eventually. And so whenever we get a report about different behaviours, it might be something we should fix.

It's something that we should discuss and spell out, though: what we generally think the expectations should be about those different backends (maybe as part of the PDEP discussion in #58455).

@jbrockmendel
Member Author

Reading the room, I'm going to learn to live with users continuing to be confused by this name. Closing.


5 participants