Return type of pipe not consistent with types at run time #908

Closed
davetapley opened this issue Apr 18, 2024 · 7 comments

Comments

@davetapley
Contributor

Describe the bug

Using pipe returns an instance of the type pipe is called on, but the type stubs imply it's the type returned by the function passed to pipe.

i.e. if my function returns a DataFrame, but I call pipe on a class which inherits from DataFrame, then at run time I get back the subclass, but the typing implies it's just a vanilla DataFrame.

To Reproduce

Subclass DataFrame per the docs.

Create a pipe function using this signature:

def func(df: DataFrame) -> DataFrame:

Observe that at run time, if I use pipe on the subtype, I get back an instance of the subtype, which is nice:

class SubDF(DataFrame):
    ...  # required other stuff

sub = SubDF()
sub_f = sub.pipe(func)
type(sub_f)  # is SubDF

But if you hover over sub_f in VSCode, its type is DataFrame.

Please complete the following information:

  • OS: Linux
  • OS version: 20.04.6
  • Python version: 3.11.7
  • Type checker: pyright 1.1.356 (commit 6652c4a8)
  • pandas-stubs version: 2.2.1.240316

Additional context

The offending type is the T here; it should return Self:

def pipe(
    self,
    func: Callable[Concatenate[Self, P], T],
    *args: P.args,
    **kwargs: P.kwargs,
) -> T: ...

@Dr-Irv
Collaborator

Dr-Irv commented Apr 18, 2024

Thanks for the report. PR with tests welcome.

@hamdanal
Contributor

The type annotations of pipe are correct -- it returns the same type returned by the input function. In your case, the function is declared to return a DataFrame so this is what you get from pipe. If you want the function func to work with DataFrame AND its subclasses, you have to do something like this:

from typing import TypeVar
from pandas import DataFrame

DataFrameT = TypeVar("DataFrameT", bound=DataFrame)

def func(df: DataFrameT) -> DataFrameT: return df

class SubDF(DataFrame):
   ... # required other stuff

sub = SubDF()
sub_f = sub.pipe(func)
reveal_type(sub_f) # Type of "sub_f" is "SubDF" (Pylance)

@Dr-Irv
Collaborator

Dr-Irv commented May 28, 2024

> The type annotations of pipe are correct -- it returns the same type returned by the input function. In your case, the function is declared to return a DataFrame so this is what you get from pipe. If you want the function func to work with DataFrame AND its subclasses, you have to do something like this:

I'd still like to see if returning Self would also fix the problem.

@hamdanal
Contributor

> I'd still like to see if returning Self would also fix the problem.

I don’t know what this means. pipe already uses Self; that’s why the example I gave above works.

@Dr-Irv
Collaborator

Dr-Irv commented May 28, 2024

> > I'd still like to see if returning Self would also fix the problem.
>
> I don’t know what this means, pipe already uses Self, that’s why the example I gave above works.

See the suggestion above. Right now, pipe in pandas-stubs/core/generic.pyi is annotated as returning T. The suggestion is to change it to return Self.

@hamdanal
Contributor

> See the suggestion above. Right now, def pipe() in pandas-stubs/core/generic.pyi is returning T. The suggestion is to change it to return Self.

No, that wouldn't work. pipe returns exactly what the function passed to it returns, not a copy of "self". If you run this example:

import pandas as pd
df = pd.DataFrame(data={"A": [1, 2], "B": [3, 4]})

def f(df: pd.DataFrame) -> int:
    return df.size

def g(df: pd.DataFrame) -> pd.Series:
    return df["A"]

res_f = df.pipe(f)
print(res_f, type(res_f))

res_g = df.pipe(g)
print(res_g, type(res_g))

You get:

4 <class 'int'>
0    1
1    2
Name: A, dtype: int64 <class 'pandas.core.series.Series'>

@Dr-Irv
Collaborator

Dr-Irv commented May 28, 2024

> No, that wouldn't work. pipe returns exactly what the function passed to it returns, not a copy of "self". If you run this example:

Thanks for your analysis. Your solution at #908 (comment) is how the OP should handle this.

@Dr-Irv Dr-Irv closed this as completed May 28, 2024