
Conventions for PySpark dataframe typehints #12

Open

harrietrs opened this issue Oct 25, 2022 · 1 comment

Comments

@harrietrs

When defining a function, it would be useful to follow a convention for PySpark DataFrame type hints, e.g.:

from pyspark.sql import DataFrame
import pyspark.pandas as ps

def my_function(my_dataframe: DataFrame) -> ps.DataFrame:
    return my_dataframe.pandas_api()

However, the above doesn't clearly distinguish between the two DataFrame types. Perhaps an alias for pyspark.sql.DataFrame is required, although I'm not sure how to make it distinct from ps.DataFrame (an established alias).

@fzhem

fzhem commented Oct 26, 2022

I have encountered this before and I do the following:
from pyspark.sql import DataFrame as SparkDataFrame
Maybe a bit verbose, but it works.
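A minimal sketch of this convention in context (assuming Spark 3.2+, where `DataFrame.pandas_api()` is available; the `TYPE_CHECKING` guard is optional and just keeps the imports out of the runtime path):

```python
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # These imports are only evaluated by static type checkers,
    # so pyspark does not need to be importable at runtime.
    from pyspark.pandas import DataFrame as PandasOnSparkDataFrame
    from pyspark.sql import DataFrame as SparkDataFrame

def my_function(my_dataframe: SparkDataFrame) -> PandasOnSparkDataFrame:
    # pandas_api() (Spark 3.2+) converts a Spark DataFrame into a
    # pandas-on-Spark DataFrame, matching the return annotation.
    return my_dataframe.pandas_api()
```

With `from __future__ import annotations`, the hints stay as strings until a type checker evaluates them, so the aliases cost nothing at import time.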
