Make packages (pandas) optional #374

Open
eitsupi opened this issue Apr 29, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@eitsupi
Member

eitsupi commented Apr 29, 2024

I would like to make all dependencies except prqlc optional, so that we can install only what we need, like pyprql[pandas] or pyprql[jupyter].

In many cases pandas is no longer needed, especially once polars is added to the dependencies by #373.

@eitsupi eitsupi added the enhancement New feature or request label Apr 29, 2024
@eitsupi eitsupi changed the title [BREKING] Make packages (pandas, jupysql, ...) optional [BREAKING] Make packages (pandas, jupysql, ...) optional Apr 29, 2024
@max-sixty
Member

One problem with using optional dependencies in pip is that they're quite rarely used, unlike in Rust. So IME most users aren't familiar with how to install them, or aware that they should check which set of dependencies they want.

We do have prqlc itself if folks only want the compiler.

If there is a large enough use case for splitting the dependencies here, then OK, but otherwise I would lean away from it. Maybe pandas is that? But also most installations would have pandas anyway...
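
If the extras route were taken, one way to soften the discoverability concern would be to fail with an actionable message when an optional dependency is missing. This is only a minimal sketch: `require_optional` is a hypothetical helper, and the extra name is the one proposed earlier in this issue, not something pyprql is known to define.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise an error naming the extra.

    Hypothetical helper for illustration only; not part of pyprql.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Tell the user exactly which extra would pull in the missing package.
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install 'pyprql[{extra}]'"
        ) from exc
```

A feature that needs pandas could then call `require_optional("pandas", "pandas")` at the point of use, so users who never touch that feature never see the error.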

@eitsupi
Member Author

eitsupi commented Apr 29, 2024

great_tables recently removed both pandas and polars from its required dependencies (technically, it removed pandas, which was once a required dependency, to support polars-only installation). In the same way, polars users may not want to install pandas.

Currently pyprql pulls in pandas and duckdb, even though these are completely unnecessary for users who want to use only polars.

As the discussion about making pyarrow a required dependency of pandas (pandas-dev/pandas#57073) seems to show, I think packages with huge binaries tend to be shunned.

@eitsupi
Member Author

eitsupi commented Apr 29, 2024

Certainly jupysql and duckdb are worth keeping for now, but pandas is really unnecessary.
Users who need pandas should already have it installed and prql.pandas_accessor can only be used if an instance of pandas.DataFrame is created using pandas in the first place.

@eitsupi eitsupi changed the title [BREAKING] Make packages (pandas, jupysql, ...) optional [BREAKING] Make packages (pandas) optional Apr 29, 2024
@max-sixty
Member

> Users who need pandas should already have it installed and prql.pandas_accessor can only be used if an instance of pandas.DataFrame is created using pandas in the first place.

This is a very good point! Because this augments pandas' functionality but doesn't otherwise require pandas to work, we could even remove it from the dependencies altogether. Then if someone has pandas or polars installed, this library augments that; otherwise it doesn't interfere by installing anything.

(We'd still have them in dev dependencies so tests can run etc. And OK if you prefer to have them as optional dependencies)
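
The "augment only what is already installed" idea could be sketched roughly as below. This is an illustration under assumptions, not pyprql's actual code: `is_installed`, `register_available_accessors`, and the accessor name `prql_demo` are placeholders. It uses `importlib.util.find_spec` to detect pandas without importing it, and pandas' `register_dataframe_accessor` only when pandas is present.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def register_available_accessors() -> list:
    """Set up integrations only for the dataframe libraries already present.

    Returns the names of the libraries an accessor was registered for.
    """
    registered = []

    if is_installed("pandas"):
        import pandas as pd  # lazy import: only reached when pandas exists

        # "prql_demo" is a placeholder name, not pyprql's real accessor.
        @pd.api.extensions.register_dataframe_accessor("prql_demo")
        class DemoAccessor:
            def __init__(self, df):
                self._df = df

            def shape(self):
                # Stand-in for real work such as compiling and running PRQL.
                return self._df.shape

        registered.append("pandas")

    return registered
```

With this shape, importing the package never fails for lack of pandas; the accessor simply does not exist unless pandas does. A polars integration could follow the same guard.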

@eitsupi eitsupi changed the title [BREAKING] Make packages (pandas) optional Make packages (pandas) optional Apr 30, 2024
2 participants