I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Currently, the expand_selector docstring describes its intent as...
Expand a selector to column names with respect to a specific frame or schema target.
It seems like this could mean one of two things:
expand source selection: list the source column names being selected
expand selection result: list the names that result from expressions (like .suffix()) applied to the selection
I assumed expand_selector() did the first, but it seems to do the second:
import polars as pl
import polars.selectors as cs

df = pl.DataFrame({"x": [], "y": []})

# returns ("x_z",)
cs.expand_selector(df, (cs.by_name("x") + pl.col("y")).name.suffix("_z"))

# name directly off selector, also returns ("x_z",)
cs.expand_selector(df, cs.by_name("x").name.suffix("_z"))
Is this a bug or by design? I could see the value of both activities, but being able to expand source selection is very useful for tools that let people choose columns with selectors, so it would be helpful to have source selection somewhere!
Log output
No response
Issue description
Where is listing source selection useful?
For Great Tables, we need to do column selection, without executing computation (e.g. choose columns only). See
For example, in R's dplyr library, there's a function (confusingly named select) that only chooses columns. The way selection and computation are combined is through another function, called across:
library(dplyr)

# tidyselect equivalent of a selector ----
# contains("m")

# can only choose columns, can not execute computation on them ----
select(mtcars, contains("m"))

# can select, execute computation, create new columns ----
transmute(
  mtcars,
  # across combines selection, computation, and result naming
  across(
    contains("m"),                  # selection
    ~ . + 1,                        # computation
    .names = "some_prefix_{.col}"   # result naming
  )
)
Which brings up the question of good language for talking about selectors, column expressions, and expressions (for example, this comment by @stinodego). I'm not sure if cs.contains("x").name.suffix("_z") has entered expression territory as laid out in his comment?
(IMO the R library dplyr's across doc does a good job of framing these pieces, though in a very different interface, with separate arguments for selection, expression, and naming; here it is ported to siuba, and to ibis)
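To make the across-style framing concrete, here is a rough pure-Python sketch that keeps selection, computation, and result naming as separate arguments. It is not how dplyr, Polars, or Great Tables implement anything; expand_source_selection, across, contains, and the dict-of-lists frame are all hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, List[float]]       # toy stand-in for a data frame
Selector = Callable[[str], bool]     # a selector only looks at column names


def contains(substring: str) -> Selector:
    """Hypothetical selector: match column names containing `substring`."""
    return lambda name: substring in name


def expand_source_selection(frame: Frame, selector: Selector) -> Tuple[str, ...]:
    """List the source columns a selector picks, in frame order. No computation."""
    return tuple(name for name in frame if selector(name))


def across(
    frame: Frame,
    selector: Selector,
    fn: Callable[[float], float],
    names: str = "{col}",
) -> Frame:
    """Combine selection, computation, and result naming as separate arguments."""
    out: Frame = {}
    for col in expand_source_selection(frame, selector):
        out[names.format(col=col)] = [fn(value) for value in frame[col]]
    return out


# Made-up data loosely shaped like a few mtcars columns.
frame = {"mpg": [21.0, 22.5], "cyl": [6.0, 4.0], "am": [1.0, 0.0]}

source_names = expand_source_selection(frame, contains("m"))
result = across(frame, contains("m"), lambda v: v + 1, names="some_prefix_{col}")

print(source_names)    # ('mpg', 'am')
print(list(result))    # ['some_prefix_mpg', 'some_prefix_am']
```

The point of the split is that a tool which only needs to know which columns were chosen can call the selection step alone, while the result names only appear once computation and naming are layered on top.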
Looks like a bug to me, but possibly not the one you were thinking of; in both of the above cases I'd expect the function to raise an error, as neither input is a bare/compound selector (which is really what this function is for).
I'll see about fixing that, and then we can think how best to address your requirements - big fan of Great Tables, so let's make sure we can handle this cleanly/consistently 😅
Looks like a bug to me, but possibly not the one you were thinking of; in both of the above cases I'd expect the function to raise an error, as neither input is a bare/compound selector (which is really what this function is for).
This isn't what I was thinking of, but also exactly what I'd want, so is the dream scenario :p.
Thanks for the quick response! The polars integration with Great Tables has really been a game changer!
Under the hood, functions like select and across use tidyselect::eval_select(), which returns source column selection names.
Expected behavior
The source column names being selected by selectors.
Installed versions