Filtering using boolean indexes? #20
Comments
You can do it with
idx = df.logMessage.map(lambda x: bool(re.search('"success": true', x)))
df = df >> query('@idx')
You could also do it using call:
df = df >> call('__getitem__', df.logMessage.map(lambda x: bool(re.search('"success": true', x))))
Maybe the solution is to add a switch to
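For reference, here is a minimal pandas-only sketch of the boolean mask that both suggestions above rely on. The sample log lines are made up for illustration; only the column name logMessage and the pattern come from this thread. It also shows Series.str.contains, which should give the same mask without a Python-level lambda.

```python
import re

import pandas as pd

# Hypothetical log data; only the column name and the pattern come from the thread.
df = pd.DataFrame({
    'logMessage': [
        '{"success": true, "id": 1}',
        '{"success": false, "id": 2}',
        '{"success": true, "id": 3}',
    ]
})

pattern = r'"success": true'

# Mask built row by row with re.search, as in the comment above.
idx_map = df.logMessage.map(lambda x: bool(re.search(pattern, x)))

# Vectorized alternative using the pandas string accessor.
idx_str = df.logMessage.str.contains(pattern, regex=True)

# Plain boolean indexing keeps the rows where the mask is True.
matched = df[idx_str]
print(matched)
```

Either mask can then be passed to query('@idx') or call('__getitem__', ...) as described above.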
Are there more convenient ways to use regex when filtering rows, i.e. in the function "query"? While in R,
There isn't a way to filter using a regex.
Yes, it does not work.
Thanks for the reply.
The function query is just not that convenient since I am used to 'filter' in R. Just a personal wish, hhh.
@antonio-yu, can you come up with a short specification/example of how you would expect regex filtering to work? Then we can start from there.
@has2k1, hi, Hassan. Here is an example:
First, I want to select the rows in which
Some different methods to do it using regular expressions.
While in pandas and plydata, the syntax
I also tried the similar package
Normally, I need to filter rows by regrex and then do all the further operations, like grouping and aggregation, in a chainable way. In pandas and plydata, regrex doesn't work in the function 'query'. The key point is that if one link in the chain breaks, the pipe operation stops. Regrex is always necessary when selecting rows and columns, especially in our Chinese sentences.
I wish the
I hope that I was clear 😄. Best wishes!
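To make the chaining concern concrete, here is a hedged pandas-only sketch of a regex row filter that does not break a method chain. It uses DataFrame.loc with a callable rather than any plydata verb, and the data and column names (name, qty, first_two) are invented for illustration.

```python
import pandas as pd

# Hypothetical data, not taken from the thread.
df = pd.DataFrame({
    'name': ['apple', 'avocado', 'banana', 'apricot'],
    'qty': [3, 5, 2, 7],
})

result = (
    df
    # Regex row filter: keep names that start with "a".
    .loc[lambda d: d['name'].str.contains(r'^a', regex=True)]
    # Further chained steps: derive a key, then group and aggregate.
    .assign(first_two=lambda d: d['name'].str[:2])
    .groupby('first_two', as_index=False)['qty']
    .sum()
)
print(result)
```

Because the filter is passed as a callable, it is evaluated against whatever frame reaches that step, so the chain continues into the grouping and aggregation without an intermediate variable.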
There are the query_all, query_at and query_if helpers, but I admit they are not easy to think of.
import pandas as pd
from plydata import query_at

df = pd.DataFrame({
'x': [0, 1, 2, 3],
'y': ['zero', 'one', 'two', 'three']
})
df >> query_at('y', any_vars='{_}.str.contains("o")')
"""
x y
0 0 zero
1 1 one
2 2 two
"""
# However in this case since we are querying a single column we do not need
# to use the '{_}' placeholder for the column name.
df >> query_at('y', any_vars='y.str.contains("o")')
"""
x y
0 0 zero
1 1 one
2 2 two
"""
I got confused by what you meant by "regrex", I thought you meant regular expression.
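For comparison, the same string expression can be tried directly with pandas DataFrame.query. This is only a sketch of the pandas side and makes no claim about how query_at is implemented; engine='python' is passed explicitly on the assumption that string accessor methods may not be supported by the numexpr engine.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

# Rows whose y contains the letter "o".
contains_o = df.query('y.str.contains("o")', engine='python')

# str.contains treats the pattern as a regex by default,
# so anchors such as "$" work as well.
ends_with_e = df.query('y.str.contains("e$")', engine='python')

print(contains_o)
print(ends_with_e)
```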
I think it merits a second function. Now, what to call it?
Sorry, I wrote 'regrex' wrongly, I meant 'regex'. 😂 The functions query_all, query_at and query_if are equivalent to the functions filter_all, filter_at and filter_if in R, right? They have the same logic. But in plydata, these three functions always need to select some columns and an argument
If there is a new function that allows me to write the code
The query itself stays the same as with pandas DataFrame.query.
Hi @has2k1, how is everything going?
You can use a regular expression:
df >> select(matches='[^a]$')
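A rough pandas-only counterpart for selecting columns by regex is DataFrame.filter(regex=...), which matches the pattern against column labels. The column names below are hypothetical, and whether select(matches=...) behaves identically in every case is an assumption.

```python
import pandas as pd

# Hypothetical column names chosen to exercise the pattern.
df = pd.DataFrame({'alpha': [1], 'beta': [2], 'count': [3], 'x': [4]})

# Keep the columns whose name does not end in "a".
not_ending_in_a = df.filter(regex='[^a]$')
print(list(not_ending_in_a.columns))
```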
Thank you very much for developing and maintaining plydata. It makes pandas usable for me.
In analysing data from some logs, I wanted to filter to rows which matched a regex. I ended up using pandas' boolean indexes:
Is there a more plydata way to do that? I would have expected a filter verb or an overload of query.