Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest partition() or rpartition() over specific uses of split() #4440

Closed
alanyee opened this issue May 6, 2021 · 20 comments · Fixed by #4469
Closed

Suggest partition() or rpartition() over specific uses of split() #4440

alanyee opened this issue May 6, 2021 · 20 comments · Fixed by #4469
Assignees
Labels
Enhancement ✨ Improvement to a component Good first issue Friendly and approachable by new contributors Help wanted 🙏 Outside help would be appreciated, good for new contributors
Milestone

Comments

@alanyee
Copy link

alanyee commented May 6, 2021

Is your feature request related to a problem? Please describe

Oftentimes, I see such snippets:

seq = '1,2,3'
get_first = seq.split(',')[0]
get_last = seq.split(',')[-1]

Since the developer only cares about the first and/or last parts of the split, the better alternative is to use partition() and rpartition() such as

seq = '1,2,3'
get_first = seq.partition(',')[0]
get_last = seq.rpartition(',')[2] # -1 index also works here

partition() and rpartition() only splits the string once, so they become equivalent to split(sep=",", maxsplit=1) and rsplit(sep=",", maxsplit=1). This avoids having split() split a string multiple times as a given string may have multiple copies of the sep value in the string when the developer only cares about the first or last part of the string.

Describe the solution you'd like

Within a given scope of Python code, Pylint should check for similar uses of

seq.split(',')[0]
seq.split(',')[-1]

where seq is only referenced and used for the first and/or last split, and Pylint should throw a warning about a potentially unnecessary, expensive operation.

Additional context

N/A

@Pierre-Sassoulas Pierre-Sassoulas added Good first issue Friendly and approachable by new contributors Enhancement ✨ Improvement to a component Help wanted 🙏 Outside help would be appreciated, good for new contributors labels May 6, 2021
@Pierre-Sassoulas
Copy link
Member

Sounds reasonable, I'd merge that.

@yushao2
Copy link
Collaborator

yushao2 commented May 12, 2021

Giving this a shot

@yushao2
Copy link
Collaborator

yushao2 commented May 12, 2021

@alanyee (CC: @Pierre-Sassoulas & @cdce8p )

Would it make sense to show a warning when .split(sep=. . ., maxsplit=1) is used?

@alanyee
Copy link
Author

alanyee commented May 12, 2021

@alanyee (CC: @Pierre-Sassoulas & @cdce8p )

Would it make sense to show a warning when .split(sep=. . ., maxsplit=1) is used?

@yushao2

I don't think it would make sense. In my opinion, stylistically, partition() would be more Pythonic there, but perhaps the developers left in an explicit maxsplit because the maximum number of splits they do may change in the future or perhaps some other use case i.e. the return type of split() and partition() are different.

@yushao2
Copy link
Collaborator

yushao2 commented May 12, 2021

@yushao2

I don't think it would make sense. In my opinion, stylistically, partition() would be more Pythonic there, but perhaps the developers left in an explicit maxsplit because the maximum number of splits they do may change in the future or perhaps some other use case i.e. the return type of split() and partition() are different.

I understand, i was thinking of throwing a suggestion to use (r)partition instead of split when a user is using .split() and with a maxsplit value of Constant int 1 -- since partition accomplishes the same thing. I'll leave that check out for now based on your inputs. Thanks!

@cdce8p
Copy link
Member

cdce8p commented May 19, 2021

#4469 is getting close to being finished. However, lately I was thinking if this should be an extension instead. It is a matter of preference after all. What do you think? @alanyee @Pierre-Sassoulas

@Pierre-Sassoulas
Copy link
Member

I think this can be a builtin checker as calculating only the first or last element of the list gives better performance at no cost, so this does not feel like an opinionated checker. Maybe I'm mistaken ?

@cdce8p
Copy link
Member

cdce8p commented May 19, 2021

I think this can be a builtin checker as calculating only the first or last element of the list gives better performance at no cost, so this does not feel like an opinionated checker, but maybe I'm mistaken ?

I'm 50/50 on that. So if the most think it can be builtin, we can do that.

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented May 19, 2021

What would be the drawbacks ? (rpartition is less known so the code is harder to understand ?)

@cdce8p
Copy link
Member

cdce8p commented May 19, 2021

What would be the drawbacks ? (rpartition is less known so the code is harder to understand ?)

Yeah. Might be just me, but I for example though that this would not be equal (until I tested it):

l = "1,2,3"
a = l.split("-")[-1]
b = l.rpartition("-")[2]
a == b

When thinking about it now, it does make sense though.

With split and rsplit I guess everybody knows what to expect immediately.

@Pierre-Sassoulas
Copy link
Member

Ok, so if I understand correctly the problem is the suggestion we make and not the checker itself. What should be the less opinionated suggestion we can ? split or rsplit using the max_split parameter as it still return a list instead of a tuple ?

@cdce8p
Copy link
Member

cdce8p commented May 19, 2021

Ok, so if I understand correctly the problem is the suggestion we make and not the checker itself.

I couldn't put it in words first, but that describes it pretty much.

What should be the less opinionated suggestion we can ? split or rsplit using the max_split parameter as it still return a list instead of a tuple ?

The could actually be a good solution! Instead of suggestion partition, suggest the use of maxsplit=1 combined with rsplit if necessary.

@yushao2 What do you think?

@yushao2
Copy link
Collaborator

yushao2 commented May 20, 2021

That makes sense, it might be more intuitive as compared to suggesting partition due to the difference in return type/formats.

Should I go ahead and make the necessary changes in the checker in my PR?

@cdce8p
Copy link
Member

cdce8p commented May 20, 2021

That makes sense, it might be more intuitive as compared to suggesting partition due to the difference in return type/formats.

Lets go that route then!

Should I go ahead and make the necessary changes in the checker in my PR?

👍🏻

@yushao2
Copy link
Collaborator

yushao2 commented May 20, 2021

I think this can be a builtin checker as calculating only the first or last element of the list gives better performance at no cost, so this does not feel like an opinionated checker, but maybe I'm mistaken ?

I'm 50/50 on that. So if the most think it can be builtin, we can do that.

Hi guys! Just like to ask a small question -- what does builtin refer to specifically in this context?

@Pierre-Sassoulas
Copy link
Member

Hello :)

builtin means it would be a default check in pylint, if it's not it would need to be activated in the pylintrc, basically only available for very informed users that really want to have all possible checks.

@Pierre-Sassoulas
Copy link
Member

Just a last minute thought, as it's non opinionated and always better to use max_split we could maybe rename the check to use-maxsplit-arg and remove the consider. We already did that to use-a-generator and consider-using-a-generator. What do you think @yushao2 ?

@yushao2
Copy link
Collaborator

yushao2 commented May 21, 2021

Just a last minute thought, as it's non opinionated and always better to use max_split we could maybe rename the check to use-maxsplit-arg and remove the consider. We already did that to use-a-generator and consider-using-a-generator. What do you think @yushao2 ?

I just checked the difference between the two and since it's non-opinionated, I agree that it should be renamed to use-maxsplit-arg -- since this change would always yield a performance benefit.

Is it fine if I leave the logic in this source file, recommendation_checker.py, or should i refactor it into somewhere else?

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented May 21, 2021

I don't think it would be worth the time, it's still a recommendation, we just have higher confidence that it's a good one (unlike some other check where the user have to consider the situation).

@yushao2
Copy link
Collaborator

yushao2 commented May 21, 2021

I don't think it would be worth the time, it's still a recommendation, we just have higher confidence that it's a good one (unlike some other check where the user have to consider the situation).

Let me make the changes then :)

@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.9.0 milestone Jun 7, 2021
stanislavlevin added a commit to stanislavlevin/freeipa that referenced this issue Aug 2, 2021
Since Pylin 2.9.0 there is new check:
> New checker use-maxsplit-arg. Emitted either when accessing only the
first or last element of str.split().

See pylint-dev/pylint#4440 for details.
stanislavlevin added a commit to stanislavlevin/freeipa that referenced this issue Aug 3, 2021
Since Pylint 2.9.0 there is new check:
> New checker use-maxsplit-arg. Emitted either when accessing only the
first or last element of str.split().

See pylint-dev/pylint#4440 for details.
stanislavlevin added a commit to stanislavlevin/freeipa that referenced this issue Aug 3, 2021
Since Pylint 2.9.0 there is new check:
> New checker use-maxsplit-arg. Emitted either when accessing only the
first or last element of str.split().

See pylint-dev/pylint#4440 for details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component Good first issue Friendly and approachable by new contributors Help wanted 🙏 Outside help would be appreciated, good for new contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants