Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

len-as-sequence with numpy.arrays #1879

Closed
nschloe opened this issue Feb 12, 2018 · 7 comments
Closed

len-as-sequence with numpy.arrays #1879

nschloe opened this issue Feb 12, 2018 · 7 comments
Assignees
Labels
Bug 🪲 False Positive 🦟 A message is emitted but nothing is wrong with the code

Comments

@nschloe
Copy link

nschloe commented Feb 12, 2018

For

import numpy

a = numpy.array([0])

if len(a) > 0:
    print('a')

if a:
    print('b')

pylint suggests

C:  5, 3: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)

However, there is a difference between those two conditions above; that's because numpy.array([0]) evaluates to False. (As opposed to numpy.array([1.121]), for example.)

Now, pylint cannot know everything about all Python modules, but numpy is so popular and the above behavior so devious that I'm wondering if len-as-condition should be removed entirely. In any case, the warning message should explicitly except numpy arrays.

@brycepg
Copy link
Contributor

brycepg commented Feb 12, 2018

Another issue with len-as-condition is with pandas which actually throws a ValueError when you do if pd.Series(): pass but if len(pd.Series()): pass is fine

I'm thinking pylint should only apply this check for builtin containers such as dictionaries and lists and sets

@PCManticore
Copy link
Contributor

Feel free to disable the rule locally for such constructs. If a rule works most of the time but there are libraries such as numpy that decided to have a different behaviour, then this is not an argument against the rule, but against outliers in behaviour such as these libraries.

@nschloe
Copy link
Author

nschloe commented Feb 12, 2018

If a rule works most of the time but there are

That's what I meant: Numpy isn't just "a library" -- it's so immensely popular that I have problems naming a Python software that doesn't use it. One can disable the rule alright if you know about the behavior, but I'd argue that most people don't. Suggesting to replace len(arr)>0 without any warning about numpy might do the community a disservice.

@PCManticore
Copy link
Contributor

To be fair, pylint doesn't recommend itself as being one size fits it all. If something does not work for a particular library, it's okay to just disable the check in that area of the code of across the entire project. Supporting numpy's idiosyncrasies might be worth it, but this can get easily out of hand if we're talking about adapting every rule for every popular library out there. Also adapting a check is a completely different topic than completely removing it, as your initial comment suggested, I wouldn't go as far as removing checks just for satisfying irregularities of other software.

@bersbersbers
Copy link

bersbersbers commented Aug 26, 2020

Sorry to revive an old discussion, but I guess this is better than opening a new issue. This was the MWE I intended to post:

"""Demonstrate inappropriate len() warning for Pandas data frames warning."""
import pandas as pd

df = pd.DataFrame()

if len(df):
    print("this works, but pylint tells me not to use len() without comparison")

if len(df) > 0:
    print("this works and pylint likes it, but it's not the solution intended by PEP-8")

if df:
    print("this does not work (truth value of dataframe is ambiguous)")

On additional argument (to the ones pointed out above) would be that PEP 8 says

For sequences, (strings, lists, tuples), use the fact that empty sequences are false:

So while pylint's behavior may be in line with PEP 8 for these types, I don't see PEP 8 recommend applying the same rule to other types, so pylint may be extending its unnecessarily to other libraries. A point to support this may be that the list of sequence types is well-defined elsewhere: https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
So one solution would be to enable the warning only for variables that pylint is certain are of the above type. But I can see why one would not want that.

Another solution would be for pylint to disable the check if the object has a __bool__() method, which is a pretty strong indicator for "the fact that empty sequences are false" to be wrong (because if that was intended, the developer could have achieved that by implementing only a __len___() method). This way, pylint would apply the warning to as many types as reasonably possible; and would be able to not show a warning for if len(df) or if len(array) because it understands that if df and if array mean something else, and because it knows that if len(x) > 0 is not intended. Would this be doable?

@Pierre-Sassoulas
Copy link
Member

Hello, thank you for bringing new arguments to the table @bersbersbers.

I'm reopening for three reasons:

  • It is true that numpy/pandas are very popular, so I suppose that this is a false positive that is affecting a lot of users. And we want to make pylint friendlier for newcomers with less false positives as discussed in Disable certain messages by default #3512.
  • Disabling len-as-condition is certainly not something that I want to do as it make the code a lot clearer in code base from ex-C/C++ developers
  • Even if we don't want to maintain specific libraries (that would be a slippery slope), the proposed solution of checking for an implementation of __bool__ is not specific to numpy and seems sensible.

This might require change in astroid for libs that use system libraries like numpy.

@Pierre-Sassoulas
Copy link
Member

Closing, there should no longer false positive following #3821. However, generator and list comprehension inside functions are now "false negative" that would require an evolution in astroid to be fixed, the follow up issue is: #4002

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🪲 False Positive 🦟 A message is emitted but nothing is wrong with the code
Projects
None yet
Development

No branches or pull requests

5 participants