Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix false negative for used-before-assignment (ExceptHandler) #4791

Merged
merged 3 commits into from Aug 3, 2021

Conversation

david-yz-liu
Copy link
Contributor

Type of Changes

Type
βœ“ πŸ› Bug fix

Description

This deals with the same logical issue as the third point in pylint-dev/astroid#1111: handling the fact that exception variables in ExceptHandlers are only in scope inside the handling block. The way I fixed this was to explicitly filter out the AssignName nodes that were the exception variable in an except clause, in NamesConsumer.get_next_to_consume.

⭐ I also modified NamesConsumer.mark_as_consumed to allow for a name to be "partially consumed"⭐, so that I could piggyback on the existing check for used-before-assignment and still get an unused-variable for the except variable, as shown below:

def f():
    try:
        1 / 0
    except ZeroDivisionError as e:
        pass

    print(e)
$ python -m pylint test.py
************* Module test
test.py:7:10: E0601: Using variable 'e' before assignment (used-before-assignment)
test.py:4:4: W0612: Unused variable 'e' (unused-variable)

But if you don't like that change, I could remove the changes to mark_as_consumed too, and either try to write a special case to get the unused-variable error, or just leave it without checking for that.

As a comment: the main complication I encountered is that the methods currently only look at the first AssignName node, e.g.

https://github.com/PyCQA/pylint/blob/9ae559d202814091896cdd3b95068a12fa1bd94b/pylint/checkers/variables.py#L1027

So I use some new filtering to avoid introducing false positives, like in

def f():
    try:
        1 / 0
    except ZeroDivisionError as e:  # This AssignName comes first but doesn't matter
        pass

    e = 10
    print(e)  # This is defined

That said, pylint's VariablesChecker doesn't do much to handle control flow, so I could also revert the filtering and instead add a special case for ExceptHandler in visit_name directly.

Closes #626

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, amazing fix ! I don't know if the change to consume is the way to go but this is certainely the expected result. Regarding control flow, if I understand correctly we would have to refactor rhe variable checker in order to do it properly ? Or is that a limitation of the visitor pattern used by pylint and there's notre much we can do ?

* Fix false negative for ``used-before-assignment`` when the variable is assigned
in an exception handler, but used outside of the handler.

Closes #626
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it also closes #4391?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it doesn't---the changes don't handle shadowing:

def function():
    try:
        1 / 0
    except ZeroDivisionError as error:
        try:
            1 / 0
        except ZeroDivisionError as error:
            raise Exception("") from error

You can also see that there are tests in tests/functional/u/unused/unused_variable.py:108-158 that still have TODOs in them.

Both occurrences of the as error are part of the same consumer in "to_consume", and they're both removed when the error astroid.Name node in the inner block is visited. I can think of a couple of different ways of fixing this, but none that are particularly easy/elegant. I can point out that we get a similar issue with loop variable shadowing:

def function():
    for i in range(10):
        for i in range(20):
            print(i)

Pylint currently reports a redefined-outer-name error for the inner i loop variable, but does not report unused-variable for the outer i loop variable. The redefined-outer-name check is handled in a separate visit_for method; I could do something similar in visit_excepthandler if you want.

P.S. I'll reply to your general review comment later, need to grab lunch now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your very clear answer !

@david-yz-liu
Copy link
Contributor Author

david-yz-liu commented Aug 3, 2021

@Pierre-Sassoulas so in terms of the overall approach: right now each NamesConsumer supports one lookup per variable. Variable starts in to_consume, then after a Name is visited, the entry gets moved to consumed. Moreover, it's all or nothing; either all AssignNames associated with that variable get moved from to_consume to consumed, or they all stay in to_consume.

My changes allow for some AssignNames to stay in to_consume, but don't change the fact that each variable is looked up once per NamesConsumer, because of the check here:

https://github.com/PyCQA/pylint/blob/161a27199fce6c12411c61042212b7a669d04ee3/pylint/checkers/variables.py#L1026-L1040

This allows for the exception variable e to both be visited and marked as used-before-assignment, but leaves the AssignName in the consumer so that it gets marked as unused-variable.

If we don't want to change to_consume in this way, another way to detect unused-variable would be to define visit_excepthandler/leave_excepthandler to look for unused exception variables as a special case, similar to what visit_for does. This keeps the main algorithm (visit_name) simpler, but might increase the overall running time. (The brute force check would be something like any(n.name == except_handler.name for n in except_handler.nodes_of_class(astroid.Name)). Don't know if we could do much better.) Or again, could just not check for unused-variable in this case at all for this PR.


Regarding control flow, I think the main limitation is the fact that the variable checkers rely on NamesConsumer's to_consume/consumed states, and these are updated "all-or-nothing" per variable, with additional special-case checks here and there. To give a simpler example, pylint doesn't report any error on:

def function():
    x = 10
    del x
    print(x)

function()

I don't know the history of the Variables checker, but it seems to me like pylint will keep running into issues around control flow as long as this basic approach is used. I wouldn't say that this is a limitation of the AST visitor pattern though. I think improvements could be made by doing any of (in order of increasing code complexity and running time, but also correctness IMO):

  1. Modifying NamesConsumer so that variables can be processed more than once.
  2. Using astroid's variable lookup functionality, which handles control flow better. [1]
  3. Building a control flow graph and then running VariablesChecker checks on the graph rather than the AST itself.

But all of these would be really significant changes so I'm not exactly advocating for them, just thought I'd mention. Personally I'm still very happy using pylint with the current functionality!


[1] Example:

code = """
def function():
    x = 10
    del x
    print(x)
"""
import astroid
ast = astroid.parse(code)
n = [n for n in ast.nodes_of_class(astroid.Name) if n.name == "x"][0]
print(n.lookup('x'))  # Correctly doesn't return any AssignNames

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.001%) to 92.577% when pulling a547566 on david-yz-liu:issue-626 into 85b69c9 on PyCQA:main.

@Pierre-Sassoulas Pierre-Sassoulas added Control flow Requires control flow understanding Bug πŸͺ² labels Aug 3, 2021
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.10.0 milestone Aug 3, 2021
@Pierre-Sassoulas
Copy link
Member

Thank you for your insightful answer. If we want to implement control flow at some point we'll need that kind of insight and I can't attain it while tending to the constant fire of new issues and pull requests, so I appreciate that very much. Maybe I could create a "control flow" issue referencing the possibilities you gave so it become the reference when the issue of control flow comes up ? This is a problem that appear often and I have no solution but hacks, personally.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ‘

@Pierre-Sassoulas Pierre-Sassoulas merged commit 741276e into pylint-dev:main Aug 3, 2021
@david-yz-liu
Copy link
Contributor Author

Sure, that sounds fine to me. (Honestly I feel like it's your and the other pylint maintainers' call about whether to pursue this or not!!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug πŸͺ² Control flow Requires control flow understanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pylint doesn't understand the scope of except clause variables
3 participants