Skip to content

False positive about @cache on Enum methods #7857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RumovZ opened this issue Nov 27, 2022 · 14 comments · Fixed by #7908
Closed

False positive about @cache on Enum methods #7857

RumovZ opened this issue Nov 27, 2022 · 14 comments · Fixed by #7908
Labels
False Positive 🦟 A message is emitted but nothing is wrong with the code Needs PR This issue is accepted, sufficiently specified and now needs an implementation
Milestone

Comments

@RumovZ
Copy link

RumovZ commented Nov 27, 2022

Bug description

# pylint: disable=missing-docstring
from enum import Enum
from functools import cache

class Class(Enum):
    A = 1

    @cache
    def func(self) -> None:
        pass

Configuration

No response

Command used

pylint demo.py

Pylint output

W1518: 'lru_cache(maxsize=None)' or 'cache' will keep all method args alive indefinitely, including 'self' (method-cache-max-size-none)

Expected behavior

Pylint shouldn't complain about this, the code is perfectly fine.
Repeated calls to Class(1).func() will not grow the cache, nor prevent anything from being garbage collected.

Pylint version

pylint 2.15.6
astroid 2.12.13
Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

OS / Environment

No response

Additional dependencies

No response

@RumovZ RumovZ added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Nov 27, 2022
@DanielNoord
Copy link
Collaborator

As far as I know the behaviour isn't different for Enums but I might be wrong here. Is there documentation that using cache in an Enum works differently?

@RumovZ
Copy link
Author

RumovZ commented Nov 27, 2022

From the side of cache, there is no difference in behaviour, so there wouldn't be any documentation. What makes this special is that Class(1) is a singleton. This modified example demonstrates it:

# pylint: disable=missing-docstring
from enum import Enum
from functools import cache


class EnumClass(Enum):
    A = 1

    @cache
    def func(self, text: str) -> None:
        print(text)


class Class:
    @cache
    def func(self, text: str) -> None:
        print(text)


EnumClass(1).func("printed once")
EnumClass(1).func("printed once")

Class().func("printed twice")
Class().func("printed twice")

@RumovZ
Copy link
Author

RumovZ commented Nov 27, 2022

Or maybe an even clearer demonstration:

# pylint: disable=missing-docstring
from enum import Enum
from functools import cache
from typing import Any


class EnumClass(Enum):
    A = 1


class Class:
    pass


@cache
def func(_: Any, text: str) -> None:
    print(text)


func(EnumClass(1), "printed once")
func(EnumClass(1), "printed once")

func(Class(), "printed twice")
func(Class(), "printed twice")

@DanielNoord
Copy link
Collaborator

DanielNoord commented Nov 27, 2022

Both of these examples don't really show what this message is warning against. Defining the __del__ on both classes does show a difference in behaviour though.

I think that Enums are not garbage collected anyway so there is not really a risk of keeping extra things in memory. (I think) this can be shown by doing python -i file.py on the following files.

from enum import Enum
from functools import cache


class A:
    def __del__(self):
        print("A.__del__")

    def func(self, x):
        print("A.func")


class B:
    def __del__(self):
        print("B.__del__")

    @cache
    def func(self, x):
        print("B.func")


class C(Enum):
    A = 1

    def __del__(self):
        print("C.__del__")

    def func(self, x):
        print("C.func")


class D(Enum):
    A = 1

    def __del__(self):
        print("D.__del__")

    @cache
    def func(self, x):
        print("D.func")


A().func(1)
B().func(1)
C(1).func(1)
D(1).func(1)
python -i test.py
A.func
A.__del__
B.func
C.func
D.func
>>> 
C.__del__
D.__del__
B.__del__

B shows what this warning is telling you about: classes don't get garbage collected because self is kept in memory. However, it seems like normal Enums don't get GC'ed anyway so we might want to disable this message for all methods within Enums?

@RumovZ
Copy link
Author

RumovZ commented Nov 27, 2022

Both of these examples don't really show what this message is warning against.

I assumed this message warns against an indefinitely growing cache, and my example shows that the cache entries for an enum are finite.

This might be stuff for another issue, but I ignored the actual wording of the message, because it doesn't make any sense to me:

  1. 'lru_cache(maxsize=None)' or 'cache' will keep all method args alive indefinitely, including 'self' (method-cache-max-size-none) Duh!
  2. Putting a maxsize on it doesn't imply args ever get garbage collected.
  3. Why doesn't it also warn against this then, which does exactly the same?

@DanielNoord
Copy link
Collaborator

I assumed this message warns against an indefinitely growing cache, and my example shows that the cache entries for an enum are finite.

Like the documentation shows this is not the case. It is not about an infinitely large cache, but about the garbage collection of the stuff that is in the cache.

1. `'lru_cache(maxsize=None)' or 'cache' will keep all method args alive indefinitely, including 'self' (method-cache-max-size-none)` Duh!

How would we rephrase this then? Does the extended description in the documentation help?

3. Why doesn't it also warn against [this](https://github.com/PyCQA/pylint/issues/7857#issuecomment-1328348813) then, which does exactly the same?

The issue warns about decorating methods since it is unconsidered unexpected behaviour that self will never be garbage collected. This means that any instances will never be garbage collected and therefore that the caches of those instances will never be garbage collected.
Obviously the same happens with module scope function but there the behaviour is more like you would expect it to be.

@RumovZ
Copy link
Author

RumovZ commented Nov 28, 2022

If I annotate a function with lru_cache(maxsize=None) I expect input and output to be cached indefinitely. (I might not be aware of cache's default behaviour, but that's another matter.) The fact that one argument is called self doesn't change that.
Also, if this is about garbage collection and not unbound growth, I don't understand why setting a maxsize would silence the warning.

The example in the documentation is contrived, because self is passed into the method without being used, which is what Pylint should actually warn against. However, in real code all passed in arguments are necessary to compute the output, so why warn against it?

@DanielNoord
Copy link
Collaborator

If I annotate a function with lru_cache(maxsize=None) I expect input and output to be cached indefinitely. (I might not be aware of cache's default behaviour, but that's another matter.)

In that case it might be better to explicitly disable the warning then. Not all messages fit all Python code.

The fact that one argument is called self doesn't change that. Also, if this is about garbage collection and not unbound growth, I don't understand why setting a maxsize would silence the warning.

It's not based on whether the argument is called self but whether it is a class instance.

The example in the documentation is contrived, because self is passed into the method without being used, which is what Pylint should actually warn against. However, in real code all passed in arguments are necessary to compute the output, so why warn against it?

In documentation examples we always focus on issue. Although other pylint warnings might be raisable we don't show those to allow us to focus the example on a specific message.

It feels as if you're aware of the behaviour of lru_cache and might be better off disabling the message. If you don't care about keeping stuff in memory indefinitely this is indeed not useful for you.

@Pierre-Sassoulas
Copy link
Member

I thought the warning was about both issues combined (growing cache and no garbage collection) ? As we say in the doc:

Unless your instance will never need to be garbage collected (singleton) it is recommended to refactor code to avoid this pattern or add a maxsize to the cache.

We don't have an easy way to detect that a class is a singleton (there's no official way to make a class a singleton) so we accepted that there could be some false positives. But if enums are known singletons we should not raise afaiu.

@RumovZ
Copy link
Author

RumovZ commented Nov 28, 2022

It's not based on whether the argument is called self but whether it is a class instance.

Without knowing the code, it very much seems like the warning is only triggered for functions, not methods, and doesn't depend on the kind of arguments.

In documentation examples we always focus on issue. Although other pylint warnings might be raisable we don't show those to allow us to focus the example on a specific message.

My point is that the raised warning is merely a consequence of the actual issue.
The "Problematic code" is problematic because the cache depends on a redundant variable. The "Correct code" removes this dependency, but keeps the redundant variable. That's backwards. If we simply remove the redundant variable, it's impossible to make that caching mistake.

But if enums are known singletons we should not raise afaiu.

To be precise, the members of an enum are singletons, so you can have multiple instances of class E(Enum), but at most len(E).

@Pierre-Sassoulas
Copy link
Member

@RumovZ would this be a better example to illustrate the issue ?

bad:

import functools


class Fibonnaci:

    result = []
    @functools.lru_cache(maxsize=None)  # [method-cache-max-size-none]
    def fibonacci(self, n):
        if n in {0, 1}:
            self.result.append(n)
        self.result.append(self.fibonacci(n - 1) + self.fibonacci(n - 2))

good:

import functools


@functools.cache
def cached_fibonacci(n):
    if n in {0, 1}:
        return n
    return cached_fibonacci(n - 1) + cached_fibonacci(n - 2)


class Fibonnaci:
    result = []
    def fibonacci(self, n):
        self.result.append(cached_fibonacci(n))

@RumovZ
Copy link
Author

RumovZ commented Nov 28, 2022

I think it's better, but result should be an instance variable, or self would still be redundant.
And of course, it raises the question why it should be treated differently than this equally bad code:

import functools

@functools.lru_cache(maxsize=None)  # [method-cache-max-size-none]
def fibonacci(result, n):
    if n in {0, 1}:
        result.append(n)
    result.append(fibonacci(result, n - 1) + fibonacci(result, n - 2))

This really shows that the problem is not self, but the dependency of the cache on a variable that is irrelevant to the actual computation.

Maybe I could summarise my point of view like this: As this lint has both false positives and negatives, it probably warns against the wrong thing.

@DanielNoord
Copy link
Collaborator

I think we are starting to confuse things here. In your original post you pointed out an issue with this warning, which we should solve:
No longer emit this warning for Enums as they self is always one of their values which are Singletons and thus aren't affected by the issue this warning targets.

The rest of this discussion is taking this warning much broader. As far as I see it there is no intent of expanding this warning to also start checking other functions for redundant variables as that is simply too difficult. It just invites you to rewrite methods to no longer use self as that can unnecessarily prevent your instances from getting garbage collected. The example in the documentation shows how this could be achieved.
I think scoping this issue to methods decreases the complexity and, while this issue shows it is certainly not perfect, makes it much more manageable and less prone to false positives.

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue Nov 28, 2022
@Pierre-Sassoulas Pierre-Sassoulas added False Positive 🦟 A message is emitted but nothing is wrong with the code Needs PR This issue is accepted, sufficiently specified and now needs an implementation labels Nov 28, 2022
@Pierre-Sassoulas
Copy link
Member

Thank you for clarifying @RumovZ . I created #7861 to make the documentation better feel free to review :)

As @DanielNoord said, the remaining issue is to exclude known builtin singleton from the check (just enum ?).

@clavedeluna clavedeluna removed the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Dec 6, 2022
clavedeluna added a commit to clavedeluna/pylint that referenced this issue Dec 7, 2022
@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 2.15.8, 2.15.9 Dec 7, 2022
clavedeluna added a commit to clavedeluna/pylint that referenced this issue Dec 9, 2022
Pierre-Sassoulas added a commit that referenced this issue Dec 9, 2022
For methods inheriting from ``Enum``.

Closes #7857

Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>
@Pierre-Sassoulas Pierre-Sassoulas removed this from the 2.15.9 milestone Dec 9, 2022
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.16.0 milestone Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
False Positive 🦟 A message is emitted but nothing is wrong with the code Needs PR This issue is accepted, sufficiently specified and now needs an implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants