-
-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache _has_init calls to avoid repeated stats #2429
Cache _has_init calls to avoid repeated stats #2429
Conversation
_has_init can end up checking for the presence of the same files over and over. For example, when running pylint's import-error checks on a codebase like yt-dlp, ~43,000 redundant stats were performed prior to caching. Closes pylint-dev/pylint#9613.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2429 +/- ##
=======================================
Coverage 92.78% 92.78%
=======================================
Files 94 94
Lines 11098 11099 +1
=======================================
+ Hits 10297 10298 +1
Misses 801 801
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Wouldn't it be better to cache |
Hmm, I'm not sure :). These are the stats for the two functions with this PR applied:
I think my preference is to use caching with smaller function blocks because it makes it easier to identify side effects or statefulness which could make caching unsafe. Would it be okay to revisit this in future tickets / PRs if |
I'm not sure what other maintainers think of this. Personally I'm not a big fan of slapping |
Disclaimer: I'm not a cache expert, I don't know what the usual issues with caches are so I'm not sure what's wrong with cache, and why we need to reason about it ? We had some issues with caching (memory leaks) in the past but now we're using That being said can't we just |
We can of course still have caches in which the objects get too large, or caches that never hit. Every cache adds additional "maintainer" overhead as you need to reason about whether it is worth it. That's why I would prefer to put them a little higher up to have more effect and reduce their overall number.
Yes of course this is all true. I just pointed out that since this is only called by 2 functions and is an internal function it might be make more sense to look a little higher up. |
I see thank you for explaining. While it's true, I agree with @correctmost point that caching small functions is simpler to reason about than caching big functions. I'm not sure that the overall maintainance burden would be better with 2 big vs 4/5 smalls. Also as they said:
Overall, I'm neutral on the issue but I think it shouldn't block the amazing work done by @correctmost atm. There's a lot of provided benchmarks so we can clearly see that this caching is required -- for the moment at least. This could have gone in 3.2.0 if I didn't wait for answering 😅
And this one too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the amazing benchmarks @correctmost
Sliding in only for this tangent:
Looks like we made the right change here, but I think healthy cache skepticism is always worth saluting. 🫡 Someone who I never worked with but who left an influence at a place I used to work at shared this custom slack alert he wrote to notify on mentions of "architecturally significant events" like "trivial", "should work", "cache", and "redis". 🤣 |
Description
_has_init
can end up checking for the presence of the same files over and over.For example, when running pylint's import-error checks on a codebase like yt-dlp, ~43,000 redundant stats were performed prior to caching.
Closes pylint-dev/pylint#9613.
Performance
I ran pylint with just the
import-error
checks on the yt-dlp codebase. With 2c38c02 andpylint-dev/pylint@7521eb1, linting takes ~34.1 seconds. With the fix applied, linting takes ~33.8 seconds.
Type of Changes