Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Further) Improve Bundler::Settings#[] performance and memory usage #6923

Conversation

technicalpickles
Copy link
Contributor

@technicalpickles technicalpickles commented Aug 29, 2023

What was the end-user or developer problem that led to this PR?

It's ya boi technicalpickles running memory_profiler on a large monolith booting Rails again

What is your fix for the problem, implemented in this PR?

I previously identified and improved this method over in #6884 but while reviewing another memory_profiler profile, I realized another gain we can eek out.

This method keeps comes up in part because configs is allocating a new Hash every time. My last change took advantage of that by using map! on it. configs is called quite often, including in this [] method, so there's a benefit to memoizing it.

Back in [], logically we are trying to find the first Hash in configs that has a value for the given key. Currently, we end up map and compact to just get that value.

Instead, we can use a loop over configs, and break when we find the value for the key.

Benchmarks

Microbenchmarks, comparing implementations (this PR implements manual_loop w/ memoized configs):

Calculating -------------------------------------
            original   800.000  memsize (   168.000  retained)
                        11.000  objects (     1.000  retained)
                         3.000  strings (     0.000  retained)
         bang_method   392.000  memsize (    72.000  retained)
                         7.000  objects (     1.000  retained)
                         3.000  strings (     1.000  retained)
         with_detect   392.000  memsize (    72.000  retained)
                         7.000  objects (     1.000  retained)
                         3.000  strings (     1.000  retained)
with_detect w/ memoized configs
                       392.000  memsize (    72.000  retained)
                         7.000  objects (     1.000  retained)
                         3.000  strings (     1.000  retained)
         manual_loop   312.000  memsize (    72.000  retained)
                         6.000  objects (     1.000  retained)
                         3.000  strings (     1.000  retained)
manual_loop w/ memoized configs
                       312.000  memsize (    72.000  retained)
                         6.000  objects (     1.000  retained)
                         3.000  strings (     1.000  retained)

Comparison:
         manual_loop:        312 allocated
manual_loop w/ memoized configs:        312 allocated - same
         bang_method:        392 allocated - 1.26x more
         with_detect:        392 allocated - 1.26x more
with_detect w/ memoized configs:        392 allocated - 1.26x more
            original:        800 allocated - 2.56x more
Warming up --------------------------------------
            original    74.589k i/100ms
         bang_method   104.215k i/100ms
         with_detect    99.974k i/100ms
with_detect w/ memoized configs
                       100.748k i/100ms
         manual_loop   104.471k i/100ms
manual_loop w/ memoized configs
                       105.599k i/100ms
Calculating -------------------------------------
            original    753.388k (± 1.7%) i/s -      3.804M in   5.050732s
         bang_method      1.038M (± 1.4%) i/s -      5.211M in   5.022015s
         with_detect      1.003M (± 0.9%) i/s -      5.099M in   5.084331s
with_detect w/ memoized configs
                          1.004M (± 1.0%) i/s -      5.037M in   5.018705s
         manual_loop      1.042M (± 0.8%) i/s -      5.224M in   5.011204s
manual_loop w/ memoized configs
                          1.056M (± 1.2%) i/s -      5.386M in   5.100790s

Comparison:
            original:   753387.7 i/s
manual_loop w/ memoized configs:  1055974.9 i/s - 1.40x  faster
         manual_loop:  1042434.4 i/s - 1.38x  faster
         bang_method:  1037808.1 i/s - 1.38x  faster
with_detect w/ memoized configs:  1003831.7 i/s - 1.33x  faster
         with_detect:  1002897.1 i/s - 1.33x  faster

Hyperfine running against bundler 2.4.19 vs this branch

❯ hyperfine "ruby boot-time-benchmark.rb" "ruby -I/Users/josh.nichols/workspace/rubygems/bundler/lib boot-time-benchmark.rb"
Benchmark 1: ruby boot-time-benchmark.rb
  Time (mean ± σ):      8.902 s ±  0.807 s    [User: 5.142 s, System: 3.117 s]
  Range (min … max):    8.090 s … 10.858 s    10 runs

Benchmark 2: ruby -I/Users/josh.nichols/workspace/rubygems/bundler/lib boot-time-benchmark.rb
  Time (mean ± σ):      8.350 s ±  0.101 s    [User: 5.155 s, System: 3.045 s]
  Range (min … max):    8.233 s …  8.513 s    10 runs

Summary
  ruby -I/Users/josh.nichols/workspace/rubygems/bundler/lib boot-time-benchmark.rb ran
    1.07 ± 0.10 times faster than ruby boot-time-benchmark.rb

Colorful version:

CleanShot 2023-08-28 at 20 55 08@2x

Make sure the following tasks are checked

I previously identified and improved this method over in rubygems#6884
but while reviewing another memory_profiler profile, I realized another
gain we can eek out.

This method keeps comes up in part because `configs` is allocating a new
Hash every time. My last change took advantage of that by using `map!`
on it. `configs` is called quite often, including in this `[]` method,
so there's a benefit to memoizing it.

Back in `[]`, logically we are trying to find the first Hash in `configs`
that has a value for the given key. Currently, we end up `map` and
`compact` to just get that value.

Instead, we can use a loop over `configs`, and break when we find the
value for the key.
@technicalpickles
Copy link
Contributor Author

I'm also including the results of memory_profiler on our app boot, focused on memory_profiler. The main data point to note is the difference in memory allocated in lib/bundler/settings.rb.

Before:
boot-time-bundler-master.txt

   2,332,050  /Users/josh.nichols/workspace/rubygems/bundler/lib/bundler/settings.rb

After:
boot-time-bundler-only-improvements-5.txt

   1,065,298  /Users/josh.nichols/workspace/rubygems/bundler/lib/bundler/settings.rb

That is a 54.3% reduction in allocated memory 😍

Copy link
Member

@martinemde martinemde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Btw why isn't there a "return result of first truthy block" on Enumerable. I often need it.

@martinemde
Copy link
Member

martinemde commented Aug 29, 2023

Based on the variety of test failures, the tests must need that memoized state cleared between runs.

@technicalpickles
Copy link
Contributor Author

Btw why isn't there a "return result of first truthy block" on Enumerable. I often need it.

Yep, it is detect/find. I included an implementation in my benchmark using it, but it wasn't faster than an each block.

tests must need that memoized state cleared between runs

Hmm, I will check it out. I think it's still an improvement w/o the memoization if it comes to it though.

@technicalpickles
Copy link
Contributor Author

technicalpickles commented Aug 29, 2023

I tried reverting the memoization, and these are still failing, so it's something else. For reference, these are the failing ones:

 bin/rspec ./spec/commands/cache_spec.rb:200 ./spec/commands/cache_spec.rb:188 ./spec/commands/cache_spec.rb:371  ./spec/plugins/install_spec.rb:25 ./spec/plugins/install_spec.rb:283

I personally find some of the failures tough to parse in general especially when they are spawning new processes to do the work. If there are any protips, would love to hear them 😁

I think spec/bundler/settings_spec.rb needs some more use cases to help catch what is going on.

@martinemde
Copy link
Member

Yep, it is detect/find.

I'm wishing for map_find {|(v)| result_of_this_is_returned(v) }.

Anyway, I tried the code and I get the same errors. Then I tried putting both old and new in the same file. The keys it fails on are set to false! I looked at this many times and didn't see the difference between them, but of course the difference is that compact! doesn't throw out false values, but this PR does.

@indirect
Copy link
Member

Does flat_map plus break or return get you the behavior you're looking for?

Co-authored-by: Martin Emde <martinemde@users.noreply.github.com>
@@ -316,7 +318,7 @@ def key_for(key)
private

def configs
{
@configs ||= {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there ever a time when we would want to recompute this ivar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, what if this became an each_config method, yielding the key, value pairs? then we dont need the hash at all, if all we ever do is iterate over it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there ever a time when we would want to recompute this ivar?

I gave this some thought initally, but forgot to note it. I don't think so though.

My initial thought that was that it wouldn't need to reset if the contents of the ivars in it change (ie @temporary and all). On closer review, the contents of the ivars could change (ie @temporary = {}), but I don't see any writer methods or any code that assigns directly to them.

alternatively, what if this became an each_config method, yielding the key, value pairs? then we dont need the hash at all, if all we ever do is iterate over it

Oh, that is an interesting thought! I tested it, and yeah, it saves allocating a hash and is faster 😮 Updated my benchmark with it https://gist.github.com/technicalpickles/91933d34dcd47bfc99d6ac0e53fcdb1e (as correct manual_loop w/ each_config)

@martinemde martinemde merged commit 54032f3 into rubygems:master Aug 30, 2023
92 checks passed
@technicalpickles
Copy link
Contributor Author

Does flat_map plus break or return get you the behavior you're looking for?

@indirect oh interesting, I hadn't considered using a break in the middle of a map 🤔 TIL you can break with a value, and it's interesting that is what ends up returned rather than the mapped thing.

I added this implementation to the benchmark:

    key = key_for(name)
    value = configs.flat_map {|_, config| break config[key] unless config[key].nil? }

    converted_value(value, name)

Unfortunately, it uses more memory and is slow though 😞 See 'with_flat_map' on https://gist.github.com/technicalpickles/91933d34dcd47bfc99d6ac0e53fcdb1e

@technicalpickles technicalpickles deleted the bundler-settings-bracket-operator-performance branch August 30, 2023 12:42
deivid-rodriguez pushed a commit that referenced this pull request Sep 21, 2023
…et-operator-performance

(Further) Improve Bundler::Settings#[] performance and memory usage

(cherry picked from commit 54032f3)
deivid-rodriguez pushed a commit that referenced this pull request Sep 21, 2023
…et-operator-performance

(Further) Improve Bundler::Settings#[] performance and memory usage

(cherry picked from commit 54032f3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants