
Profiling


Available Tools

  • gem derailed_benchmarks
  • gem memory_profiler
  • gem rack-mini-profiler
  • gem ruby-prof
  • gem pghero

Tool Feature Matrix

Tool                | Production? | Method             | Notes
--------------------|-------------|--------------------|-----------------------------------------------------------
derailed_benchmarks | No          |                    | Requires some setup to use for production-like conditions
memory_profiler     |             |                    |
rack-mini-profiler  | Yes         |                    |
ruby-prof           |             |                    |
pghero              | Yes         | SQL query profiler | Requires Postgres superuser access
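
memory_profiler can also be driven directly from code to measure allocations and retained objects for a specific block. A minimal sketch using the gem's report API (the measured block here is only a placeholder):

    require 'memory_profiler'

    report = MemoryProfiler.report do
      # replace with the code you want to measure
      100.times { "profiling" * 10 }
    end

    # Prints allocated/retained memory and object counts, grouped by gem, file and location
    report.pretty_print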

Derailed Benchmarks Gem

https://github.com/schneems/derailed_benchmarks

To see gem memory usage

bundle exec derailed bundle:mem

To see objects created by gems at require time

bundle exec derailed bundle:objects

The totals are reported in MiB. For example, crowdAI uses 105.59 MB for gems alone.

https://github.com/presidentbeef/brakeman

Do I have a memory leak?

Using the derailed_benchmarks gem, run the task with increasing test counts. If memory usage does not stabilize as the count increases, there is likely a leak.

TEST_COUNT=5000 bundle exec derailed exec perf:mem_over_time
TEST_COUNT=10_000 bundle exec derailed exec perf:mem_over_time
TEST_COUNT=20_000 bundle exec derailed exec perf:mem_over_time
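
A rough in-process variant of the same idea (a sketch only, not part of derailed_benchmarks): run the suspect code repeatedly and watch the process's resident memory, forcing a GC before each sample. A leak shows up as unbounded growth across iterations.

    def rss_mb
      # Resident set size of the current process, reported by ps in KB
      `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
    end

    retained = []  # simulates a leak by keeping objects alive across iterations

    10.times do |i|
      1_000.times { retained << ("x" * 1_024) }  # replace with the code under suspicion
      GC.start
      puts format("iteration %2d: %.1f MB", i, rss_mb)
    end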

Puma

Puma is a multithreaded server, but MRI's global VM lock means only one thread executes Ruby code at a time. Puma's threads therefore work best with Rubinius or JRuby, which can run threads in parallel.

Because MRI can't run threads in parallel, ideally there would be at least one worker process for each CPU core. Heroku 1X, 2X, and Performance-M dynos each report 8 cores; Performance-L dynos report 2 cores.

It's often not possible to run a worker per core because of memory constraints. For example, a medium to large Rails app will take 300-550 MB per worker, so a Heroku 2X dyno (1024 MB of RAM) can only run 1-3 workers.

Puma Workers

Comment text from config/puma.rb:

Specifies the number of workers to boot in clustered mode. Workers are forked webserver processes. If using threads and workers together the concurrency of the application would be max threads * workers. Workers do not work on JRuby or Windows (both of which do not support processes).
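
A minimal config/puma.rb sketch along those lines (WEB_CONCURRENCY and RAILS_MAX_THREADS are the conventional Heroku environment variable names, not something defined on this page):

    # config/puma.rb
    workers Integer(ENV['WEB_CONCURRENCY'] || 2)            # forked worker processes
    threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 5)
    threads threads_count, threads_count                     # min, max threads per worker

    preload_app!                                             # load the app once, then fork workers

    port        ENV['PORT']     || 3000
    environment ENV['RACK_ENV'] || 'development'

    on_worker_boot do
      # re-establish per-worker connections after forking
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
    end

With this setup the maximum concurrency is workers * threads, as described in the comment above.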

Puma Default Settings

        :min_threads => 0,
        :max_threads => 16,
        :log_requests => false,
        :debug => false,
        :binds => ["tcp://#{DefaultTCPHost}:#{DefaultTCPPort}"],
        :workers => 0,
        :daemon => false,
        :mode => :http,
        :worker_timeout => DefaultWorkerTimeout,
        :worker_boot_timeout => DefaultWorkerTimeout,
        :worker_shutdown_timeout => DefaultWorkerShutdownTimeout,
        :remote_address => :socket,
        :tag => method(:infer_tag),
        :environment => lambda { ENV['RACK_ENV'] || "development" },
        :rackup => DefaultRackup,
        :logger => STDOUT,
        :persistent_timeout => Const::PERSISTENT_TIMEOUT

https://github.com/puma/puma/blob/master/lib/puma/configuration.rb#L173-L193

Code Profiling with ruby-prof

https://github.com/ruby-prof/ruby-prof
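
The README above covers the full API; as a minimal sketch, ruby-prof can wrap a block of code and print a flat per-method report (the profiled block here is only a placeholder):

    require 'ruby-prof'

    result = RubyProf.profile do
      # replace with the code to profile
      1_000.times { (1..100).map { |n| n * n } }
    end

    # Print time spent per method to STDOUT
    printer = RubyProf::FlatPrinter.new(result)
    printer.print(STDOUT)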

Heroku Dynos

https://devcenter.heroku.com/articles/dyno-types

The issue with Puma on Heroku is memory: a 1X dyno has 512 MB of RAM but reports 8 cores, so depending on the size of the Rails app only 1-2 Puma workers can run per dyno, despite the 8 available cores.

Dyno Cores

The number of cores available on a Heroku dyno is no longer published and is subject to change. You can find the current number of cores for your configuration using nproc:

$ heroku run bash --app crowdai-prd
Running bash on ⬢ crowdai-prd... up, run.9093 (Standard-1X)
~ $ nproc
8


Some references

https://github.com/puma/puma-heroku

http://julianee.com/rails-sidekiq-and-heroku/

http://stackoverflow.com/questions/8821864/config-assets-compile-true-in-rails-production-why-not

https://github.com/puma/puma-heroku/issues/4

https://devcenter.heroku.com/articles/scaling