
Profiling


Available Tools

  • gem derailed_benchmarks
  • gem memory_profiler
  • gem rack-mini-profiler
  • gem ruby-prof
  • gem pghero

Tool Feature Matrix

Tool                | Production? | Method             | Notes
--------------------|-------------|--------------------|-----------------------------------------------------------
derailed_benchmarks | No          |                    | Requires some setup to use for production-like conditions
memory_profiler     |             |                    |
rack-mini-profiler  | Yes         |                    |
ruby-prof           |             |                    |
pghero              | Yes         | SQL query profiler | Requires Postgres superuser access
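
memory_profiler can also be driven directly from code to measure allocations and retained objects for a specific block. A minimal sketch using the gem's report API (the measured block here is only a placeholder):

    require 'memory_profiler'

    report = MemoryProfiler.report do
      # replace with the code you want to measure
      100.times { "profiling" * 10 }
    end

    # Prints allocated/retained memory and object counts, grouped by gem, file and location
    report.pretty_print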

Derailed Benchmarks Gem

https://github.com/schneems/derailed_benchmarks

To see gem memory usage

bundle exec derailed bundle:mem

To see objects created by gems at require time

bundle exec derailed bundle:objects

The totals are reported in MiB. For example, crowdAI uses 105.59 MB for gems alone.

https://github.com/presidentbeef/brakeman

Do I have a memory leak?

Using the derailed_benchmarks gem, run the task with increasing test counts. If memory usage does not stabilize as the count increases, there is likely a leak.

TEST_COUNT=5000 bundle exec derailed exec perf:mem_over_time
TEST_COUNT=10_000 bundle exec derailed exec perf:mem_over_time
TEST_COUNT=20_000 bundle exec derailed exec perf:mem_over_time
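
A rough in-process variant of the same idea (a sketch only, not part of derailed_benchmarks): run the suspect code repeatedly and watch the process's resident memory, forcing a GC before each sample. A leak shows up as unbounded growth across iterations.

    def rss_mb
      # Resident set size of the current process, reported by ps in KB
      `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
    end

    retained = []  # simulates a leak by keeping objects alive across iterations

    10.times do |i|
      1_000.times { retained << ("x" * 1_024) }  # replace with the code under suspicion
      GC.start
      puts format("iteration %2d: %.1f MB", i, rss_mb)
    end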

Puma

Puma is a multithreaded server, but MRI's global VM lock means only one thread executes Ruby code at a time. Puma's threads therefore work best with Rubinius or JRuby, which can run threads in parallel.

Because MRI can't run threads in parallel, ideally there would be at least one worker process for each CPU core. Heroku 1X, 2X, and Performance-M dynos each report 8 cores; Performance-L dynos report 2 cores.

It's often not possible to run a worker per core because of memory constraints. For example, a medium to large Rails app will take 300-550 MB per worker, so a Heroku 2X dyno (1024 MB of RAM) can only run 1-3 workers.

Puma Workers

Comment text from config/puma.rb:

Specifies the number of workers to boot in clustered mode. Workers are forked webserver processes. If using threads and workers together the concurrency of the application would be max threads * workers. Workers do not work on JRuby or Windows (both of which do not support processes).
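
A minimal config/puma.rb sketch along those lines (WEB_CONCURRENCY and RAILS_MAX_THREADS are the conventional Heroku environment variable names, not something defined on this page):

    # config/puma.rb
    workers Integer(ENV['WEB_CONCURRENCY'] || 2)            # forked worker processes
    threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 5)
    threads threads_count, threads_count                     # min, max threads per worker

    preload_app!                                             # load the app once, then fork workers

    port        ENV['PORT']     || 3000
    environment ENV['RACK_ENV'] || 'development'

    on_worker_boot do
      # re-establish per-worker connections after forking
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
    end

With this setup the maximum concurrency is workers * threads, as described in the comment above.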

Puma Default Settings

        :min_threads => 0,
        :max_threads => 16,
        :log_requests => false,
        :debug => false,
        :binds => ["tcp://#{DefaultTCPHost}:#{DefaultTCPPort}"],
        :workers => 0,
        :daemon => false,
        :mode => :http,
        :worker_timeout => DefaultWorkerTimeout,
        :worker_boot_timeout => DefaultWorkerTimeout,
        :worker_shutdown_timeout => DefaultWorkerShutdownTimeout,
        :remote_address => :socket,
        :tag => method(:infer_tag),
        :environment => lambda { ENV['RACK_ENV'] || "development" },
        :rackup => DefaultRackup,
        :logger => STDOUT,
        :persistent_timeout => Const::PERSISTENT_TIMEOUT

https://github.com/puma/puma/blob/master/lib/puma/configuration.rb#L173-L193

Code Profiling with ruby-prof

https://github.com/ruby-prof/ruby-prof
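
The README above covers the full API; as a minimal sketch, ruby-prof can wrap a block of code and print a flat per-method report (the profiled block here is only a placeholder):

    require 'ruby-prof'

    result = RubyProf.profile do
      # replace with the code to profile
      1_000.times { (1..100).map { |n| n * n } }
    end

    # Print time spent per method to STDOUT
    printer = RubyProf::FlatPrinter.new(result)
    printer.print(STDOUT)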

Heroku Dynos

https://devcenter.heroku.com/articles/dyno-types

The issue with Puma on Heroku is memory: a 1X dyno has 512 MB of RAM but reports 8 cores, so depending on the size of the Rails app only 1-2 Puma workers can run per dyno, despite the 8 available cores.

Dyno Cores

The number of cores available on a Heroku dyno is no longer published and is subject to change. You can find the current number of cores for your configuration using nproc:

$ heroku run bash --app crowdai-prd
Running bash on ⬢ crowdai-prd... up, run.9093 (Standard-1X)
~ $ nproc
8


Some references

https://github.com/puma/puma-heroku

http://julianee.com/rails-sidekiq-and-heroku/

http://stackoverflow.com/questions/8821864/config-assets-compile-true-in-rails-production-why-not

https://github.com/puma/puma-heroku/issues/4

https://devcenter.heroku.com/articles/scaling