periodic timeout of 60-110 seconds when starting opsworks-agent #6

Open
muddman opened this issue Oct 16, 2015 · 3 comments

Comments


muddman commented Oct 16, 2015

Thanks so much for making opsworks-vm available -- we love it!!

This probably isn't a bug with opsworks-vm but it is behavior I only see when using opsworks-vm so I thought I'd post it. Feel free to close it if you don't think it is appropriate to open here.

Most of the time during a vagrant up/provision we have a delay of 60-110 seconds right as the opsworks-agent is starting. The log shows 6-11 "sleep 10 sec" messages (see below). We have tracked it down to the /opt/aws/opsworks/current/bin/opsworks-agent-updater process which is kicked off at the beginning of each run. If we have a vagrant ssh up and manually kill the opsworks-agent-updater process then the run is not delayed.
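If anyone wants to confirm the same diagnosis on their own box, a quick check is to look for the updater while the run is sleeping (a minimal sketch; pgrep ships with procps, and the process name is taken from the log below):

```shell
# -f matches against the full command line, so this also catches the
# lockrun wrapper that launches /opt/aws/opsworks/current/bin/opsworks-agent-updater.
if pgrep -f opsworks-agent-updater > /dev/null; then
  echo "opsworks-agent-updater is running (it may be holding the lockrun lock)"
else
  echo "opsworks-agent-updater is not running"
fi
```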

Has anyone seen this or know what we can do to keep it from happening?

Here is the log with the sleeps that happen during a vagrant up/provision:

default: Running: inline script
==> default: Checking dependencies...
==> default: Reading input...
==> default: Processing 'recipes.json'...
==> default: Processing 'ops/opsworks/[REDACTED].json'...
==> default: Processing 'ops/dna/[REDACTED].json'...
==> default: Parsing deployments...
==> default: Parsing custom cookbooks...
==> default: Purging local cookbooks cache from '/opt/aws/opsworks/current/site-cookbooks'...
==> default: Generating dna.json...
==> default: Running opsworks agent...
==> default: [2015-10-16 21:54:33]  INFO [opsworks-agent(7456)]: About to run custom custom command from: /tmp/vagrant-opsworks20151016-7438-3rkgmo/dna.json
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: [Fri Oct 16 21:56:33 2015] About to execute: RUBYOPT="-E utf-8" /opt/aws/opsworks/current/bin/chef-client -j /var/lib/aws/opsworks/chef/2015-10-16-21-54-33-01.json -c /var/lib/aws/opsworks/client.stage1.rb -o opsworks_custom_cookbooks::load,opsworks_custom_cookbooks::execute 2>&1
==> default: [2015-10-16T21:56:34+00:00] INFO: Starting chef-zero on port 8889 with repository at repository at /opt/aws/opsworks/current
==> default:   One version per cookbook
==> default:   data_bags at /var/lib/aws/opsworks/data/data_bags
==> default:   nodes at /var/lib/aws/opsworks/data/nodes
==> default: [2015-10-16T21:56:34+00:00] INFO: Forking chef instance to converge...
==> default: [2015-10-16T21:56:34+00:00] INFO: *** Chef 11.10.4 ***
==> default: [2015-10-16T21:56:34+00:00] INFO: Chef-client pid: 7674

Here are the processes we see running if we have a vagrant ssh up during the run:

root      6419  6418  0 22:09 ?        00:00:00 /bin/sh -c cd /opt/aws/opsworks/current && bin/lockrun --verbose --lockfile=/var/lib/aws/opsworks/lockrun.lock -- /opt/aws/opsworks/current/bin/opsworks-agent-updater
root      6421  6419  0 22:09 ?        00:00:00 bin/lockrun --verbose --lockfile /var/lib/aws/opsworks/lockrun.lock -- /opt/aws/opsworks/current/bin/opsworks-agent-updater
root      6422  6421  7 22:09 ?        00:00:00 /opt/aws/opsworks/local/bin/ruby /opt/aws/opsworks/current/bin/opsworks-agent-updater

If we manually ssh in and run kill -9 6419, everything continues without delay.
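Rather than looking up the parent PID by hand each time, pkill can match the full command line in one step (a sketch, not a confirmed fix; pkill is standard procps, and SIGKILL here just mirrors the kill -9 above):

```shell
# Kill the updater and its lockrun wrapper in one step; -f matches the
# full command line, so the wrapper processes are caught as well.
# Run this as root (e.g. via sudo) inside the VM.
pkill -9 -f opsworks-agent-updater || echo "nothing to kill"
```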


muddman commented Oct 20, 2015

This is now happening for us on every vagrant up and every vagrant deploy. It is very possible that this problem is localized to our environment, but after a vagrant up the agent updater appears to kick off every 5 minutes inside the VM, and also on every vagrant provision, which locks the run for up to 2 minutes.
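The 5-minute cadence suggests something is re-launching the updater on a schedule, most likely a cron entry. A quick way to look for one (the cron paths here are an assumption on my part, not something confirmed in this thread):

```shell
# Look for a cron entry that re-launches the updater (paths are an
# assumption; adjust for your distribution):
grep -r opsworks /etc/cron.d /etc/crontab 2>/dev/null \
  || echo "no opsworks cron entry found in the system crontabs"
```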

Here is an extremely ugly temporary fix for anyone else experiencing the same issue; we will post again if we come up with a real solution to the problem.

ops/kill-opsworks-agent.sh

#!/bin/bash
# Wait until opsworks-agent-updater appears, then kill it and its wrappers.
# The [o] bracket keeps grep from matching its own command line; quoting the
# pattern stops the shell from glob-expanding it.
until ps auxww | grep '[o]psworks-agent-updater'; do
  sleep 5
done
ps auxww | grep '[o]psworks-agent-updater' | awk '{print $2}' | sudo xargs kill -9 &

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu1404-opsworks"
  # Run the kill script before the opsworks provisioner so the updater
  # cannot hold the lock during the run.
  config.vm.provision :shell, path: "ops/kill-opsworks-agent.sh"
  config.vm.provision "opsworks", type: "shell", args: "path/to/dna.json"
end

@brutuscat

We had to fix the timeout issue like this: https://github.com/Vreasy/opsworks-vm/pull/2

@felixSchl

I am experiencing the same, consistently. The hack provided by @muddman seems to do the trick.
