Entering infinite loop during run #209

Open
litch opened this issue Nov 22, 2016 · 12 comments

Labels
Type: Bug Does not work as expected.

Comments

litch commented Nov 22, 2016

Cookbook version

1.7.8 - and 3.0.0

Chef-client version

12.5.1

Platform Details

Ubuntu 14.04, running on Softlayer VPS

Scenario:

During the chef run, the node will just start into an infinite loop when it hits one of my runit_service configurations. I had run into this problem a bit earlier and seemed to get around it with a different version of runit, but it has re-emerged.

Steps to Reproduce:

This is the entire contents of the recipe in question:

health_check_port = 2702

package 'ruby2.3'

package 'obsidian-account-service' do
  action :upgrade
  notifies :restart, 'runit_service[account-service]'
end

package 'libpq-dev'

file "/opt/obsidian/account-service/settings/account.json" do
  content JSON.pretty_generate(
    'externalMessaging' => {
      'eventStore' => {
        'host' => node[:event_store_message_bus][:host],
        'port' => node[:event_store_message_bus][:http_port]
      }
    },

    'eventPublishing' => {
      'eventStore' => {
        'host' => node[:event_store_message_bus][:host],
        'port' => node[:event_store_message_bus][:http_port]
      }
    }
  )
end

file "/opt/obsidian/account-service/settings/account_client.json" do
  content JSON.pretty_generate(
    'eventStore' => {
      'host' => node[:event_store_message_bus][:host],
      'port' => node[:event_store_message_bus][:http_port]
    }
  )
end

file "/opt/obsidian/account-service/settings/error_telemetry_component_client.json" do
  content JSON.pretty_generate(
    'eventStore' => {
      'host' => node[:event_store_message_bus][:host],
      'port' => node[:event_store_message_bus][:http_port]
    }
  )
end

file "/opt/obsidian/account-service/settings/event_store_client_http.json" do
  content JSON.pretty_generate(
    'host' => node[:event_store][:host],
    'port' => node[:event_store][:http_port]
  )
end

file "/opt/obsidian/account-service/settings/read_model.json" do
  content JSON.pretty_generate(
    'postgresConnection' => {
      'database' => node[:read_model][:database],
      'host' => node[:read_model][:host],
      'password' => node[:read_model][:password],
      'username' => node[:read_model][:username]
    }
  )
end

file "/opt/obsidian/account-service/settings/health.json" do
  content JSON.pretty_generate(
    'port' => health_check_port
  )
end

runit_service "account-service" do
  default_logger true
end

obsidian_component_health_check "account-service" do
  port health_check_port
end

Expected Result:

Historically this has worked fine - we have used this strategy to deploy our services for over a year, but as the number of services has grown, we're starting to see this. This recipe should have just configured a runit_service to run.

Actual Result:

Logs are here: https://gist.github.com/litch/a4b1c7a7d4cbc57c58c0ee503811fa45

The resource just keeps invoking itself recursively it seems. =/

litch commented Nov 22, 2016

Oh, also I should note that this only happens the first time each recipe is run on the machine. So in theory, to get this machine that is hosting 9 of the services to work, I could start the chef run and abort once it starts to recurse, and then the machine would converge as expected on future runs. But that is clearly problematic.

litch commented Nov 22, 2016

It looks like reverting to runit 1.6.0 fixed this problem.

@tas50 added the "Type: Bug" label Jan 4, 2017

robsonpeixoto commented Feb 17, 2017

Any news about this bug?

@daxgames

Anyone? We are having this issue also.

@daxgames

@tas50 - ???

@daxgames

@jtimberman - ??? Any ideas here?

daxgames commented Aug 16, 2017

The only reason I need this is for Chef push-jobs, but I do need it. Having to manually run chef-client, manually interrupt it, and run it again is unacceptable.

@jtimberman
Contributor

Sorry, I'm not actively involved in maintaining this cookbook and don't have cycles to dig into this.

@tas50 @iennae @cheeseplus halp?

@daxgames

@jtimberman thanks for the resp. I set my push-jobs wrapper cookbook dependencies to runit = 1.6.0 based on an earlier comment. Have not yet verified if it works. Not sure what I'm losing by doing this.
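
For readers who want to try the same workaround, a pin like this normally lives in the wrapper cookbook's metadata.rb. This is a minimal sketch: the cookbook name and version are hypothetical placeholders, and, as noted above, the pin itself has not been verified to fix the loop.

```ruby
# metadata.rb of the wrapper cookbook.
# The name and version are hypothetical placeholders, not taken from this thread.
name    'my-push-jobs-wrapper'
version '0.1.0'

# Pin the runit cookbook to exactly 1.6.0, the version reported earlier
# in this thread as avoiding the loop.
depends 'runit', '= 1.6.0'
```

An exact pin also holds back every later fix in the runit cookbook, so it is best treated as temporary.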

tas50 commented Dec 3, 2017

I'm currently working on the push jobs cookbook to clean up some of the old recipes and create new resources for managing things. I'll carve out some time to make sure the runit logic works. On Ubuntu 14.04 I would highly recommend using Upstart instead. It's far more reliable and simpler to set up.

kitchen commented Aug 6, 2018

I'm running into this same problem and am having a lot of trouble building an isolated test case. Even when I copy/paste my code directly from the place where I'm having the problem into a separate chef environment and run it under test kitchen, it doesn't seem to be reproducible :(

I will say that, at least in my problem code, commenting out all of the notifies that notify restart_service and restart_log_service stops the loop, but I'm not sure why leaving them in causes it. The output doesn't show any changes being made in the subsequent passes through the restart_service, create, and enable bits, yet it just keeps restarting over and over again.

Frustrating.

@balasankarc

I hit a similar situation with v4.3.0 when creating a log-related config file. In my case, the file gets created in a directory on an NFS mount with root squash enabled, so Chef is unable to change the owner/group of the file. From my understanding, this is what happens:

  1. Creation of config happens in create action of the resource
  2. During this, it notifies a restart of the service using a ruby block.
  3. This ruby block calls the enable action of runit_service.
  4. The first thing the enable action does is call the create action of the custom resource, thus forming a cycle.
  5. In normal scenarios, this create action will not call enable again, because the config file resource is already up to date and no restart of the service happens, thus breaking the loop.
  6. But in this scenario, the config file resource is never up to date, because its ownership is never the intended one, so the two actions keep calling each other, forming an infinite loop.

Is there any specific reason the functionality is split between create and enable? Why can't all of it live under enable? At the least, that would prevent enable and create from calling each other.
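
The cycle in the numbered steps above can be modeled in plain Ruby. This is only a toy sketch, not the runit cookbook's code: `ToyRunitService`, `MAX_DEPTH`, and the `ownership_fixable` flag are hypothetical names for illustration, and the depth guard exists only so the example terminates instead of looping forever.

```ruby
# Toy model of the create/enable cycle described above.
# NOT the runit cookbook's implementation; all names are hypothetical.
class ToyRunitService
  MAX_DEPTH = 5 # guard so this sketch raises instead of recursing forever

  attr_reader :calls

  # ownership_fixable: false models the NFS root-squash case, where the
  # config file's owner/group can never be set to the intended value.
  def initialize(ownership_fixable:)
    @ownership_fixable = ownership_fixable
    @config_up_to_date = false
    @calls = []
  end

  # Steps 1-2: create manages the config file; if the file resource was
  # updated, it notifies a restart, which goes through enable.
  def create(depth = 0)
    @calls << :create
    raise "cycle detected after #{depth} nested calls" if depth >= MAX_DEPTH

    changed = converge_config
    enable(depth + 1) if changed
  end

  # Steps 3-4: the first thing enable does is run create again.
  def enable(depth = 0)
    @calls << :enable
    create(depth + 1)
  end

  private

  # Returns true when the config file resource counts as updated on this
  # pass. If ownership can never be fixed, it is "updated" on every pass.
  def converge_config
    return false if @config_up_to_date

    @config_up_to_date = true if @ownership_fixable
    true
  end
end

# Normal case (step 5): the second create finds nothing to change,
# so the chain stops on its own.
healthy = ToyRunitService.new(ownership_fixable: true)
healthy.create
p healthy.calls # prints [:create, :enable, :create]
```

With `ownership_fixable: false` (step 6), the same `create` call never sees the file as up to date and raises once the depth guard is hit. In the real resource there is no such guard, which matches the endless run reported in this issue.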
