monit timeout needs to be updated to align with configurable timeout added in 2.6.0 for wait_for_uaa script #56

Open
3 tasks
fbehrens51 opened this issue Jun 11, 2020 · 4 comments · Fixed by #58

Comments

@fbehrens51

What version of the credhub server you are using?
2.6.0

What version of the credhub cli you are using?
2.7.0

If you were attempting to accomplish a task, what was it you were attempting to do?
Trying to use the credhub-colocated.yml ops file that is part of the concourse-bosh-deployment project (currently working in a fork/locally) to deploy credhub.

bosh deploy -n -d concourse \
     ~/workspace/concourse-bosh-deployment/cluster/concourse.yml \
  -l vars-file.yml \
  -l ~/workspace/concourse-bosh-deployment/versions.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/basic-auth.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/enable-lets-encrypt.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/github-auth.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/privileged-http.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/privileged-https.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/scale.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/web-network-extension.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/worker-ephemeral-disk.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/uaa.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/secure-internal-postgres.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/secure-internal-postgres-uaa.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/credhub-colocated.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/secure-internal-postgres-credhub.yml \
  -o ~/workspace/concourse-bosh-deployment/cluster/operations/credhub-custom-uaa-wait.yml

where credhub-custom-uaa-wait.yml is:

- type: replace
  path: /instance_groups/name=web/jobs/name=credhub?/properties/credhub/authentication/uaa/wait_for_start_max_timeout?
  value: ((wait_for_start_max_timeout))

- type: replace
  path: /instance_groups/name=web/jobs/name=credhub?/properties/credhub/authentication/uaa/wait_for_start_connect_timeout?
  value: ((wait_for_start_connect_timeout))
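
For reference, the two variables could be supplied from vars-file.yml roughly as follows. This is only a sketch: 120 is the max timeout value described later in this report, while the connect timeout value is a placeholder assumption, not a tested setting.

```yaml
# vars-file.yml (fragment)
# Value used in this report for the overall wait.
wait_for_start_max_timeout: 120
# Placeholder assumption; substitute whatever suits your environment.
wait_for_start_connect_timeout: 5
```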

and my latest versions.yml is:

# this file is partially maintained by CI; the concourse and garden-runc
# versions and sha1s are automatically bumped, while the rest are preserved
# as-is.
#
# this should make getting started easy while being easy enough to maintain
# manually. feel free to PR sane defaults along with newly supported
# infrastructures and such!
---
concourse_version: '6.2.0'
concourse_sha1: '3c59cac5d5faae5f058fafaa1b501c34b084adba'
bpm_version: '1.1.8'
bpm_sha1: 'c956394fce7e74f741e4ae8c256b480904ad5942'
postgres_version: '41'
postgres_sha1: '4488d08ff54117a9d904f6e2f27c82c1cf4c910e'
windows_utilities_version: '0.11.0'
windows_utilities_sha1: 'efc10ac0f4acae23637ce2c6f864d20df2e3a781'
bbr_sdk_version: '1.15.0'
bbr_sdk_sha1: 'b2d8584dd2ed964c4849cb6d7b536e6cea3e6e8d'
uaa_version: '74.20.0'
uaa_sha1: '0909c912ff4541f4388a0534e5b3b8e3688dc60f'
credhub_version: '2.6.0'
credhub_sha1: 'c45af16ed393bb3cf061b8749e3ee4cae74ce995'

What did you expect to happen?
For the credhub job on the web instance to start and deploy successfully.

What was the actual behavior?
In v2.5.7 (and 2.5.11) the credhub job fails most of the time on a deploy, or on a bosh recreate of the web instance, because uaa takes too long to start. In those versions, the timeout in wait_for_uaa is hard-coded to 5 seconds.
I saw that the timeouts in wait_for_uaa were parameterized in 2.6.0, so I switched to it and set the wait_for_start_max_timeout property to 120 (the current default is 300). wait_for_uaa now succeeds, but the credhub job still fails: the monit timeout defaults to 30 seconds, so monit retries the credhub start, and the subsequent attempts fail with port 8844 already in use.

Locally, I hacked together a customized version that appears to fix the issue. I used the wait_for_start_max_timeout value to set a timeout on the start call in the monit file, though this could be a new property instead.

<% if p('bpm.enabled') %>
check process credhub
  with pidfile /var/vcap/sys/run/bpm/credhub/credhub.pid
  start program "/var/vcap/jobs/bpm/bin/bpm start credhub" with timeout <%= p("credhub.authentication.uaa.wait_for_start_max_timeout") %> seconds
  stop program "/var/vcap/jobs/bpm/bin/bpm stop credhub"
  group vcap
<% else %>
check process credhub
  with pidfile /var/vcap/sys/run/credhub/pid
  start program "/var/vcap/jobs/credhub/bin/ctl start" with timeout <%= p("credhub.authentication.uaa.wait_for_start_max_timeout") %> seconds
  stop program "/var/vcap/jobs/credhub/bin/ctl stop"
  group vcap
<% end %>
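
If a dedicated property were preferred over reusing wait_for_start_max_timeout, the job spec could declare something like the fragment below. The property name credhub.monit_start_timeout is hypothetical and does not exist in the release; the default of 300 is an assumption that simply mirrors the current wait_for_start_max_timeout default mentioned above. The monit template would then read `with timeout <%= p("credhub.monit_start_timeout") %> seconds` on both start program lines.

```yaml
# jobs/credhub/spec (fragment) -- hypothetical property, shown for illustration only
properties:
  credhub.monit_start_timeout:
    description: "Seconds monit waits for the credhub start program before retrying it"
    # Assumed default: at least as long as wait_for_start_max_timeout (300),
    # so monit does not retry while wait_for_uaa is still waiting.
    default: 300
```

A separate property would let operators tune the monit timeout independently, at the cost of having to keep it at least as large as the UAA wait.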

Please confirm where necessary:

  • I have included a log output
  • My log includes an error message
  • I have included steps for reproduction

If you are a PCF customer with an Operation Manager (PCF Ops Manager) please direct your questions to support (https://support.pivotal.io/)

@cf-gitbot
Collaborator

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/173296784

The labels on this github issue will be updated when the story is started.

@sonmacharius

Is there any resolution for this issue? I'm on release version 2.9.0 and the same timeout problem still occurs.

@swalchemist
Contributor

The fix had CI failures and was reverted, so I'm reopening the issue.

@swalchemist swalchemist reopened this Nov 11, 2022
@cf-gitbot
Collaborator

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.
