Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set startup time with env NATS_STARTUP_DELAY #3743

Merged
merged 3 commits into from Jan 3, 2023
Merged

Set startup time with env NATS_STARTUP_DELAY #3743

merged 3 commits into from Jan 3, 2023

Conversation

Alberic-Hardis
Copy link
Contributor

@Alberic-Hardis Alberic-Hardis commented Dec 26, 2022

The startup delay when starting as a service under windows was set to 10 seconds.

This value was fixed and hardcoded since before JetStream. On (some) Windows systems, this leads to service startup failures, as the store dir sorting may be hindered by an important load, or slowed down by increased accesses times, typically from security software influence.

This commit allows overriding the startup delay using an environment variable NATS_STARTUP_DELAY. This variable is checked outside init() to allow reporting into the service log.

Any value can be used (even lower than the initial 10s). Bare NATS start can be far faster than 10s. Ops may want to setup this env var to lower values in order to trigger an error on stricter constraints.

  • Link to issue, e.g. Resolves #NNN
  • Documentation added (if applicable)
  • Tests added
  • Branch rebased on top of current main (git pull --rebase origin main)
  • Changes squashed to a single commit (described here)
  • Build is green in Travis CI
  • You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

Resolves #3742

Changes proposed in this pull request:

  • make max startup delay of server a variable
  • set this duration from NATS_STARTUP_DELAY environment variable ("30s" for instance)
  • in case of failure to parse or failure to retrieve value, default to 10 seconds
  • log parsing errors of variable into system log with the winservice instance (for alerting)

/cc @nats-io/core

Note : I could not add tests for this change, setting the value of the env var to low delays shows the change appears to have the expected behavior.

The startup delay when starting as a service under windows was set to 10
seconds.

This value was fixed and hardcoded since before JetStream. On (some)
Windows systems, this leads to service startup failures, as the store
dir sorting may be hindered by an important load, or slowed down by
increased accesses times, typically from security software influence.

This commit allows overriding the startup delay using an environment
variable NATS_STARTUP_DELAY. This variable is checked outside init() to
allow reporting into the service log.

Any value can be used (even lower than the initial 10s). Bare NATS
start can be far faster than 10s. Ops may want to setup this env var to
lower values in order to trigger an error on stricter constraints.
@Alberic-Hardis
Copy link
Contributor Author

Travis build fails, but it cannot be caused by the change. The changes all happen in a windows-only file, and all tests are run under linux focal.

Copy link
Member

@ColinSullivan1 ColinSullivan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@ColinSullivan1 ColinSullivan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me. Could you add a test? Perhaps check the time on startup it would take to accept a connection.

@Alberic-Hardis
Copy link
Contributor Author

Thanks! Looks good to me. Could you add a test? Perhaps check the time on startup it would take to accept a connection.

@ColinSullivan1 I don't know if it's actually a good idea

  • the travis runner does not run tests on windows (this would be useful only for coders on windows)
  • this is the implementation of service wrapping for windows, only used when nats-server is installed as a service

I think we could make winServiceOption func(winServiceWrapper), that would both make it future ready (some other behavior might become configurable later on), and allow checking the option application onto the winService struct.

Would that be OK for you ?

@Jarema
Copy link
Member

Jarema commented Dec 27, 2022

It would be good to update the docs too.

@Alberic-Hardis
Copy link
Contributor Author

It would be good to update the docs too.
@Jarema here you are
nats-io/nats.docs#550

@Alberic-Hardis
Copy link
Contributor Author

Update :

I tried to write some tests for this, but either I fail to reach the winServiceWrapper.Execute() or I just end up writing a test that tests os.LookupEnv(), basically.

Other option is to mock the server to test winserviceWrapper.Execute, but that implies redefining winServiceWrapper with an interface instead of simply a *Server, to make the server mockable. This implies too many changes too unrelated with the issue of #3742.

I'll give it another try if I have a better inspiration. This definitely makes me realize why no tests were written for this piece of code previously.

@Alberic-Hardis
Copy link
Contributor Author

Happy year 2023 everyone.
I've copied the logic of the tests in service_test.go to write a windows-specific test (builds only on windows)

Travis still fails for some unrelated reason (no code in the proposed changes can run in Travis test suite that runs on Linux)

The tests I propose here use a crude but sufficient windows services mock, and target the delay env var by setting it at the beginning of the test in order to make the startup fail (winServiceWrapper.Execute returns false) in the short timeframe of the tests.

@ColinSullivan1
Copy link
Member

@Alberic-Hardis Thanks! We've run the test on windows (thanks @scottf). LGTM!

@derekcollison derekcollison merged commit 00c98a0 into nats-io:dev Jan 3, 2023
@Alberic-Hardis
Copy link
Contributor Author

Hello all,
Is this merge likely to make its way into main and an official release ?

@derekcollison
Copy link
Member

Will be part of 2.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants