Machine Based health Checks #3430

Merged
merged 30 commits into from
May 16, 2024

Conversation

billyb2
Member

@billyb2 billyb2 commented Apr 5, 2024

We basically take the command provided in [[services.machine_checks]],
and run it. If we get an exit code that isn't 0, we say that we failed
the health check. This is only implemented for the rolling strategy
right now, and needs to be implemented for bluegreen, canary, etc.

Fixes #3233

Change Summary

What and Why:

How:

Related to:


Documentation

  • Fresh Produce
  • In superfly/docs, or asked for help from docs team
  • n/a

@billyb2 billyb2 changed the title from "Machine events todo" to "Machine Based health Checks" Apr 16, 2024
@billyb2 billyb2 marked this pull request as ready for review April 16, 2024 16:17
@billyb2 billyb2 marked this pull request as draft April 17, 2024 14:18
@billyb2 billyb2 force-pushed the machine_events branch 3 times, most recently from b6886d7 to b9e56b0 Compare April 26, 2024 16:14
@billyb2 billyb2 marked this pull request as ready for review April 26, 2024 16:14
@billyb2
Member Author

billyb2 commented Apr 26, 2024

A huge caveat is that this doesn't support bluegreen deploys yet. It's quite a bit more involved than I initially thought to get that support moving.

@benbjohnson benbjohnson self-requested a review April 26, 2024 17:50
internal/command/deploy/machinebasedtest.go (two outdated review threads, resolved)
@billyb2 billyb2 force-pushed the machine_events branch 2 times, most recently from 4bac995 to a36f9f7 Compare May 6, 2024 17:14
We basically take the command provided in [[services.machine_checks]],
and run it. If we get an exit code that isn't 0, we say that we failed
the health check. This is only implemented for the rolling strategy
right now, and needs to be implemented for bluegreen, canary, etc.
We're also now parallelizing canary machine creation. This isn't worth
a fresh produce or anything since it was just a few lines of code, but
still neat
@billyb2
Member Author

billyb2 commented May 13, 2024

^ rebased

Member

@dangra dangra left a comment

I was expecting to see a test case added for this complex app case #3430 (comment)

It pretty much checks expectations around stop configuration, but in general it could be the perfect test case for all "machine check" related edge cases.

I'd add it as tomachine-machinechecks.toml, include multiple process groups, services and machine checks.

Member

@dangra dangra left a comment

I could keep throwing nitpicks all day, but it is in much better shape, Billy! Well done.

@billyb2 billyb2 merged commit 0a7df8d into master May 16, 2024
39 checks passed
@billyb2 billyb2 deleted the machine_events branch May 16, 2024 17:23
billyb2 added a commit to superfly/docs that referenced this pull request May 24, 2024
Please see superfly/flyctl#3430 for the
corresponding flyctl PR
billyb2 added a commit to superfly/docs that referenced this pull request May 28, 2024
* Add a section about machine_checks

Please see superfly/flyctl#3430 for the
corresponding flyctl PR

* Switch machine to Machine

Vale have mercy

* Add machine_checks to http_services
Development

Successfully merging this pull request may close these issues.

Allow Using Ephemeral Machines For Health Checks
3 participants