-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching for health checks #157
Comments
Did you consider that within the check itself, as opposed to the spec defining it? |
I'm not sure if I understood your question. Are you implying to implement this behavior in the check itself? Isn't that the point to have a framework to be released from implementing commonly used things? |
That's what I meant, yes. My thought here is that I don't know how beneficial such a cache would be, as a check could report as good during the timeout window when in fact it isn't. And if the timeout is tuned small enough to prevent that, then why worry about a cache at all? |
If a check reports good during the timeout window although it isn't then you get the result only delayed if the problem still exists (assuming you have a monitoring which calls the endpoint regularly). If the problem only occurred during the timeout and not after that, I would consider this as a temporary fluctuation which is not relevant. My point is that there may be some expensive checks which lasts several seconds. What if you have multiple requests at the same time? Is it really necessary to execute the same check at the same time in multiple threads? Or some checks run amok? I've seen such situation multiple times. The monitoring worsened the problem by exhausting thread pools and cpu resources. Of course there are a lot of other possible countermeasures, but from my experience caching is a simple and easy way which works in most of the cases. |
I'm curious as to the use case where you would potentially be executing multiple health check requests at, or near, the same time |
For instance because there are several monitoring systems for different use cases (and no we can't get rid of them). Also the functionality is integrated in our client and can be accessed by administrative users. And also several persons, mostly from the application maintenance department, know how to call the endpoint. |
We have a similar situation.
Our solution was to
It would be nice if the framework could support:
|
Some checks may be be expensive and it is not necessary to execute them each time the health endpoint is called. Also in case of system problems some checks may execute slower which could lead into a vicious cycle. Monitoring clients or users are bombarding the system in case of a failure to see if the problem is still there.
Therefore we introduced a cache, so that monitoring checks are not executed again when they are called within a specific timeframe. In my opinion microprofile health could deliver a huge benefit if such a behaviour would be configurable per check.
For instance by defining a expiration time for each check:
The text was updated successfully, but these errors were encountered: