Kubernetes readiness probe endpoint returning 404 #22562

Closed
chadlwilson opened this issue Jul 25, 2020 · 18 comments

chadlwilson commented Jul 25, 2020

There appears to be some change in behaviour for the Kubernetes-oriented readiness group endpoint on 2.3.2 compared to 2.3.1.

For a service that has no external dependencies (and only readinessState in the health group), the /actuator/health/readiness endpoint is returning a 404.

Configuration we are using:

management.server.port=9083
management.health.probes.enabled=true
management.endpoints.enabled-by-default=false
management.endpoint.info.enabled=true
management.endpoint.health.enabled=true
management.endpoint.health.show-details=always
management.endpoint.health.group.liveness.include=livenessState,diskSpace,refreshScope
management.endpoint.health.group.readiness.include=readinessState
management.endpoint.health.group.liveness.show-details=always
management.endpoint.health.group.readiness.show-details=always
management.endpoints.web.exposure.include=health

Expected Behaviour
We expect this to just return 200 with { "status": "UP" }

Actual Behaviour

$ http http://localhost:9083/actuator/health/readiness
HTTP/1.1 404 Not Found

Full health call:

$ http http://localhost:9083/actuator/health
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: application/json
Date: Sat, 25 Jul 2020 06:27:55 GMT
Transfer-Encoding: chunked
{
    "components": {
        "discoveryComposite": {
            "components": {
                "discoveryClient": {
                    "description": "Discovery Client not initialized",
                    "status": "UNKNOWN"
                }
            },
            "description": "Discovery Client not initialized",
            "status": "UNKNOWN"
        },
        "diskSpace": {
            "details": {
                "exists": true,
                "free": 287311962112,
                "threshold": 10485760,
                "total": 499963174912
            },
            "status": "UP"
        },
        "livenessStateProbeIndicator": {
            "status": "UP"
        },
        "ping": {
            "status": "UP"
        },
        "reactiveDiscoveryClients": {
            "components": {
                "Simple Reactive Discovery Client": {
                    "description": "Discovery Client not initialized",
                    "status": "UNKNOWN"
                }
            },
            "description": "Discovery Client not initialized",
            "status": "UNKNOWN"
        },
        "readinessStateProbeIndicator": {
            "status": "UP"
        },
        "refreshScope": {
            "status": "UP"
        }
    },
    "groups": [
        "liveness",
        "readiness"
    ],
    "status": "UP"
}

This may relate to #22107.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jul 25, 2020
@chadlwilson
Author

After a bit more digging: I'm not sure why, or whether it was intended, but the issue seems to be that readinessState has become readinessStateProbeIndicator (and likewise for livenessState). The old configuration therefore no longer included the indicator at all, leaving the readiness group empty.

This seems to work as expected.

management.endpoint.health.group.liveness.include=livenessStateProbeIndicator,diskSpace,refreshScope
management.endpoint.health.group.readiness.include=readinessStateProbeIndicator
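
To make the failure mode concrete, here is a minimal sketch in plain Java, not Spring Boot's actual implementation. It assumes a group's members are resolved by exact name against the registered contributors, and that a group with no resolved members answers 404, as observed above. The class `HealthGroupSketch` and its methods are hypothetical; the contributor names are taken from the 2.3.2 `/actuator/health` output in the report.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class HealthGroupSketch {

    // Hypothetical stand-in for the registered contributors, using the names
    // listed under "components" in the 2.3.2 /actuator/health output.
    static final Set<String> REGISTERED = Set.of(
            "diskSpace", "ping", "refreshScope",
            "livenessStateProbeIndicator", "readinessStateProbeIndicator");

    // Keeps only the included names that match a registered contributor.
    static List<String> resolveMembers(List<String> include) {
        return include.stream()
                .filter(REGISTERED::contains)
                .collect(Collectors.toList());
    }

    // Assumption: a group with no resolved members has no status to report,
    // which this sketch maps to HTTP 404.
    static int statusFor(List<String> include) {
        return resolveMembers(include).isEmpty() ? 404 : 200;
    }

    public static void main(String[] args) {
        // Old name: nothing matches, so the group is empty.
        System.out.println(statusFor(List.of("readinessState")));
        // Workaround name: one member resolves.
        System.out.println(statusFor(List.of("readinessStateProbeIndicator")));
    }
}
```

Under these assumptions the old include value silently resolves to an empty group, which is why nothing fails at startup and the problem only shows up when the probe endpoint is called.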

@bclozel bclozel self-assigned this Jul 25, 2020
@bclozel bclozel added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged labels Jul 25, 2020
@bclozel bclozel added this to the 2.3.3 milestone Jul 25, 2020
@bclozel
Member

bclozel commented Jul 25, 2020

Yes, this is an unintended side effect of #22107. The workaround you're mentioning is the right one in the meantime.

Thanks for raising this issue!

@chadlwilson
Author

No problem - feel free to re-title it as appropriate.

Unfortunately this is a silently breaking change for many people. They probably won't realise the probe status is no longer included in the group status alongside, say, db, redis etc., because including a non-existent indicator in a group doesn't seem to fail startup :(

@bclozel bclozel added type: regression A regression from a previous release and removed type: bug A general bug labels Jul 25, 2020
@bclozel
Member

bclozel commented Jul 25, 2020

I've tagged this issue as a regression.

I'm really sorry for letting in that one.

joergjo added a commit to joergjo/springboot-samples that referenced this issue Jul 27, 2020
@wilkinsona wilkinsona changed the title from "Kuberenetes readiness probe endpoint returning 404 on Spring Boot 2.3.2" to "Kubernetes readiness probe endpoint returning 404 on Spring Boot 2.3.2" Jul 27, 2020
@OrangeDog
Contributor

Does this cover the fact that they are listed under groups at /health, but then don't actually exist?

@agrappin

agrappin commented Jul 29, 2020

I have precisely the same issue as @OrangeDog. On my container with management.endpoint.health.probes.enabled=true:

  • When executing GET /actuator/health:
    { "status": "UP", "groups": [ "liveness", "readiness" ] }

  • When executing GET /actuator/health/liveness:
    404 Not Found

@chadlwilson
Author

* When executing GET `/actuator/health`:
  `{ "status": "UP", "groups": [ "liveness", "readiness" ] }`

* When executing GET `/actuator/health/liveness`:
  `404 Not Found`

I agree this is potentially confusing, but it doesn't seem to be the main problem here?

I wonder whether the /actuator/health endpoint behaved differently under 2.3.1 if a group had no configured components, i.e. did it filter them out of groups: []?

I guess this is a matter of design - the group exists but has no (valid) components, therefore its status is indeterminate, therefore the implementation returns a 404? It certainly can't return 200 OK....

Would we

  • want to be aware the groups exist, so we know we can add components to them with include ?
  • or have them disappear from the top level endpoint so we don't even know they are there?

@ttddyy
Contributor

ttddyy commented Jul 29, 2020

Instead of referencing readinessStateProbeIndicator and livenessStateProbeIndicator, I think you need to set the management.health.livenessstate.enabled and management.health.readinessstate.enabled properties introduced in Spring Boot 2.3.2. That way you can use the readinessState and livenessState references.

When the management.health.[readiness|liveness]state.enabled properties are set to false (the default), AvailabilityProbesAutoConfiguration creates the readinessStateProbeIndicator and livenessStateProbeIndicator beans, which need to be referenced by their full bean names, [readiness|liveness]StateProbeIndicator.

On the other hand, when the properties are enabled, AvailabilityHealthContributorAutoConfiguration creates [readiness|liveness]StateHealthIndicator beans, which can be referenced as [readiness|liveness]State.

The problem is in AvailabilityProbesHealthEndpointGroups, created by AvailabilityProbesHealthEndpointGroupsPostProcessor: it creates the readiness/liveness groups with [readiness|liveness]State as members.
So, if [readiness|liveness]State are not available, the groups are created but the HealthIndicator beans they reference are not there.
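
The naming difference described here can be illustrated with a small sketch. It assumes that a contributor's reference name is derived from its bean name by stripping a trailing "HealthIndicator" or "HealthContributor" suffix, case-insensitively. That rule is an assumption used for illustration, and `ContributorNameSketch` is a hypothetical class, not Spring Boot code.

```java
import java.util.Locale;

public class ContributorNameSketch {

    // Suffixes assumed to be stripped from bean names (illustrative assumption,
    // not copied from Spring Boot's source).
    private static final String[] SUFFIXES = {"healthindicator", "healthcontributor"};

    // Returns the name under which a bean would be referenced in a group's "include".
    static String toContributorName(String beanName) {
        String lower = beanName.toLowerCase(Locale.ENGLISH);
        for (String suffix : SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return beanName.substring(0, beanName.length() - suffix.length());
            }
        }
        // No known suffix: the full bean name is the reference name.
        return beanName;
    }

    public static void main(String[] args) {
        System.out.println(toContributorName("readinessStateHealthIndicator"));
        System.out.println(toContributorName("readinessStateProbeIndicator"));
    }
}
```

With this rule, a bean ending in "HealthIndicator" is addressable as readinessState, while a bean ending in "ProbeIndicator" keeps its full name, which matches the two cases described in the comment.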

@OrangeDog
Contributor

want to be aware the groups exist, so we know we can add components to them with include ?

The API response is supposed to be for consumers of the API, not documenting configuration options for the developer. Like the rest of the actuator system, only endpoints that are currently available should be listed as available.

@agrappin

agrappin commented Jul 30, 2020

When the management.health.[readiness|liveness]state.enabled properties are set to false (the default)

FYI surprisingly enough Spring Boot decided to name the readiness state property management.health.readynessstate.enabled with a y in the 2.3.2.RELEASE version (most recent release at this date).


See the reference: https://docs.spring.io/spring-boot/docs/2.3.2.RELEASE/reference/html/appendix-application-properties.html#actuator-properties

@OrangeDog
Contributor

@antoinegrappin no, that's just a documentation error. The property is readiness.

@agrappin

@OrangeDog indeed, I confirm after tests.

@bclozel bclozel changed the title from "Kubernetes readiness probe endpoint returning 404 on Spring Boot 2.3.2" to "Kubernetes readiness probe endpoint returning 404" Aug 1, 2020
@bclozel bclozel closed this as completed in 8dedeb4 Aug 1, 2020
@bclozel
Member

bclozel commented Aug 1, 2020

This issue is now fixed in the 2.3.3 and 2.4.0 SNAPSHOTs.

I've carefully read the comments on this issue regarding the following surprising behavior: getting a 404 status on a configured health group when no indicator is present. In this particular case it's arguably wrong, but that's because we're dealing with a regression. Some of you suggested that:

  1. a missing indicator in a group should fail the application at startup or
  2. that an empty group should disappear from the list of groups on the main endpoint.

The first alternative sounds nice, especially for detecting bad configurations. But it's also likely to fail in perfectly valid cases. Your application could configure a group management.endpoint.health.group.custom.include=ping,redis and fail in a test environment where no redis instance is available. Because Spring Boot reacts to the environment, it's expected to behave differently and adapt to the situation.

The second alternative is debatable. Right now our health groups support is auto-configured with the configuration properties and does not look into the application context to check for the existence of health indicators. We seem to all agree that a 404 response status is right in this case. Removing the group information would, in my opinion, make things less consistent as we wouldn't know that a group has been configured. After all, a health group is just a way to wrap several indicators under the same name and customize its global health status - but health indicators are still dynamic.

After discussing this briefly with the team, we didn't think that this needs to be changed. Note that this behavior has existed since the introduction of the health groups feature. If you can make a stronger case for changing this, please create a dedicated issue and explain how this behavior is inconsistent or could lead to issues.

Thanks!
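
The trade-off between the two alternatives can be sketched in plain Java. This is a simplified model under stated assumptions, not Spring Boot's implementation: the listed groups come from configuration alone, the response status depends on which contributors exist in the current environment, and a strict startup check rejects any missing contributor. `GroupListingSketch`, the "orphan" group and the "doesNotExist" contributor are hypothetical; the "custom" group with ping and redis comes from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupListingSketch {

    // Hypothetical configuration: group name -> contributor names from "include".
    static final Map<String, List<String>> CONFIGURED = new LinkedHashMap<>();
    static {
        CONFIGURED.put("custom", List.of("ping", "redis"));
        CONFIGURED.put("orphan", List.of("doesNotExist"));
    }

    // Group names shown on the main endpoint come from configuration alone.
    static List<String> listedGroups() {
        return new ArrayList<>(CONFIGURED.keySet());
    }

    // The response depends on which contributors exist in this environment.
    static int statusFor(String group, Set<String> available) {
        List<String> include = CONFIGURED.get(group);
        if (include == null) {
            return 404;
        }
        return include.stream().anyMatch(available::contains) ? 200 : 404;
    }

    // Alternative 1: fail at startup when any included contributor is missing.
    static void validateStrictly(Set<String> available) {
        for (Map.Entry<String, List<String>> entry : CONFIGURED.entrySet()) {
            for (String name : entry.getValue()) {
                if (!available.contains(name)) {
                    throw new IllegalStateException("Group '" + entry.getKey()
                            + "' includes missing contributor '" + name + "'");
                }
            }
        }
    }

    public static void main(String[] args) {
        Set<String> testEnvironment = Set.of("ping"); // no redis available here
        System.out.println(listedGroups());
        System.out.println(statusFor("custom", testEnvironment));
        System.out.println(statusFor("orphan", testEnvironment));
    }
}
```

In this model the strict check would reject the "custom" group in an environment without redis, even though the group still has a usable member there, which is the objection raised against the first alternative.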

janolaveide added a commit to navikt/foreldrepengesoknad-api that referenced this issue Aug 5, 2020
mbhave added a commit to spring-io/start.spring.io that referenced this issue Aug 11, 2020
It seems that this regression in 2.3.2 causes the liveness endpoint
to 404: spring-projects/spring-boot#22562
and the app goes in a crash loop
@chadlwilson
Author

Thanks @bclozel - fix is working fine in 2.3.3 after removing the workaround to the probe names I mentioned above :-)

@salaboy

salaboy commented Aug 30, 2020

@chadlwilson can you share your configurations in 2.3.3? I am finding the same issue there..

@bclozel
Member

bclozel commented Aug 30, 2020

@salaboy If your application runs on Kubernetes, you don't need any specific configuration.
If it doesn't, you need to enable the probes with the following:

management.endpoint.health.probes.enabled=true 

@vishalmamidi

@bclozel what values am I supposed to give in my deployment manifests?

@ahmetgeymen

@bclozel what values am I supposed to give in my deployment manifests?

...
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: http
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: http
...

The issue has been resolved with version 2.3.3. You can expose separate probes with dedicated Health Indicators. You may want to look up here.
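
For context on why a 404 from these paths matters, here is a minimal sketch of how an HTTP probe result is interpreted. Kubernetes treats any status code from 200 up to, but not including, 400 as success; the class `ProbeOutcomeSketch` and the helper `exceedsFailureThreshold` are hypothetical, and the threshold parameter mirrors the probe's `failureThreshold` field (default 3).

```java
public class ProbeOutcomeSketch {

    // An HTTP probe succeeds for status codes >= 200 and < 400.
    static boolean isSuccess(int httpStatus) {
        return httpStatus >= 200 && httpStatus < 400;
    }

    // Hypothetical helper: true once `failureThreshold` consecutive responses fail.
    static boolean exceedsFailureThreshold(int[] statuses, int failureThreshold) {
        int consecutiveFailures = 0;
        for (int status : statuses) {
            consecutiveFailures = isSuccess(status) ? 0 : consecutiveFailures + 1;
            if (consecutiveFailures >= failureThreshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A liveness endpoint that always answers 404 trips the threshold.
        System.out.println(exceedsFailureThreshold(new int[] {404, 404, 404}, 3));
        // A healthy endpoint never does.
        System.out.println(exceedsFailureThreshold(new int[] {200, 200, 200}, 3));
    }
}
```

A liveness path that answers 404 on every check therefore counts as a repeated failure, which is consistent with the crash loop mentioned in the commit note earlier in the thread.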
