Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken content negotiation for OpenMetrics #28130

Closed
jonatan-ivanov opened this issue Sep 25, 2021 · 9 comments
Closed

Broken content negotiation for OpenMetrics #28130

jonatan-ivanov opened this issue Sep 25, 2021 · 9 comments
Labels
type: bug A general bug
Milestone

Comments

@jonatan-ivanov
Copy link
Member

It seems PrometheusScrapeEndpoint is unable to produce content in the OpenMetrics format.
The latest (2.30.0) Prometheus server sends the following headers:

GET /actuator/prometheus HTTP/1.1
Host: host.docker.internal:8080
User-Agent: Prometheus/2.30.0
Accept: application/openmetrics-text; version=0.0.1,text/plain;version=0.0.4;q=0.5,*/*;q=0.1
Accept-Encoding: gzip
X-Prometheus-Scrape-Timeout-Seconds: 10

In the Accept header, the version of openmetrics-text is 0.0.1.
In the Java client, the Content-Type constant does not match to this, its version is 1.0.0 and it also has a charset
(see TextFormat):

CONTENT_TYPE_OPENMETRICS_100 = "application/openmetrics-text; version=1.0.0; charset=utf-8";

Since

  1. the PrometheusScrapeEndpoint uses TextOutputFormat that uses the above TextFormat
  2. the header matching logic for @WebEndpoint seems to take the version (and seemingly other fields) into consideration

The PrometheusScrapeEndpoint never returns anything in the OpenMetrics format, it always falls back to text/plain.
I wrote a test which demonstrates the issue: jonatan-ivanov@d5e6b36

Also opened an issue for the Prometheus Java client to fix the version mismatch: prometheus/client_java#702

A couple things to call out:

  • Even if the Prometheus server gets fixed and it sends 1.0.0 instead of 0.0.1 the PrometheusScrapeEndpoint is still broken
    (because of charset)
  • Even if the Java client gets fixed and it sends 0.0.1 instead of 1.0.0 the PrometheusScrapeEndpoint is still broken
    (because of charset)
  • The test actually has three flavors: Jersey, WebMvc and WebFlux; it seems Jersey handles this in the right way (that test is green) while the WebMvc and WebFlux flavors fail
  • It seems that TextFormat::chooseContentType takes this into account
  • Using it can solve the issue: see workaround
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 25, 2021
jonatan-ivanov added a commit to jonatan-ivanov/spring-boot that referenced this issue Sep 25, 2021
This seems like a hack and Producible seems weird since it is both for consuming and producing.
The getProducedMimeType is used for consuming first.
fixes spring-projectsgh-28130
jonatan-ivanov added a commit to jonatan-ivanov/spring-boot that referenced this issue Sep 25, 2021
This seems like a hack and Producible seems weird.
fixes spring-projectsgh-28130
@jonatan-ivanov
Copy link
Member Author

I might have a fix hack for this: jonatan-ivanov@b16f10f

Producible seems weird since it is using the same MimeType for Accept and Content-Type, I think it would worth separating them.

@wilkinsona
Copy link
Member

Thanks, @jonatan-ivanov.

Your test is failing partly because it's parsing multiple media types into one:

MediaType.parseMediaType("application/openmetrics-text; version=0.0.1,text/plain;version=0.0.4;q=0.5,*/*;q=0.1");

It's also using version=0.0.1 and Boot's currently looking for version=1.0.0.

A test that uses a single media type and the "correct" version passes:

@WebEndpointTest
void scrapeWithSingleAcceptibleMediaTypeCanProduceOpenMetrics100(WebTestClient client) {
    MediaType openMetrics = MediaType.parseMediaType("application/openmetrics-text;version=1.0.0");
    client.get().uri("/actuator/prometheus").accept(openMetrics).exchange().expectStatus().isOk().expectHeader()
            .contentType(TextFormat.CONTENT_TYPE_OPENMETRICS_100).expectBody(String.class)
            .value((body) -> assertThat(body).contains("counter1_total").contains("counter2_total")
                    .contains("counter3_total"));
}

However, this test which uses multiple media types parsed individually fails:

@WebEndpointTest
void scrapeWithMultipleAcceptibleMediaTypesCanProduceOpenMetrics100(WebTestClient client) {
    client.get().uri("/actuator/prometheus")
            .accept(MediaType.parseMediaType("application/openmetrics-text;version=1.0.0"),
                    MediaType.parseMediaType("text/plain;version=0.0.4;q=0.5"),
                    MediaType.parseMediaType("*/*;q=0.1"))
            .exchange().expectStatus().isOk().expectHeader().contentType(TextFormat.CONTENT_TYPE_OPENMETRICS_100)
            .expectBody(String.class).value((body) -> assertThat(body).contains("counter1_total")
                    .contains("counter2_total").contains("counter3_total"));
}

It results in a request being sent with a single accept header that has a comma-separated value:

Accept: application/openmetrics-text;version=1.0.0, text/plain;version=0.0.4;q=0.5, */*;q=0.1

Actuator handles this incorrectly, treating it as a single value. I've opened #28137 to fix it.

Let's wait and see what the Prometheus Java Client say about the version. I suspect that aligning with the server will be the right thing to do for this part of the problem.

@wilkinsona
Copy link
Member

wilkinsona commented Sep 27, 2021

I was wrong above. Actuator correctly handles a single header with multiple comma-separated values. The problem is that we punted on proper quality support when we implemented #25738.

We can dodge that limitation by reordering values in TextOutputFormat but that would break for a request that prefers text/plain;version=0.0.4. I suspect we may have to bite the bullet and implement quality-based ordering.

@jonatan-ivanov
Copy link
Member Author

jonatan-ivanov commented Sep 27, 2021

I agree that the Prometheus client and server version constants should match (I have no idea why they don't). If Prometheus fixes this on the server side (both client and server will use 1.0.0), I think we still need to face this issue since older Prometheus installations will use the old version (I assume users upgrade their internal Prometheus instances less frequently than their java dependencies).

What do you think about delegating handling this issue to the prometheus client?
It seems that TextFormat::chooseContentType is prepared for this and it ignores the version. We can reuse the result of this method and pass it to the formatter and to the response. Using it can solve the issue: see another workaround I had earlier (this uses a controller, I'm not sure injecting headers is possible for actuator endpoints).

@wilkinsona
Copy link
Member

Thanks for the suggestion, but I'm not sure that ignoring the version is a good idea. It feels like replacing one bug that was waiting to happen with another one. If we ignore the version and application/openmetrics-text evolves and introduces a new version, we may end up with Spring Boot installations that will happily serve an old version of application/openmetrics-text that the client doesn't understand when it would have preferred to receive some other format, like text/plain;version=0.0.4 for example.

@wilkinsona wilkinsona added the for: team-meeting An issue we'd like to discuss as a team to make progress label Sep 28, 2021
@jonatan-ivanov
Copy link
Member Author

I totally agree with you, on the other hand I expect the prometheus client (TextFormat::chooseContentType) to handle this. The Content-Type could be transparent to the actuator endpoint given what the JavaDoc of the Prometheus client says:

Return the content type that should be used for a given Accept HTTP header.

My assumption is that once the version will be important either to TextFormat:: writeFormat or to Prometheus Server, the Prometheus client will handle it in chooseContentType so the Accept header could be just a pass-through to the actuator endpoint.

I'm not sure if you want to depend on such an assumption but if it seems fine, we could do something like this:

@ReadOperation
public WebEndpointResponse<String> scrape(@RequestHeader(value = ACCEPT) String acceptHeader, @Nullable Set<String> includedNames) {
    try {
        Enumeration<MetricFamilySamples> samples = (includedNames != null)
                ? this.collectorRegistry.filteredMetricFamilySamples(includedNames)
                : this.collectorRegistry.metricFamilySamples();

        Writer writer = new StringWriter();
        String contentType = TextFormat.chooseContentType(acceptHeader);
        TextFormat.writeFormat(contentType, writer, samples);

        return new WebEndpointResponse<>(writer.toString(), contentType);
    }
    catch (IOException ex) {
        // This actually never happens since StringWriter doesn't throw an IOException
        throw new IllegalStateException("Writing metrics failed", ex);
    }
}

@jonatan-ivanov
Copy link
Member Author

fyi: prometheus/prometheus#9430

@wilkinsona
Copy link
Member

We're going to align with TextFormat. It ignores versions and quality. If there's an accept header and it contains application/openmetrics-text it produces application/openmetrics-text; version=1.0.0; charset=utf-8 otherwise it produces text/plain; version=0.0.4; charset=utf-8.

@wilkinsona wilkinsona added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged for: team-meeting An issue we'd like to discuss as a team to make progress labels Oct 4, 2021
@wilkinsona wilkinsona modified the milestones: 2.4.x, 2.5.x Oct 4, 2021
@wilkinsona
Copy link
Member

wilkinsona commented Oct 4, 2021

We're going to align with TextFormat

This is easier said than done. Our current Producible support will happily produce text/plain; version=0.0.4; charset=utf-8 by default and will also happily produce application/openmetrics-text when it's requested in isolation. However, when both text/plain and application/openmetrics-text are acceptable it produces the default text/plain. Aligning with TextFormat requires application/openmetrics-text to be produced in this situation.

We either need to enhance Producible so that it can express TextFormat's behaviour or we need to switch from using Producible to an approach that allows HTTP headers to be passed into an endpoint operation so that the endpoint itself can figure out the content type to produce.

wilkinsona pushed a commit that referenced this issue Oct 5, 2021
Add an `isDefault()` method to `Producible` which can be used to
indicate which of the enum values should be used when the accept header
is `*/*` or `null`.

Prior to this commit, the last enum value was always used as the
default.

See gh-28130
wilkinsona added a commit that referenced this issue Oct 5, 2021
Update Prometheus `TextOutputFormat` so that OpenMetrics is used in
preference to text output when an appropriate accept header is found.

If the accept header contains `*/*` or is missing then the text format
will be used.

See gh-28130
@wilkinsona wilkinsona modified the milestones: 2.5.x, 2.5.6 Oct 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

No branches or pull requests

3 participants