Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve "active" metrics handling in WebClient observations #31702

Closed
jtorkkel opened this issue Nov 28, 2023 · 1 comment
Closed

Improve "active" metrics handling in WebClient observations #31702

jtorkkel opened this issue Nov 28, 2023 · 1 comment
Assignees
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) theme: observability An issue related to observability and tracing type: enhancement A general enhancement
Milestone

Comments

@jtorkkel
Copy link

Current behavior:

WebClient long task timer task active metrics does not show "destination host" in "client_name" label, instead always shows "none".

http_client_requests_active_seconds_active_count{client_name="none",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 7.0

In service which has more than 1 webClient instance to different remote host all long task timer active metrics are aggregated and metrics cannot detect the number of pending requests per destination host. This decrease the value of the metrics as breakdown per destination host calls cannot be done.

When MVC is used instead of webflux then "client_name" label contains correct destination host".

http_client_requests_active_seconds_active_count{client_name="serviceB.local",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 5.0
http_client_requests_active_seconds_active_count{client_name="serviceC.local",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 2.0

Expected behavior:
webClient should report "client_name" label correctly matching remote destination address shown in normal metrics.

Potential root cause:

DefaultWebClient exchange() store the requestsBuilder() to carrier() before observation start, but "request" is stored to context only after observation is started just before applying the filter.

DefaultClientRequestObservationConvention/clientName() reads the "client_name" from context when long task timer is started and get "null" because context is not set. Thus "client_name" is not available for long task timer active start but only for normal metrics stop.

MVC works because "client_name" is read from carrier.

There were some other changes in DefaultWebClient blame but I am not sure if those change or broke the the "client_name" because context is still set after start.

public class DefaultClientRequestObservationConvention implements ClientRequestObservationConvention {
	protected KeyValue clientName(ClientRequestObservationContext context) {
		if (context.getRequest() != null && context.getRequest().url().getHost() != null) {
			return KeyValue.of(LowCardinalityKeyNames.CLIENT_NAME, context.getRequest().url().getHost());
		}
		return CLIENT_NAME_NONE;
	}
}
final class DefaultWebClient implements WebClient {
		@SuppressWarnings("deprecation")
		@Override
		public Mono<ClientResponse> exchange() {
			ClientRequestObservationContext observationContext = new ClientRequestObservationContext();
			ClientRequest.Builder requestBuilder = initRequestBuilder();
			return Mono.deferContextual(contextView -> {
				Observation observation = ClientHttpObservationDocumentation.HTTP_REACTIVE_CLIENT_EXCHANGES.observation(observationConvention,
						DEFAULT_OBSERVATION_CONVENTION, () -> observationContext, observationRegistry);
				observationContext.setCarrier(requestBuilder); // carrier available in start
				observation
						.parentObservation(contextView.getOrDefault(ObservationThreadLocalAccessor.KEY, null))
						.start();
				ExchangeFilterFunction filterFunction = new ObservationFilterFunction(observationContext);
				if (filterFunctions != null) {
					filterFunction = filterFunctions.andThen(filterFunction);
				}
				ClientRequest request = requestBuilder.build();
				observationContext.setUriTemplate((String) request.attribute(URI_TEMPLATE_ATTRIBUTE).orElse(null));
				observationContext.setRequest(request); // context saved after start
				Mono<ClientResponse> responseMono = filterFunction.apply(exchangeFunction)
						.exchange(request)
						.checkpoint("Request to " + this.httpMethod.name() + " " + this.uri + " [DefaultWebClient]")
						.switchIfEmpty(NO_HTTP_CLIENT_RESPONSE_ERROR);
				if (this.contextModifier != null) {
					responseMono = responseMono.contextWrite(this.contextModifier);
				}
				final AtomicBoolean responseReceived = new AtomicBoolean();
				return responseMono
						.doOnNext(response -> responseReceived.set(true))
						.doOnError(observationContext::setError)
						.doFinally(signalType -> {
							if (signalType == SignalType.CANCEL && !responseReceived.get()) {
								observationContext.setAborted(true);
							}
							observation.stop();
						})
						.contextWrite(context -> context.put(ObservationThreadLocalAccessor.KEY, observation));
			});
		}
}

Possible customization workaround is reading the clientName from context.getCarrier().build() instead from context. But not clear if this is safe way in case webClient implementation change again.

@Configuration
class NdfClientRequestObservationConvention extends DefaultClientRequestObservationConvention {
    private static final KeyValue CLIENT_NAME_NONE = KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, KeyValue.NONE_VALUE);

    @Override
    protected KeyValue clientName(ClientRequestObservationContext context) {
        if (context.getRequest() != null && context.getRequest().url().getHost() != null) {
            return KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, context.getRequest().url().getHost());
        }
        if (context.getCarrier() != null && context.getCarrier().build().url().getHost() != null) {
            return KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, context.getCarrier().build().url().getHost());
        }
        return CLIENT_NAME_NONE;
    }
}

Environment:
Java: 17/21
SpringBoot 3.1.5, same issue in 3.1.6 and 3.2.0

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Nov 28, 2023
@bclozel bclozel self-assigned this Nov 28, 2023
@bclozel bclozel added in: web Issues in web modules (web, webmvc, webflux, websocket) theme: observability An issue related to observability and tracing labels Nov 28, 2023
@bclozel
Copy link
Member

bclozel commented Dec 11, 2023

I think this is due to a lifecycle issue in general, but we can probably improve the situation.

We cannot give to the ClientRequestObservationContext a fully built ClientRequest, because such instances are immutable and the tracing obervation handler needs to add a propagation HTTP header to the request before it's sent out. As a result, we're creating the context with a ClientRequest.Builder.

When the observation starts, the propagation header is added and the ".active" metric is started. Because we haven't set the actual request on the context and given the current convention implementation, we're seeing this unfortunate behavior.

I think we can change the ClientRequestObservationContext to tighten up a bit the lifecycle there and possibly build a temporary request for this. It would certainly miss the current observation context and the propagation header for obvious lifecycle reasons, but I don't think this would be a problem for most conventions.

@bclozel bclozel added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Dec 11, 2023
@bclozel bclozel added this to the 6.1.2 milestone Dec 11, 2023
@bclozel bclozel changed the title http_client_requests_active_seconds_active_count in not showing correct client_name label values in webclient Improve "active" metrics handling in WebClient observations Dec 12, 2023
izeye added a commit to izeye/spring-framework that referenced this issue Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) theme: observability An issue related to observability and tracing type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

3 participants