Prototype batch api #4024

jack-berg · 2021-12-23T00:53:54Z

A few weeks ago in the Java SIG we discussed that it might be valuable to prototype a batch API. IIRC, the thought was that prototyping a batch API might find some intersection with the bind API we recently dropped. I dug through the spec and found this old API reference to a batch API, and used some of the notes as inspiration.

The idea is that you can "atomically" record values for several instruments using the same attributes. I interpreted atomic as guaranteeing that either all or none of the recordings make it into a particular collection. This seems only somewhat useful to me. However, the API experience of recording a batch of measurements does feel a bit improved.

The naive implementation I put together in this PR performs worse than recording to the instruments through the existing API. This is because the implementation doesn't get to take advantage of acquiring fewer locks by recording in a batch, and actually does some additional synchronization to ensure that all the recordings occur atomically for a collection. There's likely some room for improvement, but it is always going to take extra work to guarantee the atomicity.

jack-berg · 2021-12-23T01:02:23Z

sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/SdkMeter.java

+    return new SdkBatchRecorder(batchLatch);
+  }
+
+  static class BatchLatch {


This is the naive synchronization mechanism that ensures the recordings are included in a collection atomically.

When BatchRecorder#record(Attributes, Context) is called, startBatchRecord() is called on start. This blocks if a collection is in progress. If not, it acquires a permit from batchSemaphore that blocks collections until the record is completed and finishBatchRecord() is called to release the permit.

When a collection occurs, startCollect() is called. This sets collectLatch = new CountDownLatch(1), which blocks new batch records until the collection completes. It then blocks while acquiring all the permits from batchSemaphore, which allows in-progress recordings to complete. When the collection is done, finishCollect() counts down the latch and returns the permits.
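For readers without the diff open, here is a minimal sketch of the mechanism described above, assuming a fixed permit count and a single collector at a time. The class name `BatchLatchSketch`, the `MAX_PERMITS` value, and the `availablePermits()` helper are made up for illustration; only the four start/finish method names and the two fields come from the description.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

/** Hypothetical reconstruction of the latch described above, not the PR's code. */
final class BatchLatchSketch {
  // Assumed cap on concurrent batch recordings; the real value is not shown in the thread.
  static final int MAX_PERMITS = 1024;

  private final Semaphore batchSemaphore = new Semaphore(MAX_PERMITS);
  // Count 0 means "open": no collection is in progress.
  private volatile CountDownLatch collectLatch = new CountDownLatch(0);

  /** Waits out any running collection, then holds one permit for the recording. */
  void startBatchRecord() {
    awaitUninterruptibly(collectLatch);
    batchSemaphore.acquireUninterruptibly();
  }

  /** Returns the permit so a pending collection can proceed. */
  void finishBatchRecord() {
    batchSemaphore.release();
  }

  /** Closes the latch to new recordings, then waits for in-progress ones to finish. */
  void startCollect() {
    collectLatch = new CountDownLatch(1);
    batchSemaphore.acquireUninterruptibly(MAX_PERMITS);
  }

  /** Reopens the latch and returns all permits. */
  void finishCollect() {
    collectLatch.countDown();
    batchSemaphore.release(MAX_PERMITS);
  }

  /** Helper for inspection only; not part of the described design. */
  int availablePermits() {
    return batchSemaphore.availablePermits();
  }

  private static void awaitUninterruptibly(CountDownLatch latch) {
    boolean interrupted = false;
    try {
      while (true) {
        try {
          latch.await();
          return;
        } catch (InterruptedException e) {
          interrupted = true; // remember the interrupt and keep waiting
        }
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

Even in this form, every batch record pays for a latch check plus a semaphore acquire and release on top of whatever locking the individual instruments already do, which is where the extra cost comes from.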

jack-berg · 2021-12-23T01:03:36Z

sdk/metrics/src/test/java/io/opentelemetry/sdk/metrics/BatchTest.java

+        .addMeasurements(10.0, doubleHistogram, doubleCounter, doubleUpDownCounter)
+        .record(Attributes.builder().put("foo", "bar").build(), Context.current());
+
+    collectAndPrint();


This compares current API usage with the batch API.

jsuereth · 2021-12-24T14:00:02Z

Good investigation!

This looks like it revives the old "synchronous batch" API. I would like to try to do something like this for Async (we have existing user requests for that one first).
Regarding performance / countdown latch, you'll likely need to go deep into the SDK to fix this. From what I see, there's a simple optimisation of just remembering classes associated with values. IIUC, you're trying to:

Avoid calling record in more than one thread at a time
Make sure all the recordings from the build are eventually written.

I had hoped we could also tackle the bind issue (for sync batch) by effectively having Batch instruments "pre-register" the values they'll be writing (and possibly the attributes they'll use). Ironically, recording the attributes first would possibly allow you to optimise some of the code.

Even crazier, we could use some annotation processor magic to do something like:

@BatchAsyncMetrics
class MyAsyncStats {
  @Counter(name = "request_count", description="# of Requests seen", unit="{requests}")
  private AtomicLong requestCount = new AtomicLong(0);
  @UpDownCounter(name = "queue_size", description="# of items in message queue", unit="{messages}")
  private AtomicLong queueSize = new AtomicLong(0);
  ...
  
  private MyAsyncStats() {}
}

MyAsyncStats myAsyncStats = ...;
meter.batchBuilder().buildWithAsyncAnnotations(myAsyncStats);

Additionally, for Async, we even thought about something like:

meter.registerCollectionCallback(() -> {
  Attributes attr = ...;
  syncInstrument.record(value1, attr);
  syncInstrument2.record(value2, attr);
});

For sync batch, I was hoping for something more like:

@BatchInstrument
interface MySyncMetrics {
  @Histogram(...)
  void recordLatency(double value);
  @Counter(...)
  void recordError(); // Defaults to adding one
}

BatchMeter<MySyncMetrics> batchMeter = meter.batchBuilder().withInterface(MySyncMetrics.class).build();

batchMeter.record(attributes, (mySyncMetrics) -> {
  mySyncMetrics.recordLatency(10.5);
  if (httpResponse.code != 200) {
    mySyncMetrics.recordError();
  }
});

Specifically, it'd be ideal if there was some kind of "registration" phase where we can see what instruments are included in the batch and pre-optimise (in the SDK) a record path.
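As a rough illustration of such a registration phase, here is a sketch that uses a `java.lang.reflect.Proxy` instead of an annotation processor. It resolves each interface method to an instrument name once, so the record path is a single map lookup. Everything in it is hypothetical: the `@Instrument` annotation, the `register` method, and the `Map<String, Double>` standing in for SDK storage are assumptions for illustration, attributes are left out entirely, and the code is not thread-safe.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

final class BatchInterfaceSketch {
  /** Hypothetical annotation naming the instrument a method records to. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Instrument {
    String value();
  }

  /** Example user-defined batch interface, mirroring the shape proposed above. */
  interface MySyncMetrics {
    @Instrument("latency")
    void recordLatency(double value);

    @Instrument("errors")
    void recordError(); // defaults to adding one
  }

  /** Registration phase: scan the interface once, then hand back a recording proxy. */
  static <T> T register(Class<T> type, Map<String, Double> storage) {
    Map<Method, String> instrumentByMethod = new HashMap<>();
    for (Method method : type.getMethods()) {
      Instrument instrument = method.getAnnotation(Instrument.class);
      if (instrument == null) {
        throw new IllegalArgumentException("Missing @Instrument on " + method.getName());
      }
      instrumentByMethod.put(method, instrument.value());
    }
    Object proxy =
        Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] {type},
            (self, method, args) -> {
              String name = instrumentByMethod.get(method);
              if (name == null) {
                // Object methods such as toString() are not supported in this sketch.
                throw new UnsupportedOperationException(method.getName());
              }
              // A method without arguments adds one; otherwise use the first argument.
              double value = (args == null) ? 1.0 : ((Number) args[0]).doubleValue();
              storage.merge(name, value, Double::sum);
              return null;
            });
    return type.cast(proxy);
  }
}
```

A real SDK would presumably resolve each method to a pre-bound storage handle rather than a name, but the point is the same: the set of instruments in the batch is known before the first recording.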

jack-berg · 2021-12-27T20:51:58Z

> I had hoped we could also tackle the bind issue (for sync batch) by effectively having Batch instruments "pre-register" the values they'll be writing (and possibly the attributes they'll use). Ironically, recording the attributes first would possibly allow you to optimise some of the code.

I'm struggling to imagine a scenario where an instrument would know the value ahead of time, besides perhaps when incrementing a counter. If an instrument is able to know ahead of time the value and attributes it will be recording, it doesn't seem like it's measuring anything very useful, since at that point only the context can change.

It's more reasonable that an instrument would know the attributes ahead of time. In this case, the implementation could use the bind API to preallocate. But doing so limits the usefulness of the batch API, since a batch API designed for attributes not known up front would look quite different.

I guess I'd like to better understand what problems a batch API aims to solve. The only definitive advantage of a batch API I could identify is being able to atomically record to multiple instruments, ensuring that all appear in the same collection. This only seems marginally useful. Other advantages can be characterized as syntactic sugar, which I think can be implemented via helper functions and extensions on top of the API.
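To make the syntactic-sugar point concrete, here is a minimal sketch of such a helper. The `Recorder` interface is a stand-in I made up so the example is self-contained; it is not the OpenTelemetry instrument interface, and attributes are modeled as a plain `Map`. Note that it gives no atomicity: a collection can land between two of the `record` calls.

```java
import java.util.Map;

final class BatchHelperSketch {
  /** Stand-in for a synchronous instrument; NOT the OpenTelemetry interface. */
  interface Recorder {
    void record(double value, Map<String, String> attributes);
  }

  /** Records the same value and attributes to every recorder, one after another. */
  static void recordAll(double value, Map<String, String> attributes, Recorder... recorders) {
    for (Recorder recorder : recorders) {
      recorder.record(value, attributes);
    }
  }
}
```

Usage would look like `BatchHelperSketch.recordAll(10.0, attributes, histogram::record, counter::add)`, assuming the instruments expose methods of that shape.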

jsuereth · 2022-01-01T18:40:31Z

> I'm struggling to imagine a scenario where an instrument would know the value ahead of time, besides perhaps when incrementing a counter. If an instrument is able to know ahead of time the value and attributes it will be recording, it doesn't seem like it's measuring anything very useful, since at that point only the context can change.

Sorry, I wasn't clear here. I meant pre-registering the attributes, not the value.

jack-berg · 2022-02-24T17:02:17Z

I was reading through spec PR #2363 and thinking about what a batch API might look like in Java. I agree that the concept of having a single callback that can record to multiple async instruments is appealing, since it's hard for a user to organize this type of thing if the callback functions are expensive.

This PR doesn't really accomplish that very well, but we could mimic what @jamcd proposed with something like:

    ObservableLongMeasurement fooObserver = meter.counterBuilder("foo").observer();
    ObservableLongMeasurement barObserver = meter.counterBuilder("bar").observer();

    meter.registerBatchCallback(() -> {
      fooObserver.record(10);
      barObserver.record(20);
    }, Arrays.asList(fooObserver, barObserver));

Notes:

Instruments would have a new method called observer() which returns ObservableLongMeasurement / ObservableDoubleMeasurement. These must be called before you register a batch callback, as they provide a way to record to a specific instrument.
When you register a batch callback, you just register a Runnable, which can record to any ObservableLongMeasurement / ObservableDoubleMeasurement. You also provide a list of the observers you intend to record to. The SDK can use the list of observers to ensure that the runnable isn't recording to instruments erroneously.
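Here is a minimal sketch of how an SDK could use the declared observer list, with stand-in types so it runs on its own. `BatchCallbackSketch` and its nested `Observer` are hypothetical stand-ins for the meter and for ObservableLongMeasurement, and throwing on an undeclared recording is just one assumed policy (logging and dropping would be another). It is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a meter that supports batch callbacks. */
final class BatchCallbackSketch {
  /** Stand-in for the proposed ObservableLongMeasurement. */
  static final class Observer {
    private final String name;
    private final BatchCallbackSketch meter;

    Observer(String name, BatchCallbackSketch meter) {
      this.name = name;
      this.meter = meter;
    }

    void record(long value) {
      meter.accept(this, value);
    }
  }

  private static final class Registration {
    final Runnable callback;
    final Set<Observer> declared;

    Registration(Runnable callback, Set<Observer> declared) {
      this.callback = callback;
      this.declared = declared;
    }
  }

  private final List<Registration> registrations = new ArrayList<>();
  private Set<Observer> active; // observers declared by the callback that is running
  private Map<String, Long> current; // measurements of the collection in progress

  Observer observer(String name) {
    return new Observer(name, this);
  }

  void registerBatchCallback(Runnable callback, List<Observer> observers) {
    registrations.add(new Registration(callback, new HashSet<>(observers)));
  }

  /** Runs each callback with only its declared observers enabled. */
  Map<String, Long> collect() {
    Map<String, Long> result = new HashMap<>();
    current = result;
    try {
      for (Registration registration : registrations) {
        active = registration.declared;
        registration.callback.run();
      }
    } finally {
      active = null;
      current = null;
    }
    return result;
  }

  private void accept(Observer observer, long value) {
    // Reject recordings outside a collection or to an undeclared observer.
    if (active == null || !active.contains(observer)) {
      throw new IllegalStateException("Observer " + observer.name + " is not declared here");
    }
    current.put(observer.name, value);
  }
}
```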

Thoughts @jsuereth, @jkwatson, @anuraaga?

jkwatson · 2022-02-24T22:08:14Z

sdk/metrics/src/test/java/io/opentelemetry/sdk/metrics/BatchTest.java

+
+  @Test
+  void batch() {
+    Meter meter = GlobalOpenTelemetry.get().getMeterProvider().get("meter");


curious why you would use the global for this. Why not just construct an instance, rather than rely on the global state and resetting it each time?

jkwatson · 2022-02-24T22:18:55Z

>     ObservableLongMeasurement fooObserver = meter.counterBuilder("foo").observer();
>     ObservableLongMeasurement barObserver = meter.counterBuilder("bar").observer();
>
>     meter.registerBatchCallback(() -> {
>       fooObserver.record(10);
>       barObserver.record(20);
>     }, Arrays.asList(fooObserver, barObserver));
>
> Thoughts @jsuereth, @jkwatson, @anuraaga?

I think I would probably reverse the parameter order here, and send in the observers first, then the callback at the end (which I think will be nice for Kotlin users, for example).

But this does seem to be a reasonable approach. Not sure about the naming of ObservableLongInstrument etc., but that can be properly bikeshedded when the time comes.

anuraaga · 2022-02-25T01:00:53Z

>     ObservableLongMeasurement fooObserver = meter.counterBuilder("foo").observer();
>     ObservableLongMeasurement barObserver = meter.counterBuilder("bar").observer();
>
>     meter.registerBatchCallback(() -> {
>       fooObserver.record(10);
>       barObserver.record(20);
>     }, Arrays.asList(fooObserver, barObserver));

In #2363 I don't see the observers passed into the registerBatchCallback method. Is there a need to do that here?

Also, I suspect we can do something like "run all batch callbacks first" during collection. If so, those could just be synchronous instruments I guess? There doesn't seem to be a difference between the Observer and a normal synchronous instrument in terms of the API usage here, so ideally we don't even need it.
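A minimal sketch of that "callbacks first" idea, assuming a collection simply runs every registered callback and then snapshots ordinary synchronous instruments. The `CallbacksFirstSketch` meter and its `Counter` are hypothetical stand-ins, not OpenTelemetry types, and the sketch ignores attributes and any difference in semantics between observed and added values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical meter whose collection runs batch callbacks before reading instruments. */
final class CallbacksFirstSketch {
  /** Stand-in for a synchronous counter. */
  static final class Counter {
    private final AtomicLong sum = new AtomicLong();

    void add(long value) {
      sum.addAndGet(value);
    }

    long sum() {
      return sum.get();
    }
  }

  private final Map<String, Counter> counters = new LinkedHashMap<>();
  private final List<Runnable> batchCallbacks = new ArrayList<>();

  Counter counter(String name) {
    return counters.computeIfAbsent(name, unused -> new Counter());
  }

  void registerBatchCallback(Runnable callback) {
    batchCallbacks.add(callback);
  }

  /** Runs every batch callback first, then snapshots all counters. */
  Map<String, Long> collect() {
    for (Runnable callback : batchCallbacks) {
      callback.run();
    }
    Map<String, Long> snapshot = new LinkedHashMap<>();
    counters.forEach((name, counter) -> snapshot.put(name, counter.sum()));
    return snapshot;
  }
}
```

With this shape the callback needs no declared observer list, at the cost of the SDK not knowing which instruments a callback touches.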

kittylyst · 2022-03-24T08:14:56Z

What's the status of this? I definitely have use cases where this would be very useful.

jack-berg · 2022-03-24T16:38:15Z

@kittylyst it's blocked on this spec issue.

I imagine that our first stable metrics release will not have support for batch callbacks.

jack-berg · 2022-04-12T18:35:06Z

Closing in favor of #4376.

Prototype batch api

5390de6

jack-berg mentioned this pull request Apr 12, 2022

Add batch callback API #4376

Merged

jack-berg closed this Apr 12, 2022
