Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype batch api #4024

Closed
wants to merge 1 commit into from

Conversation

jack-berg
Copy link
Member

A few weeks ago in the java SIG we discussed that it might be valuable to prototype what a batch API. IIRC, the thought was that prototyping a batch API might find some intersection with the bind API we recently dropped. I dug through the spec and found this old API reference to a batch API, and used some of the notes as inspiration.

The idea is that you can "atomically" record values for several instruments using the same attributes. I interpreted atomic as guaranteeing that either all or none of the recordings make it into a particular collection. This seems only somewhat useful to me. However, the API experience of recording a batch of measurements does feel a bit improved.

The naive implementation I put together in this PR has worse performance than if you were record the instruments using the existing API. This is because the implementation doesn't get to take advantage of acquiring fewer locks by recording in a batch, and actually does some additional synchronization to ensure that all the recordings occur atomically for a collection. There's likely some room for improvement, but is always going to take extra work to guarantee the atomicity.

return new SdkBatchRecorder(batchLatch);
}

static class BatchLatch {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the naive synchronization mechanism that ensures the recordings are included in a collection atomically.

When BatchRecorder#record(Attributes, Context) is called, startBatchRecord() is called on start. This blocks if a collection is in progress. If not, it acquires a permit from batchSemaphore that blocks collections until the record is completed and finishBatchRecord() is called to release the permit.

When a collect occurs, startCollect() is called. This sets collectLatch = new CountDownLatch(1), which blocks batch records from occurring until the collect completes. It then blocks while acquiring all the permits from batchSemaphore, which allows in progress recordings to complete. When the collect is done, it calls finishCollect(), and counts down the latch and returns the permits.

.addMeasurements(10.0, doubleHistogram, doubleCounter, doubleUpDownCounter)
.record(Attributes.builder().put("foo", "bar").build(), Context.current());

collectAndPrint();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This compares the API usage currently versus with the batch API.

@jsuereth
Copy link
Contributor

Good investigation!

  1. This looks like it revives the old "synchronous batch" API. I would like to try to do something like this for Async (we have existing user requests for that one first).
  2. Regarding performance / countdown latch, you'll likely need to go deep into the SDK to fix this. From what I see, there's a simple optimisation of just remembering classes associated with values. IIUC, you're trying to:
  • Avoid calling record in more than one thread at a time
  • Make sure all the recording from build are eventually written.

I had hoped we could tackle both the bind issue (for sync batch) by effectively having Batch instruments "pre-register" the values they'll be writing (and possibly the attributes they'll use). Ironically, recording the attributes first would possibly allow you to optimise some of the code.

Even more crazy, if we had something using some annotation processor magic to do something like:

@BatchAsyncMetrics
class MyAsyncStats {
  @Counter(name = "request_count", description="# of Requests seen", unit="{requests}"
  private AtomicLong requestCount = new AtomicLong(0);
  @UpDownCounter(name = "queue_size", description="# of items in message queue", unit="{messages}")
  private AtomicLong queueSize = new AtomicLong(0);
  ...
  
  private MyAsyncStats() {}
}

MyAsyncStats myAsyncStats = ...;
meter.batchBuilder().buildWithAsyncAnnotations(myAsyncStats);

Additionally, for Async, we even thought about something like:

meter.registerCollectionCallback(() -> {
  Attributes attr = ...
  syncInstrument.record(value1, attr)
  syncInstrument2.record(value2, attr)
})

For sync batch, I was hoping for something more like:

@BatchInstrument
interface MySyncMetrics {
  @Histogram(...)
  void recordLatency(double value);
  @Counter(...)
  void recordError(); // Defaults to adding one
}

BatchMeter<MySyncMetrics> batchMeter = meter.batchBuilder().withInterface(MySyncMetrics.class).build();

batchMeter.record(attributes, (mySyncMetrics) -> {
  mySyncMetrics.recordLatency(10.5)
  if (httpResponse.code != 200) {
    mySyncMetrics.recordError();
  }
});

Specifically, it'd be ideal if there was some kind of "registration" phase where we can see what instruments are included in the batch and pre-optimise (in the SDK) a record path.

@jack-berg
Copy link
Member Author

I had hoped we could tackle both the bind issue (for sync batch) by effectively having Batch instruments "pre-register" the values they'll be writing (and possibly the attributes they'll use). Ironically, recording the attributes first would possibly allow you to optimise some of the code.

I'm struggling to imagine a scenario where an instrument would know the value ahead of time, besides perhaps when incrementing a counter. If an instrument is able to know ahead of time the value and attributes it will be recording, it doesn't seem like its measuring anything very useful, since at that point only the context can change.

It's more reasonable that an instrument would know the attributes ahead of time. In this case, the implementation could use the bind API to preallocate. But doing so limits the usefulness of the batch API, since a batch API designed for attributes not known up front would look quite different.

I guess I'd like to better understand what problems a batch API aims to solve. The only definitive advantage of a batch API I could identify is being able to atomically record to multiple instruments, ensuring that all appear in the same collection. This only seems marginally useful. Other advantages can be characterized as syntactic sugar, which I think can be implemented via helper functions and extensions on top of the API.

@jsuereth
Copy link
Contributor

jsuereth commented Jan 1, 2022

I'm struggling to imagine a scenario where an instrument would know the value ahead of time, besides perhaps when incrementing a counter. If an instrument is able to know ahead of time the value and attributes it will be recording, it doesn't seem like its measuring anything very useful, since at that point only the context can change.

Sorry I wasn't clear here. I meant pre-register the attributes not the value.

@jack-berg
Copy link
Member Author

I was reading through spec PR #2363 and thinking about what a batch API might look like in java. I agree that the concept of having a single callback that can record to multiple async instruments is appealing, since its hard for a user to organize this type of thing if the callback functions are expensive.

This PR doesn't really accomplish that very well, but we could mimic what @jamcd proposed with something like:

    ObservableLongMeasurement fooObserver = meter.counterBuilder("foo").observer();
    ObservableLongMeasurement barObserver = meter.counterBuilder("bar").observer();

    meter.registerBatchCallback(() -> {
      fooObserver.record(10);
      barObserver.record(20);
    }, Arrays.asList(fooObserver, barObserver));

Notes:

  • Instruments would have a new method called observer() which returns ObservableLongMeasurement / ObservableDoubleMeasurement. These must be called before you register a batch callback, as they provide a way to record to a specific instrument.
  • When you register a batch callback, you just register a Runnable, which can record to any ObservableLongMeasurement / ObservableLongMeasurement. You also provide a list of the observers you intend to record to. The SDK can use the list of observers to ensure that the runnable isn't recording to instruments erroneously.

Thoughts @jsuereth, @jkwatson, @anuraaga?


@Test
void batch() {
Meter meter = GlobalOpenTelemetry.get().getMeterProvider().get("meter");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why you would use the global for this. Why not just construct an instance, rather than rely on the global state and resetting it each time?

@jkwatson
Copy link
Contributor

    ObservableLongMeasurement fooObserver = meter.counterBuilder("foo").observer();
    ObservableLongMeasurement barObserver = meter.counterBuilder("bar").observer();

    meter.registerBatchCallback(() -> {
      fooObserver.record(10);
      barObserver.record(20);
    }, Arrays.asList(fooObserver, barObserver));

Thoughts @jsuereth, @jkwatson, @anuraaga?

I think I would probably reverse the parameter order here, and send in the observers first, then the callback at the end (which I think will be nice for kotlin users, for example).

but, this does seem to be a reasonable approach. Not sure about the naming of ObservableLongInstrument etc, but that can be properly bike shedded when the time comes.

@anuraaga
Copy link
Contributor

    ObservableLongMeasurement fooObserver = meter.counterBuilder("foo").observer();
    ObservableLongMeasurement barObserver = meter.counterBuilder("bar").observer();

    meter.registerBatchCallback(() -> {
      fooObserver.record(10);
      barObserver.record(20);
    }, Arrays.asList(fooObserver, barObserver));

In #2363 I don't see the observers passed into the registerbatchcallback method, is there a need to be doing that here?

Also, I suspect we can do something like "run all batch callbacks first" during collection. If so, those could just be synchronous instruments I guess? There doesn't seem to be a difference between the Observer and a normal synchronous instrument in terms of the API usage here, so ideally we don't even need it.

@kittylyst
Copy link
Contributor

What's the status of this - I definitely have use cases where this would be very useful.

@jack-berg
Copy link
Member Author

@kittylyst its blocked on this spec issue.

I imagine that our first stable metrics release will not have support for batch callbacks.

@jack-berg
Copy link
Member Author

Closing in favor of #4376.

@jack-berg jack-berg closed this Apr 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants