
Instrumentation #27

Open
tmeasday opened this issue Oct 28, 2016 · 4 comments

Comments


tmeasday commented Oct 28, 2016

Following up from rmosolgo/graphql-ruby#354 (comment):

@dylanahsmith :

For graphql-batch I would just recommend using Promise.map_value to get the end of the field resolution:

```ruby
class MyFieldInstrumentation
  def instrument(type, field)
    old_resolve_proc = field.resolve_proc
    new_resolve_proc = ->(obj, args, ctx) {
      # Logged when the resolver is invoked, i.e. when the field is
      # handed to its batch loader -- not when the batch actually runs.
      Rails.logger.info("#{type.name}.#{field.name} START: #{Time.now.to_i}")
      result = old_resolve_proc.call(obj, args, ctx)
      # map_value runs the block once the (possibly promised) result is
      # fulfilled, passing the value through unchanged.
      Promise.map_value(result) do |value|
        Rails.logger.info("#{type.name}.#{field.name} END: #{Time.now.to_i}")
        value
      end
    }

    # Return a copy of the field with the wrapped resolver.
    field.redefine do
      resolve(new_resolve_proc)
    end
  end
end
```

Actually the above does not work (or at least not in a particularly useful way). The problem is that the executor does not necessarily run the loader associated with the field immediately, instead it may wait for some other loaders (or fields I suppose) to execute first. So the "start time" logged above (when the field is added to the loader), is not really "correct", in terms of the work done to fetch the field.
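To make the problem concrete, here is a minimal pure-Ruby sketch (no graphql-batch involved; `ToyLoader` is an invented stand-in): the timestamp taken when a key is enqueued can be much earlier than when the batch actually runs, so the whole wait gets attributed to the field.

```ruby
# Toy batch loader: keys accumulate, and the work happens only when
# `run!` is eventually called by the executor.
class ToyLoader
  def initialize(&perform)
    @keys = []
    @perform = perform
    @enqueued_at = {}
  end

  # "START" as logged by the naive instrumentation above: the enqueue time.
  def load(key)
    @enqueued_at[key] = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @keys << key
  end

  # The batch actually starts here, possibly long after the keys were enqueued.
  def run!
    @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @perform.call(@keys)
  end

  def wait_before_start(key)
    @started_at - @enqueued_at[key]
  end
end

loader = ToyLoader.new { |keys| keys } # a trivially fast "query"
loader.load(:vote)
sleep 0.05 # stand-in for other loaders (postedBy, repository) running first
loader.run!

# The naive trace charges the whole 0.05s wait to the vote field, even
# though the loader itself was essentially instantaneous.
puts loader.wait_before_start(:vote)
```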

As an example, if I run a query against our example GitHunt server that looks something like:

```graphql
{
  entries {
    # this field goes to a batch loader that talks to github
    postedBy
    # this field goes to another github batch loader
    repository
    # this field goes to a simple SQL batch loader
    vote
  }
}
```

You end up seeing traces that look like:

[screenshot 2016-10-28 08 54 09: trace of field start times and durations]

Note in the above that the start time of all the fields is more or less the same (as the executor runs over the set of entries, they are all added to the various loaders), and the total time is "cumulative". In actuality the vote loader is more or less instantaneous (running against a local sqlite db in this case), and a correct start time for those fields should really be at the end of the repository loader (so around the 2.03ms mark in this screenshot).


This is why I think that, for proper instrumentation of the batch loader, we need to know two things:

  1. What time a loader actually starts.
  2. Which fields the loader is running for.

The first part is trivial I suppose, but the second seems tricky.
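Both measurements could be sketched like this in plain Ruby (`InstrumentedLoader` is an invented stand-in, not `GraphQL::Batch::Loader`, whose internals may differ):

```ruby
# Stand-in loader that records (1) when it actually starts and
# (2) which fields its keys were loaded for.
class InstrumentedLoader
  attr_reader :started_at, :loaded_keys

  def initialize(&perform)
    @perform = perform
    @loaded_keys = []
  end

  # (2) Record each key along with whatever identifies the requesting field.
  def load(key, field: nil)
    @loaded_keys << { key: key, field: field }
  end

  # (1) Timestamp taken when the batch runs, not when keys were enqueued.
  def run!
    @started_at = Time.now
    @perform.call(@loaded_keys.map { |entry| entry[:key] })
  end
end

loader = InstrumentedLoader.new { |keys| keys }
loader.load(1, field: "Entry.vote")
loader.load(2, field: "Entry.vote")
loader.run!

puts loader.started_at
puts loader.loaded_keys.map { |entry| entry[:field] }.uniq.inspect
```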

@dylanahsmith
Contributor

Yes, that code example tracks the start and end of resolving a field, not the time it spends actually executing those fields, so no other integration is needed with graphql-ruby. You would see a similar problem any time fields are able to be resolved completely concurrently, such as with thread/connection pools, locking, or CPU-intensive tasks in a parallel executor.

The problem really is about instrumenting batch loaders.

a correct start time for those fields should really be at the end of the repository loader

Although the resolving is mostly happening in the batch loader, the resolver could still take significant time. I don't think this fact should be ignored, since the resolver might end up doing an unbatched load or non-trivial computation before doing the batch load, so assuming the start of execution happens in the loader would be misleading in a very important case.

If you actually want to visualize the batch execution, then you will need to make sure that you can collect more than just the start and end time for resolving a field.

The other thing that will complicate the instrumentation that you didn't mention is that GraphQL::Batch::Promise callbacks get called immediately after they are fulfilled, so a callback that does non-trivial computation or an unbatched query could make it look like that time was spent in the batch loader itself.
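A self-contained illustration of that attribution problem (toy code, not GraphQL::Batch): if the end timestamp is taken after fulfillment callbacks have run, a slow callback inflates the apparent batch time.

```ruby
# Measure the elapsed time of a block with a monotonic clock.
def timed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# A callback doing non-trivial work, e.g. an unbatched query in a .then block.
callbacks = [->(v) { sleep 0.05; v }]

apparent_batch_time = timed do
  values = [1, 2, 3] # the actual batched load: effectively instant
  # Callbacks fire immediately on fulfillment, inside the same window...
  values.map! { |v| callbacks.each { |cb| v = cb.call(v) }; v }
end

# ...so the "batch" appears to take ~0.15s even though the load itself
# was instantaneous.
puts apparent_batch_time
```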

  1. What time a loader actually starts.

One possibility here is to allow the application to configure the executor so they can mix instrumentation into it. I think this would also be useful to support parallel or concurrent execution.

Another would be to have a subscription and notification system that would let code run around the batch load, allowing it to capture the start and end times.
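A minimal sketch of that notification idea (all names here are hypothetical, not an existing graphql-batch API): subscribers are called around each batch load with the loader name and its keys, which is enough to record start and end times.

```ruby
# Hypothetical notification hook around a batch load.
module BatchNotifications
  @subscribers = []

  def self.subscribe(&block)
    @subscribers << block
  end

  # Wraps a batch load, notifying subscribers with timing information.
  def self.instrument(loader_name, keys)
    started = Time.now
    result = yield
    @subscribers.each { |sub| sub.call(loader_name, keys, started, Time.now) }
    result
  end
end

events = []
BatchNotifications.subscribe do |name, keys, started, ended|
  events << { name: name, keys: keys, duration: ended - started }
end

BatchNotifications.instrument("VoteLoader", [1, 2, 3]) do
  [10, 20, 30] # the actual batched query would run here
end

puts events.first[:name]
```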

  1. Which fields the loader is running for.

Ideally I would like to move the async execution strategy support into graphql-ruby and to decouple the batch loaders from graphql, since I could see batch loading being useful outside of graphql (e.g. batch loading data accessed during liquid rendering). As such, I don't plan on having the batch loaders track these fields, but I do think it should allow this to be done.

However, by having a way to run code around field resolving and batch loading, it would already be possible to keep track of the context in which a batch load is requested. So I think the only thing missing would be instrumentation for when GraphQL::Batch::Loader#load is called.

I was already planning on making it so that GraphQL::Batch::Loader#load would make sure the loader was registered with the executor, so that the GraphQL::Batch::Loader could be used to cache loads across a GraphQL request. That means this instrumentation could also be handled by the executor.


tmeasday commented Nov 2, 2016

You would see a similar problem any time fields are able to be resolved completely concurrently, such as with thread/connection pools, locking, or CPU-intensive tasks in a parallel executor.

I assume you mean "any time fields are *not* able to be resolved..."?


Although the resolving is mostly happening in the batch loader, the resolver could still take significant time. I don't think this fact should be ignored ...

You're right of course. I think a current line of thought is to measure times for field [start, return, io_start, resolve] (where io_start is the batch start time here). This would capture the "non-trivial computation before doing the batch load" in the start-return phase.
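A sketch of carrying those four timestamps per field (the names come from the list above; this is a proposal, not an existing API):

```ruby
# One record per field, with the four proposed timestamps:
#   start    -- resolver invoked
#   return_  -- resolver returned (a promise, for batched fields)
#   io_start -- the batch the field joined actually began executing
#   resolve  -- the field's value was fulfilled
FieldTiming = Struct.new(:start, :return_, :io_start, :resolve) do
  # Time spent in the resolver before the batch load, capturing any
  # "non-trivial computation before doing the batch load".
  def resolver_time
    return_ - start
  end

  # Time attributable to the batch itself.
  def io_time
    resolve - io_start
  end

  # Time spent merely waiting for other loaders to run first.
  def queued_time
    io_start - return_
  end
end

t = FieldTiming.new(0.0, 0.4, 2.03, 2.5)
puts t.resolver_time.round(2)
puts t.queued_time.round(2)
puts t.io_time.round(2)
```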

Although thinking about this more (and considering your post-batch processing time point), we may need to get even more nuanced if we want to be completely correct here. It's certainly conceivable that a field may end up executing over multiple non-contiguous blocks of time before resolving in some execution models. I'm not sure if we want to fudge this fact or not.

Ideally I would like to move the async execution strategy support into graphql-ruby and to decouple the batch loaders from graphql...

This makes a lot of sense to me also.


tmeasday commented Nov 2, 2016

Some expanded current thinking of data to collect: https://github.com/apollostack/optics-agent/blob/pcarrier/proposals/timing/proposals/timing.md


tmeasday commented Dec 2, 2016

@dylanahsmith looking at this again, and working through the new lazy_resolve feature in graphql (rmosolgo/graphql-ruby#429 (comment)), I think ultimately all I need is some way via graphql-batch to get a list of all the promises that were resolved together.

Then when the first promise in the batch calls .sync I can record the time, and when already-fulfilled promises do the same I can find the sibling that kicked off the loader.

Am I making sense here?
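A self-contained sketch of that idea (toy promises, not GraphQL::Batch's): sibling promises share one batch record; whichever is synced first triggers the load and stamps the start time, which the already-fulfilled siblings can then read.

```ruby
# Toy batch: the first `.sync` on any member promise runs the whole
# batch and records when it started.
class ToyBatch
  attr_reader :started_at, :promises

  def initialize
    @promises = []
  end

  def promise(key)
    p = ToyPromise.new(self, key)
    @promises << p
    p
  end

  def run!
    return if @started_at
    @started_at = Time.now
    @promises.each { |p| p.fulfill(p.key.to_s.upcase) } # stand-in work
  end
end

class ToyPromise
  attr_reader :key, :value

  def initialize(batch, key)
    @batch = batch
    @key = key
  end

  def fulfill(value)
    @value = value
    @fulfilled = true
  end

  # First sync kicks off the batch; later syncs find the value already
  # there but can still reach the shared start time via the batch.
  def sync
    @batch.run! unless @fulfilled
    @value
  end

  def batch_started_at
    @batch.started_at
  end
end

batch = ToyBatch.new
a = batch.promise(:posted_by)
b = batch.promise(:repository)

a.sync                                        # triggers the batch here
puts b.sync                                   # already fulfilled; no second load
puts a.batch_started_at == b.batch_started_at # => true
```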
