
Instrumentation #27

Open
tmeasday opened this issue Oct 28, 2016 · 4 comments

Comments


tmeasday commented Oct 28, 2016

Following up from rmosolgo/graphql-ruby#354 (comment):

@dylanahsmith :

For graphql-batch I would just recommend using Promise.map_value to get the end of the field resolution:

```ruby
class MyFieldInstrumentation
  def instrument(type, field)
    old_resolve_proc = field.resolve_proc
    new_resolve_proc = ->(obj, args, ctx) {
      # Logged when the resolver is invoked, i.e. when the field is
      # handed to its batch loader -- not when the batch actually runs.
      Rails.logger.info("#{type.name}.#{field.name} START: #{Time.now.to_i}")
      result = old_resolve_proc.call(obj, args, ctx)
      # map_value runs the block once the (possibly promised) result is
      # fulfilled, passing the value through unchanged.
      Promise.map_value(result) do |value|
        Rails.logger.info("#{type.name}.#{field.name} END: #{Time.now.to_i}")
        value
      end
    }

    # Return a copy of the field with the wrapped resolver.
    field.redefine do
      resolve(new_resolve_proc)
    end
  end
end
```

Actually the above does not work (or at least not in a particularly useful way). The problem is that the executor does not necessarily run the loader associated with the field immediately, instead it may wait for some other loaders (or fields I suppose) to execute first. So the "start time" logged above (when the field is added to the loader), is not really "correct", in terms of the work done to fetch the field.
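To make the problem concrete, here is a minimal pure-Ruby sketch (no graphql-batch involved; `ToyLoader` is an invented stand-in): the timestamp taken when a key is enqueued can be much earlier than when the batch actually runs, so the whole wait gets attributed to the field.

```ruby
# Toy batch loader: keys accumulate, and the work happens only when
# `run!` is eventually called by the executor.
class ToyLoader
  def initialize(&perform)
    @keys = []
    @perform = perform
    @enqueued_at = {}
  end

  # "START" as logged by the naive instrumentation above: the enqueue time.
  def load(key)
    @enqueued_at[key] = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @keys << key
  end

  # The batch actually starts here, possibly long after the keys were enqueued.
  def run!
    @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @perform.call(@keys)
  end

  def wait_before_start(key)
    @started_at - @enqueued_at[key]
  end
end

loader = ToyLoader.new { |keys| keys } # a trivially fast "query"
loader.load(:vote)
sleep 0.05 # stand-in for other loaders (postedBy, repository) running first
loader.run!

# The naive trace charges the whole 0.05s wait to the vote field, even
# though the loader itself was essentially instantaneous.
puts loader.wait_before_start(:vote)
```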

As an example, if I run a query against our example GitHunt server that looks something like:

```graphql
{
  entries {
    # this field goes to a batch loader that talks to github
    postedBy
    # this field goes to another github batch loader
    repository
    # this field goes to a simple SQL batch loader
    vote
  }
}
```

You end up seeing traces that look like:

[screenshot 2016-10-28 08 54 09: trace of field start times and durations]

Note in the above that the start time of all the fields is more or less the same (as the executor runs over the set of entries, they are all added to the various loaders), and the total time is "cumulative". In actuality the vote loader is more or less instantaneous (running against a local sqlite db in this case), and a correct start time for those fields should really be at the end of the repository loader (so around the 2.03ms mark in this screenshot).


This is why I think that, for proper instrumentation of the batch loader, we need to know two things:

  1. What time a loader actually starts.
  2. Which fields the loader is running for.

The first part is trivial I suppose, but the second seems tricky.
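Both measurements could be sketched like this in plain Ruby (`InstrumentedLoader` is an invented stand-in, not `GraphQL::Batch::Loader`, whose internals may differ):

```ruby
# Stand-in loader that records (1) when it actually starts and
# (2) which fields its keys were loaded for.
class InstrumentedLoader
  attr_reader :started_at, :loaded_keys

  def initialize(&perform)
    @perform = perform
    @loaded_keys = []
  end

  # (2) Record each key along with whatever identifies the requesting field.
  def load(key, field: nil)
    @loaded_keys << { key: key, field: field }
  end

  # (1) Timestamp taken when the batch runs, not when keys were enqueued.
  def run!
    @started_at = Time.now
    @perform.call(@loaded_keys.map { |entry| entry[:key] })
  end
end

loader = InstrumentedLoader.new { |keys| keys }
loader.load(1, field: "Entry.vote")
loader.load(2, field: "Entry.vote")
loader.run!

puts loader.started_at
puts loader.loaded_keys.map { |entry| entry[:field] }.uniq.inspect
```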

@dylanahsmith
Contributor

Yes, that code example tracks the start and end of resolving a field, not the time it spends actually executing those fields, so no other integration is needed with graphql-ruby. You would see a similar problem any time fields are able to be resolved completely concurrently, such as with thread/connection pools, locking, or CPU-intensive tasks in a parallel executor.

The problem really is about instrumenting batch loaders.

a correct start time for those fields should really be at the end of the repository loader

Although the resolving is mostly happening in the batch loader, the resolver could still take significant time. I don't think this fact should be ignored, since the resolver might end up doing an unbatched load or non-trivial computation before doing the batch load, so assuming the start of execution happens in the loader would be misleading in a very important case.

If you actually want to visualize the batch execution, then you will need to make sure that you can collect more than just the start and end time for resolving a field.

The other thing that will complicate the instrumentation that you didn't mention is that GraphQL::Batch::Promise callbacks get called immediately after they are fulfilled, so a callback that does non-trivial computation or an unbatched query could make it look like that time was spent in the batch loader itself.
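A self-contained illustration of that attribution problem (toy code, not GraphQL::Batch): if the end timestamp is taken after fulfillment callbacks have run, a slow callback inflates the apparent batch time.

```ruby
# Measure the elapsed time of a block with a monotonic clock.
def timed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# A callback doing non-trivial work, e.g. an unbatched query in a .then block.
callbacks = [->(v) { sleep 0.05; v }]

apparent_batch_time = timed do
  values = [1, 2, 3] # the actual batched load: effectively instant
  # Callbacks fire immediately on fulfillment, inside the same window...
  values.map! { |v| callbacks.each { |cb| v = cb.call(v) }; v }
end

# ...so the "batch" appears to take ~0.15s even though the load itself
# was instantaneous.
puts apparent_batch_time
```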

  1. What time a loader actually starts.

One possibility here is to allow the application to configure the executor so they can mix instrumentation into it. I think this would also be useful to support parallel or concurrent execution.

Another would be to have a subscription and notification system that would let code run around the batch load, allowing it to capture the start and end times.
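A minimal sketch of that notification idea (all names here are hypothetical, not an existing graphql-batch API): subscribers are called around each batch load with the loader name and its keys, which is enough to record start and end times.

```ruby
# Hypothetical notification hook around a batch load.
module BatchNotifications
  @subscribers = []

  def self.subscribe(&block)
    @subscribers << block
  end

  # Wraps a batch load, notifying subscribers with timing information.
  def self.instrument(loader_name, keys)
    started = Time.now
    result = yield
    @subscribers.each { |sub| sub.call(loader_name, keys, started, Time.now) }
    result
  end
end

events = []
BatchNotifications.subscribe do |name, keys, started, ended|
  events << { name: name, keys: keys, duration: ended - started }
end

BatchNotifications.instrument("VoteLoader", [1, 2, 3]) do
  [10, 20, 30] # the actual batched query would run here
end

puts events.first[:name]
```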

  1. Which fields the loader is running for.

Ideally I would like to move the async execution strategy support into graphql-ruby and to decouple the batch loaders from graphql, since I could see batch loading being useful outside of graphql (e.g. batch loading data accessed during liquid rendering). As such, I don't plan on having the batch loaders track these fields, but I do think it should allow this to be done.

However, by having a way to run code around field resolving and batch loading, it would already be possible to keep track of the context in which a batch load is requested. So I think the only thing missing would be instrumentation for when GraphQL::Batch::Loader#load is called.

I was already planning on making it so that GraphQL::Batch::Loader#load would make sure the loader was registered with the executor, so that the GraphQL::Batch::Loader could be used to cache loads across a GraphQL request. That means this instrumentation could also be handled by the executor.


tmeasday commented Nov 2, 2016

You would see a similar problem any time fields are able to be resolved completely concurrently, such as with thread/connection pools, locking, or CPU-intensive tasks in a parallel executor.

I assume you mean "any time fields are *not* able to be resolved..."?


Although the resolving is mostly happening in the batch loader, the resolver could still take significant time. I don't think this fact should be ignored ...

You're right of course. I think a current line of thought is to measure times for field [start, return, io_start, resolve] (where io_start is the batch start time here). This would capture the "non-trivial computation before doing the batch load" in the start-return phase.
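A sketch of carrying those four timestamps per field (the names come from the list above; this is a proposal, not an existing API):

```ruby
# One record per field, with the four proposed timestamps:
#   start    -- resolver invoked
#   return_  -- resolver returned (a promise, for batched fields)
#   io_start -- the batch the field joined actually began executing
#   resolve  -- the field's value was fulfilled
FieldTiming = Struct.new(:start, :return_, :io_start, :resolve) do
  # Time spent in the resolver before the batch load, capturing any
  # "non-trivial computation before doing the batch load".
  def resolver_time
    return_ - start
  end

  # Time attributable to the batch itself.
  def io_time
    resolve - io_start
  end

  # Time spent merely waiting for other loaders to run first.
  def queued_time
    io_start - return_
  end
end

t = FieldTiming.new(0.0, 0.4, 2.03, 2.5)
puts t.resolver_time.round(2)
puts t.queued_time.round(2)
puts t.io_time.round(2)
```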

Although thinking about this more (and considering your post-batch processing time point), we may need to get even more nuanced if we want to be completely correct here. It's certainly conceivable that a field may end up executing over multiple non-contiguous blocks of time before resolving in some execution models. I'm not sure if we want to fudge this fact or not.

Ideally I would like to move the async execution strategy support into graphql-ruby and to decouple the batch loaders from graphql...

This makes a lot of sense to me also.


tmeasday commented Nov 2, 2016

Some expanded current thinking of data to collect: https://github.com/apollostack/optics-agent/blob/pcarrier/proposals/timing/proposals/timing.md


tmeasday commented Dec 2, 2016

@dylanahsmith looking at this again, and working through the new lazy_resolve feature in graphql (rmosolgo/graphql-ruby#429 (comment)), I think ultimately all I need is some way via graphql-batch to get a list of all the promises that were resolved together.

Then when the first promise in the batch calls .sync I can record the time, and when already-fulfilled promises do the same I can find the sibling that kicked off the loader.

Am I making sense here?
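A self-contained sketch of that idea (toy promises, not GraphQL::Batch's): sibling promises share one batch record; whichever is synced first triggers the load and stamps the start time, which the already-fulfilled siblings can then read.

```ruby
# Toy batch: the first `.sync` on any member promise runs the whole
# batch and records when it started.
class ToyBatch
  attr_reader :started_at, :promises

  def initialize
    @promises = []
  end

  def promise(key)
    p = ToyPromise.new(self, key)
    @promises << p
    p
  end

  def run!
    return if @started_at
    @started_at = Time.now
    @promises.each { |p| p.fulfill(p.key.to_s.upcase) } # stand-in work
  end
end

class ToyPromise
  attr_reader :key, :value

  def initialize(batch, key)
    @batch = batch
    @key = key
  end

  def fulfill(value)
    @value = value
    @fulfilled = true
  end

  # First sync kicks off the batch; later syncs find the value already
  # there but can still reach the shared start time via the batch.
  def sync
    @batch.run! unless @fulfilled
    @value
  end

  def batch_started_at
    @batch.started_at
  end
end

batch = ToyBatch.new
a = batch.promise(:posted_by)
b = batch.promise(:repository)

a.sync                                        # triggers the batch here
puts b.sync                                   # already fulfilled; no second load
puts a.batch_started_at == b.batch_started_at # => true
```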
