Experimental/Need feedback: Implement pluggable batching/dedup/concurrent fetch like facebook/dataloader #154

bigdrum · 2016-07-24T04:06:53Z

I have been considering how to implement the idea of facebook/dataloader with graphql in Go (which addresses issues like #106 and #132), and this PR shows the progress so far.

This PR is mainly a demonstration of how it could work, and so far I'm happy with the idea.

This requires only minimal changes to the graphql package (just enough to inject an executor with a simple interface). The real concurrency feature is implemented in the godataloader library that I wrote separately, which can be plugged into graphql.
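To make the idea of an injected executor concrete, here is a minimal sketch. The `Executor` interface, its `RunMany` method, and the `resolveFields` helper are hypothetical names chosen for illustration; they are not taken from the PR's diff, and the real interface may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// Executor is a hypothetical hook: the engine hands it a set of independent
// field-resolution tasks, and RunMany returns once all of them have finished.
type Executor interface {
	RunMany(tasks []func())
}

// SerialExecutor runs the tasks one after another on the calling goroutine.
type SerialExecutor struct{}

func (SerialExecutor) RunMany(tasks []func()) {
	for _, task := range tasks {
		task()
	}
}

// ConcurrentExecutor runs every task in its own goroutine and waits for all.
type ConcurrentExecutor struct{}

func (ConcurrentExecutor) RunMany(tasks []func()) {
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(run func()) {
			defer wg.Done()
			run()
		}(task)
	}
	wg.Wait()
}

// resolveFields stands in for the engine. Each task writes to its own slot,
// so the tasks do not race with each other.
func resolveFields(ex Executor, names []string) []string {
	out := make([]string, len(names))
	tasks := make([]func(), len(names))
	for i, name := range names {
		i, name := i, name
		tasks[i] = func() { out[i] = "resolved:" + name }
	}
	ex.RunMany(tasks)
	return out
}

func main() {
	fmt.Println(resolveFields(SerialExecutor{}, []string{"post", "user"}))
	fmt.Println(resolveFields(ConcurrentExecutor{}, []string{"post", "user"}))
}
```

With a hook of this shape, the engine itself stays unaware of how the tasks are scheduled, which is what would let a batching scheduler be swapped in.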

examples/dataloader contains an actual example, which demonstrates the following features:

Opt-in.
Concurrently loading multiple data.
Load duplicate requests once.
Batching multiple requests.
No unnecessary goroutines are spawned unless concurrency actually happens.
Synchronous programming style (without Promise-like object).

The test might best demonstrate the behavior.

func TestQuery(t *testing.T) {
    schema := dataloaderexample.CreateSchema()
    r := dataloaderexample.RunQuery(`{
        p1_0: post(id: "1") { id author { name }}
        p1_1: post(id: "1") { id author { name }}
        p1_2: post(id: "1") { id author { name }}
        p1_3: post(id: "1") { id author { name }}
        p1_4: post(id: "1") { id author { name }}
        p1_5: post(id: "1") { id author { name }}
        p2_1: post(id: "2") { id author { name }}
        p2_2: post(id: "2") { id author { name }}
        p2_3: post(id: "2") { id author { name }}
        p3_1: post(id: "3") { id author { name }}
        p3_2: post(id: "3") { id author { name }}
        p3_3: post(id: "3") { id author { name }}
        u1_1: user(id: "1") { name }
        u1_2: user(id: "1") { name }
        u1_3: user(id: "1") { name }
        u2_1: user(id: "3") { name }
        u2_2: user(id: "3") { name }
        u2_3: user(id: "3") { name }
    }`, schema)
    if len(r.Errors) != 0 {
        t.Error(r.Errors)
    }
    t.Logf("%+v", r) // log the result; an unconditional t.Error would always fail the test
    // The above query would produce log like this:
    // 2016/07/23 23:49:31 Load post 3
    // 2016/07/23 23:49:31 Load post 1
    // 2016/07/23 23:49:31 Load post 2
    // 2016/07/23 23:49:32 Batch load users [3 1 2]
    // Note that the first-level post loading is done concurrently, without duplicates.
    // The user loading is done in the same fashion, but a batched fetch is used instead.
    // TODO: Make test actually verify the logged behavior.
}
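One way the TODO above could be approached is to record each batch that reaches the data source and assert on the record instead of reading the log. The sketch below is an assumption about how that might look: `batchRecorder`, `loadUsers`, and `uniqueSorted` are hypothetical helpers, not part of godataloader or the example package.

```go
package main

import (
	"fmt"
	"sort"
)

// batchRecorder is a hypothetical stand-in for a data source. It records
// every batch it is asked to load so that a test can assert on the calls.
type batchRecorder struct {
	calls [][]string
}

func (b *batchRecorder) loadUsers(ids []string) map[string]string {
	b.calls = append(b.calls, ids)
	users := make(map[string]string, len(ids))
	for _, id := range ids {
		users[id] = "user-" + id
	}
	return users
}

// uniqueSorted removes duplicate keys and sorts the rest, so that the
// recorded batch can be compared deterministically.
func uniqueSorted(keys []string) []string {
	seen := make(map[string]bool, len(keys))
	var out []string
	for _, k := range keys {
		if !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	rec := &batchRecorder{}
	// Many fields ask for the same few users, as in the query above.
	requested := []string{"1", "1", "1", "3", "3", "3", "2"}
	users := rec.loadUsers(uniqueSorted(requested))
	fmt.Println(len(rec.calls), rec.calls[0], users["3"]) // prints: 1 [1 2 3] user-3
}
```

A test could then check that exactly one batch was issued and that it contains each key once.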

This PR is not meant to be merged yet (the example, at least, probably doesn't belong in this project because of its external dependency, unless the godataloader library is moved).

The godataloader library is probably a little bit more complicated than one would expect. And I'm happy to explain it if anyone finds this idea interesting.


bsr203 · 2016-07-24T14:55:08Z

It may be a while till I experiment with it, but I like how you have done it all with minimal changes, and github.com/bigdrum/godataloader is also fairly small. Thanks for improving this library. Cheers.

coveralls · 2016-07-24T19:58:38Z

Coverage increased (+0.05%) to 90.807% when pulling b7944de on bigdrum:executor into 491504a on graphql-go:master.

coveralls · 2016-07-24T20:10:00Z

Coverage increased (+0.05%) to 90.807% when pulling fd2a5d6 on bigdrum:executor into 491504a on graphql-go:master.

coveralls · 2016-07-24T21:42:24Z

Coverage increased (+0.04%) to 90.797% when pulling d514566 on bigdrum:executor into 491504a on graphql-go:master.

coveralls · 2016-07-24T23:28:11Z

Coverage increased (+0.04%) to 90.797% when pulling 61de32e on bigdrum:executor into 491504a on graphql-go:master.

teloon · 2016-09-15T04:20:54Z

This looks really interesting. Anything we're missing here?

bigdrum · 2016-10-20T13:23:29Z

FYI, I've been using this in a small-scale production environment, and I haven't seen any bugs so far. The only reason I still consider this experimental is that for an object with only trivial resolve functions (ones without expensive IO), the dataloader still kicks off a goroutine (one per parent object, not one per item, so not that many). We haven't seen any performance issues so far, but it would be nice to provide some hint to optimize that away.

ajackson-cpi · 2016-11-09T17:31:03Z

Any status on this? I'm interested in parallel processing for my use cases. Goroutines are quite light if they're short-lived, so I wouldn't spend too much time on them.

tonyghita · 2017-03-03T23:58:20Z

Have you looked at https://github.com/nicksrandall/dataloader? This library should probably be orthogonal to any dataloader

bigdrum · 2017-03-04T01:26:19Z

https://github.com/nicksrandall/dataloader only dispatches the batch when the batch size reaches some limit or some pre-configured amount of wait time has passed. This introduces unnecessary delay. Facebook's dataloader.js and my implementation, on the other hand, do not have this issue.

tonyghita · 2017-03-06T18:38:13Z

@bigdrum this is because facebook's implementation takes advantage of the javascript event loop. Golang (to my understanding) doesn't have such an event loop, so @nicksrandall's implementation approximates this with the batch size/time elapsed mechanism (both parameters are configurable to your use case).
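The batch size / elapsed time mechanism described here can be sketched as follows. This is a minimal illustration of the general technique, not the API of github.com/nicksrandall/dataloader; the function name `collect` and its parameters are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// collect gathers keys from in until either maxBatch keys have arrived or
// wait has elapsed since the first key was received, then returns the batch.
// It returns nil if in is closed before any key arrives.
func collect(in <-chan string, maxBatch int, wait time.Duration) []string {
	first, ok := <-in
	if !ok {
		return nil
	}
	batch := []string{first}
	timer := time.NewTimer(wait)
	defer timer.Stop()
	for len(batch) < maxBatch {
		select {
		case key, ok := <-in:
			if !ok {
				return batch // no more keys will ever arrive
			}
			batch = append(batch, key)
		case <-timer.C:
			return batch // window expired; dispatch what we have
		}
	}
	return batch // size limit reached; dispatch immediately
}

func main() {
	in := make(chan string, 8)
	for _, k := range []string{"1", "2", "3"} {
		in <- k
	}
	start := time.Now()
	batch := collect(in, 100, 16*time.Millisecond)
	// All three keys were available at once, yet the batch is only
	// dispatched after the full wait window has passed.
	fmt.Println(batch, "waited the full window:", time.Since(start) >= 16*time.Millisecond)
}
```

The example shows the cost discussed in this thread: the collector cannot tell that no further keys are coming, so it waits out the whole window unless the size limit is hit first.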

bigdrum · 2017-03-07T00:56:23Z

Right, my data loader implements a custom scheduler to achieve a similar effect.

bigdrum · 2017-03-07T01:16:07Z

To elaborate: in essence, there are two kinds of tasks, the tasks that collect what needs to be loaded and the tasks that do the fetching. If we can prioritize the collection tasks over the fetching tasks, we can batch as much as possible.

FB's dataloader.js takes advantage of the JS environment, where a closure can be scheduled at the end of the current event-loop tick, which essentially allows it to schedule the lower-priority fetch tasks after the collection tasks. So the event loop is not the essence here; the essence is the ability to dispatch tasks in a certain order. The event loop and its API just allow dataloader.js to implement the custom scheduling logic in a way that is perfectly hidden from the end user.

Go doesn't support custom goroutine scheduling; if it did, we could create a scheduling domain and dispatch goroutines with different strict priorities. So I ended up implementing a custom scheduling mechanism that runs closures in a custom order. It has two queues internally, which allows tasks to be executed with two-tier priority and thus achieves the same result. But I could not figure out a way to make it as perfectly non-intrusive as dataloader.js is in node.js.
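The two-queue idea can be illustrated with a deliberately simplified, single-goroutine sketch. It is not godataloader's actual code (which also has to park and resume goroutines); the `scheduler` type, its `Collect`, `Fetch`, and `Run` methods, and the `batchSizes` helper are hypothetical names used only to show the ordering rule.

```go
package main

import "fmt"

// scheduler is a hypothetical two-tier task queue. Collection tasks (high
// priority) always run before fetch tasks (low priority).
type scheduler struct {
	high []func()
	low  []func()
}

func (s *scheduler) Collect(task func()) { s.high = append(s.high, task) }
func (s *scheduler) Fetch(task func())   { s.low = append(s.low, task) }

// Run drains both queues on the calling goroutine. A low-priority task is
// started only when no high-priority task is pending.
func (s *scheduler) Run() {
	for len(s.high) > 0 || len(s.low) > 0 {
		var task func()
		if len(s.high) > 0 {
			task, s.high = s.high[0], s.high[1:]
		} else {
			task, s.low = s.low[0], s.low[1:]
		}
		task()
	}
}

// batchSizes registers one collection task per ID and returns the size of
// every batch that the fetch tasks ended up loading.
func batchSizes(ids []string) []int {
	s := &scheduler{}
	var pending []string
	var sizes []int
	fetchQueued := false
	for _, id := range ids {
		id := id
		s.Collect(func() {
			pending = append(pending, id)
			if !fetchQueued { // queue at most one fetch per pending batch
				fetchQueued = true
				s.Fetch(func() {
					sizes = append(sizes, len(pending))
					pending, fetchQueued = nil, false
				})
			}
		})
	}
	s.Run()
	return sizes
}

func main() {
	fmt.Println(batchSizes([]string{"1", "2", "3"})) // prints [3]
}
```

Because the fetch task sits in the low-priority queue, it runs only after every collection task has added its key, so all three keys end up in one batch without any timer.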

@nicksrandall's implementation achieves this by manually delaying the fetch task for a fixed amount of time, to allow the collection tasks to be executed before it. But since it doesn't know when the collection is done, it pays the cost of waiting blindly.

It is a trade-off: @nicksrandall's implementation is definitely much easier to understand, and mine is harder. But to me, precise scheduling without compromising latency is exactly what I want, and it is what inspired me in FB's dataloader.js. Without it, I might just use node.js for my graphql implementation. I'm presenting a way to achieve the same effect even though the underlying runtimes of node.js and Go are so different.

In the end, what I'm proposing here is just an interface that allows such different execution models to be injected. People can choose whichever executor fits their needs.

(Sorry for any grammar errors; I'm not a native English speaker.)

ajackson-cpi · 2017-03-08T04:52:31Z

I'm not sure where to look, but this scenario sounds like it needs a sync.WaitGroup{} that sees all the .Add() calls before the collector tasks start, with something calling .Wait() before the fetchers start. The collector tasks can call .Done().

That's the scatter-gather pattern in Go.
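A minimal sketch of the WaitGroup-based scatter-gather described above might look like this. The `gatherKeys` function and the key naming are hypothetical; the sketch assumes the full set of collectors is known before `Wait` is called.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// gatherKeys starts one collector goroutine per parent and returns the
// sorted keys once every collector has finished.
func gatherKeys(parents []string) []string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		keys []string
	)
	wg.Add(len(parents)) // all Add calls happen before any collector starts
	for _, p := range parents {
		go func(parent string) {
			defer wg.Done()
			mu.Lock()
			keys = append(keys, "author-of-"+parent)
			mu.Unlock()
		}(p)
	}
	wg.Wait() // a fetcher may start only after this returns
	sort.Strings(keys)
	return keys
}

func main() {
	keys := gatherKeys([]string{"post1", "post2", "post3"})
	fmt.Println("batch fetch:", keys)
}
```

The keys are sorted only to make the output deterministic, since the collectors may finish in any order.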

nicksrandall · 2017-03-08T14:27:49Z

@bigdrum I like your approach of creating a custom scheduler. I know this is a first draft, but a few features that I think are critical for a dataloader are a max batch size and a max timeout. I worry that your implementation could grow unbounded under load. Those features should be pretty easy to add.

bigdrum · 2017-03-08T14:53:32Z

@nicksrandall Right, that's a very good suggestion. My approach does have a larger footprint compared to yours, as it creates new goroutines when it sees the need to yield to another graphql branch to collect more tasks (but note that at any given moment only one goroutine is active; all the others are waiting to be scheduled).

The interesting thing is that this logic could be put into the custom scheduler. For example, we could provide some fairness to the low-priority queue: if the high-priority queue size reaches some limit, dispatch the tasks in the low-priority queue. This allows the pending batch to be flushed. The same goes for time-based criteria.

Since the dataloader/scheduler is per graphql query, we could also introduce a shared semaphore so that the total number of pending tasks on a server cannot exceed some given limit.
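The shared-semaphore idea could be sketched with a buffered channel used as a counting semaphore. Everything here is an assumption for illustration: the `limiter` type, its `acquire` and `release` methods, and the `runLimited` helper are hypothetical and not part of godataloader.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter is a hypothetical server-wide counting semaphore built on a
// buffered channel; its capacity is the maximum number of running tasks.
type limiter chan struct{}

func (l limiter) acquire() { l <- struct{}{} }
func (l limiter) release() { <-l }

// runLimited starts n tasks but never lets more than cap(l) run at once.
// It returns the highest number of tasks observed running simultaneously.
func runLimited(l limiter, n int) int32 {
	var (
		wg      sync.WaitGroup
		running int32
		peak    int32
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		l.acquire() // blocks while the server-wide limit is reached
		go func() {
			defer wg.Done()
			defer l.release()
			cur := atomic.AddInt32(&running, 1)
			for { // record the peak with a compare-and-swap loop
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt32(&running, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	shared := make(limiter, 2) // one limiter shared by all queries on a server
	fmt.Println(runLimited(shared, 10) <= 2) // prints true
}
```

Because every per-query scheduler would acquire from the same `limiter` value, the bound applies to the server as a whole rather than to a single query.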

ajackson-cpi · 2017-06-26T23:38:57Z

I'm interested in seeing this merged too. What's the status?

sandeepone · 2017-10-24T11:51:56Z

Any Update? 👍

divoxx · 2018-06-20T21:49:08Z

Any updates on this?

shengwu · 2018-07-13T00:59:48Z

Any updates on this vs parallel resolution?

chris-ramon · 2018-09-10T16:03:04Z

Thanks a lot, guys! 👍 Closing this one in favor of #388. You might want to take a look at a real use-case working example that shows the support for concurrent resolvers.

Allow user customizable executor to resolve object fields and list items. And an example to demonstrate how it can be used.

2d5ce70

bigdrum added 2 commits July 24, 2016 14:30

Batch user loading.

916f8c4

Fix test.

b7944de

Minor code improvement on handling of the undefined keys.

fd2a5d6

We should just use lock, no need the premature optimization.

d514566

Some special case optimization.

61de32e

andreas mentioned this pull request Dec 20, 2017

Opt-in parallel resolving of fields and lists sogko/graphql#23

Open

3 tasks

scottjg mentioned this pull request May 19, 2018

Merge the bigdrum/graphql pluggable executor branch into our fork everyteam/graphql#2

Merged

chris-ramon mentioned this pull request Jul 20, 2018

Concurrently resolve fields #132

Closed

chris-ramon mentioned this pull request Sep 2, 2018

[RFC] Concurrent Resolvers #389

Closed

chris-ramon closed this Sep 10, 2018
