
performance of workers limited by downlink bandwidth #20

Open
nponeccop opened this issue Dec 12, 2015 · 3 comments

@nponeccop
Collaborator

Imagine that there is one heavy worker, i.e. it consumes so many resources that it's not practical to run more than one worker.

In this case it is still beneficial to grab more than one job to fully utilize the connection. E.g. if one job (JOB_ASSIGN packet) is 10 KB long, then on a 10 Mbit/s connection with 25 ms latency there should be 10 * 1024 * 1024 * 0.025 / (8 * 10 * 1024) = 4 packets in flight (after rounding 3.2 up).
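
As a quick sanity check, the same bandwidth-delay arithmetic in TypeScript (numbers taken from the example above; this is only an illustration, not code from the library):

// Bandwidth-delay product for the example link (illustration only).
const linkBitsPerSec = 10 * 1024 * 1024;  // 10 Mbit/s
const rttSec = 0.025;                     // 25 ms latency
const jobBytes = 10 * 1024;               // ~10 KB JOB_ASSIGN packet

const bitsInFlight = linkBitsPerSec * rttSec;                  // 262144 bits
const jobsInFlight = Math.ceil(bitsInFlight / (8 * jobBytes)); // 3.2 rounded up to 4
console.log(jobsInFlight); // 4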

My proposal is to have another control for the job count: maxJobs controls how many jobs are executed concurrently, and maxExtraJobsInFlight (or a shorter name) controls, well, the extra jobs in flight.

So in the example situation mentioned above, we would have maxJobs = 1; maxExtraJobsInFlight = 4.
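
For illustration only, the worker options for that situation might look like this (maxJobs already exists, as described above; maxExtraJobsInFlight is the proposed, still hypothetical option, and the object shape is just a sketch, not the real API):

// Hypothetical worker options for the heavy-worker scenario (sketch only).
const workerOptions = {
    maxJobs: 1,               // the job is heavy, so execute only one at a time
    maxExtraJobsInFlight: 4,  // keep ~4 extra JOB_ASSIGNs in flight to fill the 10 Mbit/s x 25 ms pipe
};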

@iarna
Owner

iarna commented May 12, 2016

This seems like a reasonable addition to me.

@nponeccop
Collaborator Author

nponeccop commented May 12, 2016

The code is already there in https://github.com/streamcode9/abraxas/commit/fbb7be1e0f075a7257115432bead5597efe1e6a3 (see this._queue). I'm testing it now against both the Abraxas server and upstream Gearman, and later I will split it into more reasonable separate changes.

nponeccop pushed a commit to nponeccop/abraxas that referenced this issue May 15, 2016
@nponeccop
Collaborator Author

I cleaned up the changes and pushed them to nponeccop/master. I added a maxQueued option. As a separate change, I send PRE_SLEEP only when the link is idle, namely when we don't have an outstanding GRAB_JOB.

Unfortunately, while the current implementation shows excellent performance over WAN (I got about 3 ms of overhead per job on batches of 1000 jobs over a bad 200 ms link), it breaks support for multiple servers.

The whole algorithm/protocol looks like this (in pseudocode; <- is a packet sent to the server, -> is a packet received from it or a local worker event):

invariant active + grabbing + queued <= max_active + max_queued
invariant active < max_active implies queued == 0
invariant active >= 0 && active <= max_active
invariant queued >= 0 && queued <= max_queued
invariant grabbing >= 0

max_active = ...
max_queued = ...
grabbing = 0
active = 0
queued = 0

run():
    dequeueJob()
    active++
    queued--
grab():
    <- GRAB_JOB
    grabbing++
start():
    <- CAN_DO test_tube
    <- PRE_SLEEP
-> NO_JOB:
    grabbing--
    if grabbing == 0
        <- PRE_SLEEP
-> NOOP:
    while (active + grabbing + queued < max_active + max_queued)
        grab()
-> JOB_ASSIGN:
    grabbing--
    queueJob()
    queued++
    if active < max_active
        run()
-> workComplete:
    active--
    if (queued > 0)
        run()
        assert active == max_active
    grab()
    <- WORK_COMPLETE
-> workData:
    <- WORK_DATA

It's rather complicated already, and with speculative GRAB_JOBs sent to multiple servers it will be even worse.
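
For concreteness, here is a minimal TypeScript sketch of the same counter bookkeeping for a single server (class and method names are made up for illustration; send() and execute() stand in for the real abraxas packet and job plumbing):

// Minimal sketch of the counter bookkeeping above, for a single server.
// send() and execute() are placeholders, not the real abraxas internals.
class GrabScheduler {
    private active = 0;    // jobs currently executing
    private queued = 0;    // JOB_ASSIGNs received but not yet started
    private grabbing = 0;  // GRAB_JOBs sent that have no reply yet
    private jobs: unknown[] = [];

    constructor(
        private maxActive: number,
        private maxQueued: number,
        private send: (packet: string) => void,
        private execute: (job: unknown, done: () => void) => void,
    ) {}

    start() {                        // start(): register and go to sleep
        this.send('CAN_DO test_tube');
        this.send('PRE_SLEEP');
    }
    private grab() {                 // grab(): ask the server for one more job
        this.send('GRAB_JOB');
        this.grabbing++;
    }
    private run() {                  // run(): move a job from the queue into execution
        const job = this.jobs.shift();
        this.queued--;
        this.active++;
        this.execute(job, () => this.onWorkComplete());
    }
    onNoJob() {                      // -> NO_JOB: sleep only when the link is idle
        this.grabbing--;
        if (this.grabbing === 0) this.send('PRE_SLEEP');
    }
    onNoop() {                       // -> NOOP: refill the pipeline up to the limits
        while (this.active + this.grabbing + this.queued < this.maxActive + this.maxQueued) {
            this.grab();
        }
    }
    onJobAssign(job: unknown) {      // -> JOB_ASSIGN: queue it, run it if we have capacity
        this.grabbing--;
        this.jobs.push(job);
        this.queued++;
        if (this.active < this.maxActive) this.run();
    }
    private onWorkComplete() {       // -> workComplete: free a slot, backfill, grab again
        this.active--;
        if (this.queued > 0) this.run();
        this.grab();
        this.send('WORK_COMPLETE');
    }
}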
