Feature - add priority race #1472

sandinmyjoints · 2017-09-16T18:52:20Z

This solves a similar problem to #1064. For this solution, you can't arbitrarily stop the loop (to have annotate tasks ahead of time), but otoh, it's not a breaking change.

sandinmyjoints · 2017-09-16T21:53:07Z

From what I can tell, the Travis failure (for Node 7 only) is unrelated to my changes, maybe a spurious build error?

megawac

I have some reservations about how this is implemented. First I'm not sure if .isPrioritized = bool is generic enough. I would prefer .priority = int. Also don't really like the assigning of the priorities to the functions.

As for the new method itself I haven't really ever seen a use case for it. Mind providing a couple of examples?

megawac · 2017-09-19T06:53:50Z

mocha_test/priorityRace.js

+        }
+        for (var i = 0; i <= 5; i++) {
+            tasks[i] = eachTest(i);
+            tasks[0].priority = true;


What is this getting set for?

To show that when more than one task has priority, the async operation will finish if/when all of them finish, not just one of them.

Sorry I meant that this should be isPrioritized

You're absolutely right, this was left over from an earlier implementation. Fixed.

sandinmyjoints · 2017-09-21T03:49:31Z

@megawac Thanks for reviewing. Great questions. Here's the use case that motivates this. We have two db tables, A and B. Data from A is preferred -- we always use it if available. But not every query has data in A. Some queries have data in B, and if A is not available, we'll use the data from B.

The simplest thing to do would be:

query A
if there's a result, return it
if no result, query B
if there's a result, return it, else return no result

That's fine, but we're trying to optimize for speed. Priority race would allow us to kick off queries to A and B concurrently, return as soon as a result is found in A if one is found, else return when either B has a result or neither has a result. This speeds up two of three cases:

case	time - simple way	time - priority race
A has result	time of query A	time of query A
A has no result, B has result	time of query A + time of query B	time of the longer of A or B
neither has result	time of query A + time of query B	time of the longer of A or B

So that's the motivation.

Regarding .priority = int, the way I was thinking about things, there are only two classes of tasks: those that cause the entire async operation to finish early vs. those that do not. Hence a boolean. I'm open to something else if you think it makes more sense. To work on an implementation, I'd like to understand what you have in mind. What behavior would you expect with these three tasks:

task A has priority 0
task B has priority 1
task C has priority 2

For these orderings of when each task finishes:

A, B, C
A, C, B
B, A, C
B, C, A
C, A, B
C, B, A

Thanks!

sandinmyjoints · 2017-09-21T03:54:08Z

Also, I agree that assigning a property to functions is a bit odd, certainly uncommon. An alternative is to have some way inside of tasks to signal that the entire async operation should finish before all tasks are done. I read over #1064 and it seemed to have a lot of debate about how to accomplish this, so then assigning priority to tasks started to look acceptable -- it sidesteps the question of how tasks can stop the async operation.

hargasinski · 2017-09-21T18:40:01Z

@sandinmyjoints could this use case be solved by async.parallel and #1064 as opposed to adding a new method? i.e. something along the lines of

async.parallel({
  tableA: (callback) => {
    getDataFromTableA((err, data) => {
      if (err || !data) {
        return callback(err);
      }
      // finish early, don't wait for tableB to finish
      return callback(false, data);
    });
  },
  tableB: (callback) => {
    getDataFromTableB((err, data) => {
      if (err) {
        return callback(err);
      }
      return callback(null, data);
    });
  }
}, (err, results) => {
  if (err) {
    // error handling code
  }

  let data = results.tableA || results.tableB;
  // handle the data
});

This has the same performance benefits as priorityRace:

case	time - simple way	parallel + breaking early
A has result	time of query A	time of query A (doesn't wait for B)
A has no result, B has result	time of query A + time of query B	time of the longer of A or B
neither has result	time of query A + time of query B	time of the longer of A or B

For an int priority, I would expect priorityRace to wait for the highest priority to finish. If the highest priority task doesn't return data, it would wait for the next highest priority to finish or if it has already finished, check if it returned data. If it returns/returned data, priorityRace would finish. Otherwise, it would continue checking the lower priorities.

sandinmyjoints · 2017-09-22T18:58:33Z

@hargasinski I agree that that approach accomplishes the same outcome. My thinking was that calling back with false for the first argument is a bit of a workaround -- I know people do it, but it goes against the node convention of the first argument being an error or null, and it's not mentioned in the async docs anywhere as a way to accomplish this purpose. My intention with this PR was to make it a first-class use case. Might be personal taste, but it seems like less of a code smell to me to say "set this property on a task to make it short-circuit" than "call back with false as the first argument to short-circuit". 😃

aearly · 2017-10-01T21:05:32Z

I feel this method is a bit too niche to add to Async. It also feels like the control flow can be accomplished with parallel and an early return, or a combination of parallel and race.

I also think the discussion about #1064 is a bit of a red herring. There's nothing preventing you from calling an outer callback early, but currently, doing so causes a subtle memory leak since the subsequent callbacks wait forever to be called. The proposed callback(false) solution is about signaling that a flow was exited, and no further callbacks will be called, closing the memory leak. (It actually would make control flow like this better! But this control flow is not a solution to that particular problem.)

sandinmyjoints · 2017-10-05T00:14:13Z

OK, thanks for the thoughts and feedback @aearly @hargasinski @megawac. I'll use an alternative approach for my use case.

sandinmyjoints added 2 commits September 16, 2017 11:40

Update test name.

5b56afb

Add priorityRace.

dc091ef

sandinmyjoints changed the title ~~Feature priority race~~ Feature - add priority race Sep 16, 2017

megawac self-requested a review September 19, 2017 06:50

megawac requested changes Sep 19, 2017

View reviewed changes

Fix test.

e6dba05

sandinmyjoints closed this Oct 5, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature - add priority race #1472

Feature - add priority race #1472

sandinmyjoints commented Sep 16, 2017 •

edited

sandinmyjoints commented Sep 16, 2017

megawac left a comment

megawac Sep 19, 2017

sandinmyjoints Sep 21, 2017

megawac Sep 21, 2017

sandinmyjoints Sep 22, 2017

sandinmyjoints commented Sep 21, 2017

sandinmyjoints commented Sep 21, 2017

hargasinski commented Sep 21, 2017

sandinmyjoints commented Sep 22, 2017

aearly commented Oct 1, 2017

sandinmyjoints commented Oct 5, 2017

Feature - add priority race #1472

Feature - add priority race #1472

Conversation

sandinmyjoints commented Sep 16, 2017 • edited

sandinmyjoints commented Sep 16, 2017

megawac left a comment

Choose a reason for hiding this comment

megawac Sep 19, 2017

Choose a reason for hiding this comment

sandinmyjoints Sep 21, 2017

Choose a reason for hiding this comment

megawac Sep 21, 2017

Choose a reason for hiding this comment

sandinmyjoints Sep 22, 2017

Choose a reason for hiding this comment

sandinmyjoints commented Sep 21, 2017

sandinmyjoints commented Sep 21, 2017

hargasinski commented Sep 21, 2017

sandinmyjoints commented Sep 22, 2017

aearly commented Oct 1, 2017

sandinmyjoints commented Oct 5, 2017

sandinmyjoints commented Sep 16, 2017 •

edited