Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature - add priority race #1472

Closed

Conversation

sandinmyjoints
Copy link

@sandinmyjoints sandinmyjoints commented Sep 16, 2017

This solves a similar problem to #1064. For this solution, you can't arbitrarily stop the loop (to have annotate tasks ahead of time), but otoh, it's not a breaking change.

@sandinmyjoints sandinmyjoints changed the title Feature priority race Feature - add priority race Sep 16, 2017
@sandinmyjoints
Copy link
Author

From what I can tell, the Travis failure (for Node 7 only) is unrelated to my changes, maybe a spurious build error?

@megawac megawac self-requested a review September 19, 2017 06:50
Copy link
Collaborator

@megawac megawac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some reservations about how this is implemented. First I'm not sure if .isPrioritized = bool is generic enough. I would prefer .priority = int. Also don't really like the assigning of the priorities to the functions.

As for the new method itself I haven't really ever seen a use case for it. Mind providing a couple of examples?

}
for (var i = 0; i <= 5; i++) {
tasks[i] = eachTest(i);
tasks[0].priority = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this getting set for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To show that when more than one task has priority, the async operation will finish if/when all of them finish, not just one of them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant that this should be isPrioritized

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, this was left over from an earlier implementation. Fixed.

@sandinmyjoints
Copy link
Author

@megawac Thanks for reviewing. Great questions. Here's the use case that motivates this. We have two db tables, A and B. Data from A is preferred -- we always use it if available. But not every query has data in A. Some queries have data in B, and if A is not available, we'll use the data from B.

The simplest thing to do would be:

  • query A
  • if there's a result, return it
  • if no result, query B
  • if there's a result, return it, else return no result

That's fine, but we're trying to optimize for speed. Priority race would allow us to kick off queries to A and B concurrently, return as soon as a result is found in A if one is found, else return when either B has a result or neither has a result. This speeds up two of three cases:

case time - simple way time - priority race
A has result time of query A time of query A
A has no result, B has result time of query A + time of query B time of the longer of A or B
neither has result time of query A + time of query B time of the longer of A or B

So that's the motivation.

Regarding .priority = int, the way I was thinking about things, there are only two classes of tasks: those that cause the entire async operation to finish early vs. those that do not. Hence a boolean. I'm open to something else if you think it makes more sense. To work on an implementation, I'd like to understand what you have in mind. What behavior would you expect with these three tasks:

  • task A has priority 0
  • task B has priority 1
  • task C has priority 2

For these orderings of when each task finishes:

  • A, B, C
  • A, C, B
  • B, A, C
  • B, C, A
  • C, A, B
  • C, B, A

Thanks!

@sandinmyjoints
Copy link
Author

Also, I agree that assigning a property to functions is a bit odd, certainly uncommon. An alternative is to have some way inside of tasks to signal that the entire async operation should finish before all tasks are done. I read over #1064 and it seemed to have a lot of debate about how to accomplish this, so then assigning priority to tasks started to look acceptable -- it sidesteps the question of how tasks can stop the async operation.

@hargasinski
Copy link
Collaborator

@sandinmyjoints could this use case be solved by async.parallel and #1064 as opposed to adding a new method? i.e. something along the lines of

async.parallel({
  tableA: (callback) => {
    getDataFromTableA((err, data) => {
      if (err || !data) {
        return callback(err);
      }
      // finish early, don't wait for tableB to finish
      return callback(false, data);
    });
  },
  tableB: (callback) => {
    getDataFromTableB((err, data) => {
      if (err) {
        return callback(err);
      }
      return callback(null, data);
    });
  }
}, (err, results) => {
  if (err) {
    // error handling code
  }

  let data = results.tableA || results.tableB;
  // handle the data
});

This has the same performance benefits as priorityRace:

case time - simple way parallel + breaking early
A has result time of query A time of query A (doesn't wait for B)
A has no result, B has result time of query A + time of query B time of the longer of A or B
neither has result time of query A + time of query B time of the longer of A or B

For an int priority, I would expect priorityRace to wait for the highest priority to finish. If the highest priority task doesn't return data, it would wait for the next highest priority to finish or if it has already finished, check if it returned data. If it returns/returned data, priorityRace would finish. Otherwise, it would continue checking the lower priorities.

@sandinmyjoints
Copy link
Author

@hargasinski I agree that that approach accomplishes the same outcome. My thinking was that calling back with false for the first argument is a bit of a workaround -- I know people do it, but it goes against the node convention of the first argument being an error or null, and it's not mentioned in the async docs anywhere as a way to accomplish this purpose. My intention with this PR was to make it a first-class use case. Might be personal taste, but it seems like less of a code smell to me to say "set this property on a task to make it short-circuit" than "call back with false as the first argument to short-circuit". 😃

@aearly
Copy link
Collaborator

aearly commented Oct 1, 2017

I feel this method is a bit too niche to add to Async. It also feels like the control flow can be accomplished with parallel and an early return, or a combination of parallel and race.

I also think the discussion about #1064 is a bit of a red herring. There's nothing preventing you from calling an outer callback early, but currently, doing so causes a subtle memory leak since the subsequent callbacks wait forever to be called. The proposed callback(false) solution is about signaling that a flow was exited, and no further callbacks will be called, closing the memory leak. (It actually would make control flow like this better! But this control flow is not a solution to that particular problem.)

@sandinmyjoints
Copy link
Author

OK, thanks for the thoughts and feedback @aearly @hargasinski @megawac. I'll use an alternative approach for my use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants