Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow BoundedExecutor to recover from failures #742

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

electrum
Copy link
Member

Previously submitted tasks will eventually run if the underlying
executor recovers and additional tasks are submitted.

Previously submitted tasks will eventually run if the underlying
executor recovers and additional tasks are submitted.
try {
coreExecutor.execute(this::drainQueue);
}
catch (Throwable e) {
failed.set(true);
log.error("BoundedExecutor state corrupted due to underlying executor failure");
decrementAndGetQueueSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if coreExecutor.execute(this::drainQueue) throws immediately, you can end up having 1 task in queue and queueSize=0.

To me, it looks like you want something like "current thread count" instead of "queue size". Question is how to do this without losing an incoming task.

catch (Throwable e) {
log.error(e, "Task failed");

if ((decrementAndGetQueueSize() == 0) && (task == null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary parenthesis

log.error(e, "Task failed");

if ((decrementAndGetQueueSize() == 0) && (task == null)) {
return;
Copy link
Contributor

@findepi findepi Jul 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With maxThreads=1 consider

  1. thread A calls execute
    a. adds task
    b. queueSize := 1
    c. starts a thread A'
  2. thread A' removes the task from the queue and executes it
  3. thread A' calls decrementAndGetQueueSize, i.e. queueSize := 0, but does not exit because task was not null
  4. thread B (or A) calls execute
    a. adds task
    b. queueSize := 1
    c. starts a thread B', because queueSize <= maxThreads
  5. thread C calls execute
    a. adds task
    b. queueSize := 2
    c. doesn't start a thread

At this point, 2 threads A' and B' are executing within coreExecutor even though maxThreads is 1 and both will attempt to poll & execute next task.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(updated)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we could rearrange the if condition to task == null && decrementAndGetQueueSize() == 0 then threadB' won't be created as queueSize := 0 only if the task == null or maybe we can peek into the queue to check for the next task
decrementAndGetQueueSize == 0 && queue.peek() == null

@erichwang
Copy link
Member

erichwang commented Jul 15, 2021

@electrum: to achieve the desired behavior, we really only have 3 possible options here:

  1. Make the bounded executor fully synchronized for all task submissions (only allowing one submission at a time) -- but this might add unwanted overhead in high performance cases since none of the existing code or dependencies use any locks.
  2. Add a scheduled background thread pool whose job is to keep retrying failed task submission retries forever until success
  3. Consume the calling thread as one of the draining tasks in the event of thread pool failure -- while technically this wouldn't corrupt the data structure, this would be terrible for callers and so this isn't really a desirable path.

There is no way to make this magically work without one of these three options. For me, I would suggest (2).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants