-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Glue metastore deadlock when getting statistics #21954
Fix Glue metastore deadlock when getting statistics #21954
Conversation
4931ddd
to
e6446fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
import static java.util.Collections.nCopies; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public final class HiveExecutorUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not add a new utility method for this. We can extract this later when we need to reuse the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i find utility class better isolated and thus easier to reason about from algorithmic perspective
i will make this package private and move to glue package though
// process ready results to fail fast on exceptions | ||
for (Future<TaskResult<T>> ready = completionService.poll(); ready != null; ready = completionService.poll()) { | ||
TaskResult<T> taskResult = ready.get(); | ||
if (taskResult != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here explaining that null
means that the task was completed inline so the result is already set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added // Null result means task was processed by the calling thread
} | ||
catch (InterruptedException e) { | ||
Thread.currentThread().interrupt(); | ||
futures.forEach(future -> future.cancel(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to a finally block to ensure the futures are always cancled, so we don't do any unnecessary work (for example when we throw ExecutionException above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks!
d59c4eb
to
b61ccf5
Compare
Glue metastore uses `runParallel` utility to parallelize getting statistics across partitions, but also for one partition to parallelize getting stats for column batches. The underlying executor is fixed size, to this may lead to a dead-lock. This commit avoids the dead-lock by work stealing, i.e. asking the calling thread to participate in the work.
b61ccf5
to
1f9f747
Compare
Glue metastore uses
runParallel
utility to parallelize getting statistics across partitions, but also for one partition to parallelize getting stats for column batches. The underlying executor is fixed size, to this may lead to a dead-lock. This commit avoids the dead-lock by work stealing, i.e. asking the calling thread to participate in the work.