Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Glue metastore deadlock when getting statistics #21954

Conversation

findepi
Copy link
Member

@findepi findepi commented May 13, 2024

Glue metastore uses runParallel utility to parallelize getting statistics across partitions, but also for one partition to parallelize getting stats for column batches. The underlying executor is fixed size, to this may lead to a dead-lock. This commit avoids the dead-lock by work stealing, i.e. asking the calling thread to participate in the work.

@cla-bot cla-bot bot added the cla-signed label May 13, 2024
@github-actions github-actions bot added the hive Hive connector label May 13, 2024
@findepi findepi added RELEASE-BLOCKER and removed hive Hive connector labels May 13, 2024
@findepi findepi force-pushed the findepi/fix-glue-metastore-deadlock-when-getting-statistics-dc0613 branch from 4931ddd to e6446fa Compare May 13, 2024 16:27
@cla-bot cla-bot bot added the cla-signed label May 13, 2024
@github-actions github-actions bot added the hive Hive connector label May 13, 2024
@findepi
Copy link
Member Author

findepi commented May 13, 2024

Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

import static java.util.Collections.nCopies;
import static java.util.Objects.requireNonNull;

public final class HiveExecutorUtil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not add a new utility method for this. We can extract this later when we need to reuse the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find utility class better isolated and thus easier to reason about from algorithmic perspective
i will make this package private and move to glue package though

// process ready results to fail fast on exceptions
for (Future<TaskResult<T>> ready = completionService.poll(); ready != null; ready = completionService.poll()) {
TaskResult<T> taskResult = ready.get();
if (taskResult != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here explaining that null means that the task was completed inline so the result is already set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added // Null result means task was processed by the calling thread

}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
futures.forEach(future -> future.cancel(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to a finally block to ensure the futures are always cancled, so we don't do any unnecessary work (for example when we throw ExecutionException above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks!

@findepi findepi force-pushed the findepi/fix-glue-metastore-deadlock-when-getting-statistics-dc0613 branch 2 times, most recently from d59c4eb to b61ccf5 Compare May 14, 2024 08:40
Glue metastore uses `runParallel` utility to parallelize getting
statistics across partitions, but also for one partition to parallelize
getting stats for column batches. The underlying executor is fixed size,
to this may lead to a dead-lock. This commit avoids the dead-lock by
work stealing, i.e. asking the calling thread to participate in the work.
@findepi findepi force-pushed the findepi/fix-glue-metastore-deadlock-when-getting-statistics-dc0613 branch from b61ccf5 to 1f9f747 Compare May 14, 2024 10:43
@findepi findepi merged commit b65b649 into master May 14, 2024
66 checks passed
@findepi findepi deleted the findepi/fix-glue-metastore-deadlock-when-getting-statistics-dc0613 branch May 14, 2024 15:40
@github-actions github-actions bot added this to the 448 milestone May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants