Allow to configure thread-pool while using Iceberg to read the data (plan files/tasks) #10335

Open
osscm opened this issue May 14, 2024 · 3 comments
Labels
improvement PR that improves existing functionality

Comments

@osscm

osscm commented May 14, 2024

Feature Request / Improvement

Right now the Iceberg library uses a single static thread pool internally.
Its size is picked based on the number of cores (https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SystemConfigs.java#L42).
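
To make the sizing concrete, here is a minimal sketch of a cores-based default with an optional JVM-wide override. `WorkerPoolSize` and `resolve` are hypothetical names used only for illustration, and the property name `iceberg.worker.num-threads` and the `max(2, cores)` default are my recollection of `SystemConfigs`, so please verify them against the linked file for your Iceberg version.

```java
// Sketch only: mirrors how a JVM-wide default pool size could be resolved.
// WorkerPoolSize and resolve() are hypothetical names, not Iceberg APIs.
final class WorkerPoolSize {
  private WorkerPoolSize() {}

  /** Returns the parsed override if one is given, otherwise max(2, available cores). */
  static int resolve(String override) {
    if (override != null && !override.trim().isEmpty()) {
      return Integer.parseUnsignedInt(override.trim());
    }
    return Math.max(2, Runtime.getRuntime().availableProcessors());
  }

  public static void main(String[] args) {
    // Assumed property name; check SystemConfigs in your Iceberg version.
    int size = resolve(System.getProperty("iceberg.worker.num-threads"));
    System.out.println("worker pool size = " + size);
  }
}
```

Even if such an override exists, it would apply to the whole JVM, so it would not give an engine per-catalog or per-query control.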

Engines such as Trino and Spark have no way to control this.
Trino users often access multiple catalogs, so this support matters even more for Trino.

The other problem with this approach is that engines using Iceberg have no control over, or visibility into, the memory used by these threads, which can be directly proportional to the number of threads in use.

When Trino is running many concurrent queries, this can lead to an unstable coordinator.

private static final ExecutorService WORKER_POOL = MoreExecutors.getExitingExecutorService(

related issues:
trinodb/trino#11920
trinodb/trino#11708

Query engine

None

@osscm osscm added the improvement PR that improves existing functionality label May 14, 2024
@amogh-jahagirdar
Contributor

Checking. I thought we already had an API that allows users to pass in a custom thread pool during planning? If not, I think it makes sense to add one.

@amogh-jahagirdar
Contributor

Yeah, we already have a planWith API on scans, and Trino already uses it when generating splits here: https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitManager.java#L91.

@osscm Do you mind elaborating on what kind of configuration you had in mind? It seems to me that with a custom executor (whose size is controlled by the engine), a user could control the things you mentioned, such as memory consumption. Or, if you see cases where we should be using that thread pool but aren't, that's a separate problem.
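
For illustration, a minimal sketch of what an engine-owned, bounded planning pool could look like. `PlanningPools` and `newBoundedPool` are hypothetical names, and the pool size and thread-name prefix are arbitrary; the only Iceberg call referenced is `planWith`, shown in a comment so the sketch compiles without Iceberg on the classpath.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: PlanningPools is a hypothetical helper, not part of Iceberg or Trino.
final class PlanningPools {
  private PlanningPools() {}

  /** Creates a fixed-size pool of named daemon threads that the engine owns and sizes. */
  static ExecutorService newBoundedPool(int threads, String namePrefix) {
    if (threads < 1) {
      throw new IllegalArgumentException("threads must be >= 1");
    }
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = runnable -> {
      Thread thread = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true); // do not keep the JVM alive because of planning threads
      return thread;
    };
    return Executors.newFixedThreadPool(threads, factory);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = newBoundedPool(4, "iceberg-planning");
    try {
      // With Iceberg on the classpath, the pool would be handed to the scan, e.g.:
      //   table.newScan().planWith(pool).planFiles()
      Callable<String> task = () -> Thread.currentThread().getName();
      System.out.println("ran on " + pool.submit(task).get());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the engine creates the pool, it can choose the size per catalog or per workload and name the threads so they are easy to trace.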

@osscm
Author

osscm commented May 15, 2024 via email
