Predicate-size predicate partitioning #64

Open
EnricoMi opened this issue Nov 16, 2020 · 0 comments
Labels
enhancement New feature or request


The Zero service provides predicate size statistics. A predicate partitioner could bin predicates by size, aiming for partitions of roughly equal size. This would be useful for partitioning long-tail schemata, where we want neither a fixed number of predicates per partition nor every predicate in its own singleton partition. The target partition size could be the size of the largest predicate or a multiple of it.

The resulting partitioning can be combined with a uid partitioning to increase parallelism if needed.

A binning algorithm could sort predicates by size in descending order and put the next predicate into the partition with the smallest amount of remaining space that can still accommodate it. If there is no such partition, it creates a new one. Hence the maximum partition size must be greater than or equal to the size of the largest predicate. With a long-tail schema, smaller predicates will fill up the remaining space. There may be a small number of partitions with many tail predicates, though.
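
The binning described above is essentially a best-fit decreasing heuristic. As a rough sketch only (not connector code): the function name `partition_by_size`, its `sizes` and `capacity` parameters, and the predicate names and sizes in the usage example are all hypothetical, and the real statistics would come from the Zero service.

```python
from typing import Dict, List, Optional


def partition_by_size(sizes: Dict[str, int],
                      capacity: Optional[int] = None) -> List[List[str]]:
    """Bin predicates by size (best-fit decreasing).

    `sizes` maps a predicate name to its size; `capacity` is the maximum
    partition size and defaults to the size of the largest predicate.
    Both names are assumptions made for this sketch.
    """
    if not sizes:
        return []
    largest = max(sizes.values())
    if capacity is None:
        capacity = largest
    if capacity < largest:
        raise ValueError("capacity must be at least the size of the largest predicate")

    partitions: List[List[str]] = []  # predicate names per partition
    remaining: List[int] = []         # free space left per partition

    # Largest predicates first; ties broken by name to keep the result stable.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # Find the partition with the least free space that still fits the predicate.
        best = None
        for i, free in enumerate(remaining):
            if free >= size and (best is None or free < remaining[best]):
                best = i
        if best is None:
            # No existing partition can hold it: open a new one.
            partitions.append([name])
            remaining.append(capacity - size)
        else:
            partitions[best].append(name)
            remaining[best] -= size
    return partitions


# Made-up sizes for a small long-tail schema.
example = {"name": 100, "friend": 60, "age": 40, "email": 30, "phone": 20, "city": 10}
print(partition_by_size(example))
# → [['name'], ['friend', 'age'], ['email', 'phone', 'city']]
```

Passing a multiple of the largest predicate size as `capacity` gives fewer, larger partitions. The last partition in the example shows the effect noted above: the tail predicates end up together in one partition.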

EnricoMi added the enhancement label Nov 16, 2020