Snapshot-Level Metrics and Statistics #102

Open
omervk opened this issue Nov 13, 2018 · 0 comments


omervk commented Nov 13, 2018

Assume a table with the following field:

id int

There are n data files, each of which records the statistics min(id) and max(id). All ids are positive integers.

Querying for id < 0 requires an O(n) pass over every data file in the manifest, checking whether each file's min(id) < 0. Because all ids are positive, every file is visited only to be pruned.
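A minimal sketch of that per-file pruning check, assuming each manifest entry exposes its id bounds. `DataFileStats` and `files_matching_lt` are hypothetical names for illustration, not Iceberg's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileStats:
    # Hypothetical stand-in for a manifest entry's min(id)/max(id) statistics.
    path: str
    min_id: int
    max_id: int


def files_matching_lt(files: list[DataFileStats], bound: int) -> list[DataFileStats]:
    """Return the files that may contain rows with id < bound.

    A file can be skipped when its min_id >= bound. Every entry's bounds
    must be inspected, so this is O(n) in the number of data files.
    """
    return [f for f in files if f.min_id < bound]


files = [
    DataFileStats("a.parquet", 1, 10),
    DataFileStats("b.parquet", 5, 20),
    DataFileStats("c.parquet", 3, 7),
]
print(files_matching_lt(files, 0))  # [] : all three files were checked and pruned
```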

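Aggregating metrics and/or statistics at the Snapshot level (e.g. a snapshot-wide min(id) and max(id)) would reduce such checks from O(n), n being the number of data files, to O(1): if the snapshot's min(id) is not below 0, every data file can be skipped without reading its individual statistics.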
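A sketch of how snapshot-level bounds could work, assuming they are folded once from the per-file bounds when a snapshot is written. `IdBounds`, `aggregate_bounds`, and `snapshot_may_match_lt` are hypothetical names, not part of Iceberg:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class IdBounds:
    # Hypothetical min/max bounds for the id column.
    min_id: int
    max_id: int


def aggregate_bounds(file_bounds: Iterable[IdBounds]) -> Optional[IdBounds]:
    """Fold per-file bounds into one snapshot-level summary.

    This is O(n), but it is paid once at commit time rather than per query.
    Returns None for a snapshot with no data files.
    """
    result: Optional[IdBounds] = None
    for b in file_bounds:
        if result is None:
            result = b
        else:
            result = IdBounds(min(result.min_id, b.min_id), max(result.max_id, b.max_id))
    return result


def snapshot_may_match_lt(snapshot: Optional[IdBounds], bound: int) -> bool:
    """O(1) check: if the snapshot's smallest id is >= bound, no file can match."""
    return snapshot is not None and snapshot.min_id < bound


snap = aggregate_bounds([IdBounds(1, 10), IdBounds(5, 20), IdBounds(3, 7)])
print(snap)                            # IdBounds(min_id=1, max_id=20)
print(snapshot_may_match_lt(snap, 0))  # False: skip the whole snapshot
```

One design consequence: appending files only requires merging the new files' bounds into the existing summary. After files are removed, the old summary stays a conservative (possibly wider) range, so it is still safe for pruning, just less selective until recomputed.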

Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/iceberg that referenced this issue Apr 12, 2019