Retrieve array from RecordBatch for a leaf column #5699

Open
viirya opened this issue Apr 29, 2024 · 1 comment
Labels
enhancement: Any new improvement worthy of an entry in the changelog

Comments

@viirya
Member

viirya commented Apr 29, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While working on filter pushdown for iceberg-rs (apache/iceberg-rust#295), I am going to use APIs such as ArrowPredicateFn and RowFilter.

When constructing an ArrowPredicateFn for an Iceberg predicate, we provide a filtering function that takes a RecordBatch built from the given projection.

The RecordBatch contains the columns specified in the projection, and we need to access the correct column in the batch to evaluate the predicate.

For a top-level column, this is straightforward. For a nested column, however, there seems to be no way to access the particular array from the RecordBatch.

We only have the projection (i.e., the ProjectionMask), which contains the indices of the leaf columns in the batch.

For example, suppose the schema has top-level columns [a, b, c], where b is a struct column with fields [aa, bb, cc]. Given a predicate like cc > 1, we know that the leaf index of the nested column cc is 3 (counting leaves depth-first: a = 0, aa = 1, bb = 2, cc = 3, c = 4).

Is there an API we can use to access the array of cc in the RecordBatch?
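To make the leaf-index arithmetic above concrete, here is a minimal sketch of how a leaf index could be turned into a path of child positions by walking the schema depth-first. It deliberately uses a toy `Field` enum rather than arrow-rs types; `Field`, `leaf_path`, `dotted_name`, and `example_schema` are hypothetical names made up for this illustration and are not part of any existing API.

```rust
/// Toy stand-in for a schema field. This is NOT an arrow-rs type;
/// it only models "leaf column" versus "struct with children".
enum Field {
    Leaf(&'static str),
    Struct(&'static str, Vec<Field>),
}

/// Walk the fields depth-first and return the child positions leading
/// to the `target`-th leaf, or `None` if there are fewer leaves.
fn leaf_path(fields: &[Field], target: usize) -> Option<Vec<usize>> {
    fn walk(
        fields: &[Field],
        target: usize,
        next: &mut usize,      // index of the next leaf to be visited
        path: &mut Vec<usize>, // child positions from the root so far
    ) -> bool {
        for (i, field) in fields.iter().enumerate() {
            path.push(i);
            match field {
                Field::Leaf(_) => {
                    if *next == target {
                        return true;
                    }
                    *next += 1;
                }
                Field::Struct(_, children) => {
                    if walk(children, target, next, path) {
                        return true;
                    }
                }
            }
            path.pop();
        }
        false
    }

    let mut next = 0;
    let mut path = Vec::new();
    if walk(fields, target, &mut next, &mut path) {
        Some(path)
    } else {
        None
    }
}

/// Render a path returned by `leaf_path` as a dotted name, e.g. "b.cc".
fn dotted_name(fields: &[Field], path: &[usize]) -> String {
    let mut parts = Vec::new();
    let mut current = fields;
    for &i in path {
        match &current[i] {
            Field::Leaf(name) => parts.push(*name),
            Field::Struct(name, children) => {
                parts.push(*name);
                current = children.as_slice();
            }
        }
    }
    parts.join(".")
}

/// The schema from the example: [a, b { aa, bb, cc }, c].
fn example_schema() -> Vec<Field> {
    vec![
        Field::Leaf("a"),
        Field::Struct(
            "b",
            vec![Field::Leaf("aa"), Field::Leaf("bb"), Field::Leaf("cc")],
        ),
        Field::Leaf("c"),
    ]
}
```

For the example schema, `leaf_path(&example_schema(), 3)` yields the path `[1, 2]`, i.e. the third child of the second top-level column, which `dotted_name` renders as `b.cc`. With a real RecordBatch, such a path would presumably be followed by taking the top-level column and then descending through each struct array's children. The positions would also have to be computed against the projected schema rather than the full one, since the batch only contains the projected columns.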

Describe the solution you'd like

Describe alternatives you've considered

Additional context

viirya added the enhancement label Apr 29, 2024
@tustvold
Contributor

I don't believe we currently have a mechanism for nested projection of RecordBatch, but I think this would be generally useful and a worthwhile addition.
