Better projection support for Parquet SMB reads #5303

Open
clairemcginty opened this issue Mar 14, 2024 · 0 comments
Labels: enhancement, parquet

parquet-avro supports schema projections that exclude required fields. However, if a required field is excluded, the resulting Avro record fails the Coder round trip in the next PTransform.

As a workaround, scio-parquet provides a custom map API that is applied to each Parquet record immediately after it is read, before the record undergoes Coder serialization, so you can map it to a serializable type. In SortMergeTransform, you can do the same with a custom via() function.

However, regular Parquet SMB CoGroups/GroupByKeys have no support for such a projection function. Adding it would require supporting a SerializableFunction inside MultiSourceKeyGroupReader.
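
To make the proposal concrete, here is a rough, dependency-free sketch of the idea: a reader that applies a user-supplied serializable function to each record as it is read, so only the mapped type reaches serialization. Everything in it is hypothetical and for illustration only: `ProjectionSketch`, `SerializableFn` (a stand-in for Beam's `SerializableFunction`), `readKeyGroup`, the `Map`-based records, and the simulated `encode` check are assumptions, not the actual MultiSourceKeyGroupReader or Coder code.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ProjectionSketch {
    // Stand-in for Beam's SerializableFunction, declared here so the sketch compiles alone.
    interface SerializableFn<A, B> extends Function<A, B>, Serializable {}

    // Hypothetical schema in which both fields are required.
    static final List<String> REQUIRED_FIELDS = Arrays.asList("id", "name");

    // Simulates a coder that rejects records missing a required field.
    static String encode(Map<String, Object> record) {
        for (String field : REQUIRED_FIELDS) {
            if (record.get(field) == null) {
                throw new IllegalStateException("missing required field: " + field);
            }
        }
        return record.toString();
    }

    // Hypothetical reader for one key group: applies projectionFn to every record
    // as it is read, so downstream code only ever sees the mapped type.
    static <T> List<T> readKeyGroup(
            List<Map<String, Object>> projectedRecords,
            SerializableFn<Map<String, Object>, T> projectionFn) {
        List<T> out = new ArrayList<>();
        for (Map<String, Object> record : projectedRecords) {
            out.add(projectionFn.apply(record));
        }
        return out;
    }

    // Builds a record as a projection that excludes the required "name" field would.
    static Map<String, Object> projected(long id) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", id);
        return record;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> group = Arrays.asList(projected(1L), projected(2L));

        // Encoding the raw projected record fails because "name" is absent.
        try {
            encode(group.get(0));
        } catch (IllegalStateException e) {
            System.out.println("raw record rejected: " + e.getMessage());
        }

        // Mapping at read time yields plain Longs, which need no Avro schema.
        List<Long> ids = readKeyGroup(group, r -> (Long) r.get("id"));
        System.out.println("mapped ids: " + ids);
    }
}
```

The design choice mirrored here is that the function runs inside the reader, before any record is handed to the next step, which is the same point at which the existing map API and via() workarounds intervene.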
