Allow Specifying Partitioning Function for External Mappings #100

Open
omervk opened this issue Nov 13, 2018 · 1 comment

omervk commented Nov 13, 2018

(this is dependent upon the completion of #71 and #72)

The partition function for external mappings is derived by parsing the paths of the data files, à la Hive's partition layout.

For instance, the structure:

/date=2018-11-12/file.avsc
/date=2018-11-13/file.avsc

would create a new column date with string values 2018-11-12 and 2018-11-13, and would assume the partitioning function is identity(date). There is currently no way to derive the partition from another field instead (e.g. a function of the date part of a timestamp column).
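To make the path-derived behavior concrete, here is a minimal Python sketch of how key=value directory segments could be pulled out of a Hive-style path. The function name `hive_partition_values` is hypothetical and is not part of Iceberg; the sketch only illustrates why the resulting values are plain strings.

```python
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict:
    """Collect key=value directory segments from a Hive-style path.

    Hypothetical helper for illustration only; not an Iceberg API.
    """
    values = {}
    # The last path component is the file name, so only directories are scanned.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            # The value is kept as the raw string found in the path.
            values[key] = value
    return values


print(hive_partition_values("/date=2018-11-12/file.avsc"))
```

Nothing in the path says whether `2018-11-12` is a date, a string, or the result of some transform on another column, which is why identity on a string column is the only thing that can be inferred.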

Iceberg should let users specify their own partitioning function, based on existing columns.

rdblue commented Nov 16, 2018

I think what you're trying to accomplish would be done a little differently. I understand the term "partitioning function" to mean the partition transformations that are part of a partition spec.

That's not the right place to do this because we don't need to add extra representations of a date to the manifest files. Instead, a process importing files from an external source should parse the strings and produce the right data value for the date (the day ordinal, counting from 1970-01-01 as day 0). Then Iceberg would use the same partition code for these files.
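As a rough illustration of the conversion described above, an import process might turn the path string into a day ordinal along these lines. This is a sketch in Python, assuming the path values are ISO-8601 dates; `date_string_to_day_ordinal` is a hypothetical name, not an Iceberg function.

```python
from datetime import date

# Day 0 of the ordinal described in the comment above.
EPOCH = date(1970, 1, 1)


def date_string_to_day_ordinal(value: str) -> int:
    """Convert an ISO date string such as '2018-11-12' to days since 1970-01-01.

    Hypothetical helper for illustration only; not an Iceberg API.
    """
    return (date.fromisoformat(value) - EPOCH).days


print(date_string_to_day_ordinal("2018-11-12"))  # 17847
print(date_string_to_day_ordinal("2018-11-13"))  # 17848
```

With the value stored as an integer day ordinal, files imported from the external layout carry the same kind of partition value as files written through the normal path.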

Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/iceberg that referenced this issue Apr 12, 2019
* Fix StructLikeWrapper equals and hashCode null handling.
* Spark: Fix reading null partition values.
* Add test for null partition values.