-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-8492: Read Parquet Microsecond Columns as Bigint #2907
base: master
Are you sure you want to change the base?
DRILL-8492: Read Parquet Microsecond Columns as Bigint #2907
Conversation
…ParquetSimpleTestFileGenerator
…osecond columns are read as 64-bit values or timestamp values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1 Thanks for the contribution!
Can you please update the documentation for the Parquet reader to include this? Otherwise looks good!
Happy to contribute! Do you mean the documentation in the |
That's the one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work. I can backport this too because you've left default behaviour unchanged (and it's self contained). My only question is about ParquetReaderConfig -- you've added switches to the global config system so I don't think we also need them to appear in the storage plugin config?
Thanks. The reason I added the configuration properties to However, I could very well have overlooked some way to access the global configuration, and I'd be grateful for any pointers. |
We should existing find examples in the Parquet format plugin. E.g. the "old" reader is affected by the option store.parquet.reader.pagereader.async. Follow up: I see use of the FragmentContext class for accessing config options in the old Parquet reader, perhaps it's a good a vehicle... |
|
It's always bugged me that we don't have a global way of accessing at least one of DrillbitContext, QueryContext, FragmentContext or just OptionManager. We hardly want to have to spray these things through APIs everywhere in Drill. I'll take a look at whether something can be done... |
DRILL-8492: Read parquet microsecond columns as bigint
Description
Two new configuration options,
store.parquet.reader.time_micros_as_int64
andstore.parquet.reader.timestamp_micros_as_int64,
have been added.When reading Parquet columns of type
time_micros
andtimestamp_micros
, the returned value will be the original 64-bit integer value instead of a timestamp value truncated to milliseconds if the configuration option for the column type istrue
.The implementation closely follows how the existing option
store.parquet.reader.int96_as_timestamp
is handled.Both new options have the default value
false
to preserve existing behaviour as default.Documentation
The new options can be set in "drill-module.conf", and also be set through the Web UI's Options page.
Testing
Unit tests have been added to
org.apache.drill.exec.store.parquet.TestMicrosecondColumns
.It could be worth noting that microsecond columns can be compared to time or timestamp literals even if the option for reading the column's value as a 64-bit value is true:
SELECT * FROM file.parquet WHERE TO_TIME(time_micros_column/1000) > '09:32:58.174';
and
SELECT * FROM file.parquet WHERE TO_TIMESTAMP(timestamp_micros_column/1000000) > '2024-04-26 13:17:41.421';