Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): Validate time_zone in pl.Datetime #16165

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

luke396
Copy link
Contributor

@luke396 luke396 commented May 11, 2024

closes #16038

Before

pl.Datetime(time_zone="invalid_time_zone")
# Datetime(time_unit='us', time_zone='invalid_time_zone')

pl.Datetime(time_zone=object())
# Datetime(time_unit='us', time_zone=<object object at 0x7f245331d0d0>)

pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime(time_zone='cabbage'))
# ComputeError: unable to parse time zone: 'cabbage'. Please check the Time Zone Database for a list of available time zones

After

pl.Datetime(time_zone="invalid_time_zone")
# ValueError: invalid time zone: 'invalid_time_zone', to see valid strings run `import zoneinfo; zoneinfo.available_timezones()`

pl.Datetime(time_zone=object())
# ValueError: invalid time zone: <object object at 0x7f830861fb30>, to see valid strings run `import zoneinfo; zoneinfo.available_timezones()`

pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime(time_zone='tst'))
# ValueError: invalid time zone: 'tst', to see valid strings run `import zoneinfo; zoneinfo.available_timezones()`

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels May 11, 2024
Copy link

codecov bot commented May 11, 2024

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 80.98%. Comparing base (0b66308) to head (18b43b4).

Files Patch % Lines
py-polars/polars/datatypes/classes.py 84.61% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16165      +/-   ##
==========================================
- Coverage   80.99%   80.98%   -0.01%     
==========================================
  Files        1387     1387              
  Lines      178832   178842      +10     
  Branches     2877     2880       +3     
==========================================
+ Hits       144839   144841       +2     
- Misses      33500    33506       +6     
- Partials      493      495       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good! I left some comments. Would be nice if @MarcoGorelli could also take a look at this.

Comment on lines 498 to 500
else:
msg = "install polars[timezone] to handle datetimes with time zone information"
raise ImportError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should skip validation in this case, rather than raise an error. I believe working with time zones only requires the zoneinfo package in some specific cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the advice, I have removed the error.

from polars.dependencies import _ZONEINFO_AVAILABLE, zoneinfo

if _ZONEINFO_AVAILABLE:
if time_zone in zoneinfo.available_timezones():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this match with the list of timezones we use on the Rust side? What are the differences if there are any?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that concerns me a little, I'm pretty sure I encountered some differences

Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having worked on this

To be honest I'm a bit hesitant about it - it's already impossible (counterexamples welcome, I might be missing something) to create a column with an invalid time zone

Not sure there's an advantage to adding extra code to make the validation happen earlier, especially if it doesn't quite match how the validation is done at the Rust level

@luke396
Copy link
Contributor Author

luke396 commented May 14, 2024

Not sure there's an advantage to adding extra code to make the validation happen earlier, especially if it doesn't quite match how the validation is done at the Rust level

Actually, when I was writing the code, I was concerned about whether it was okay to add extra validation in DataType and what regressions it might produce.

I'm not sure if you noticed this issue comment. What are your thoughts on it?

If we decide to leave the check for later, should we add some notes in the user guide or the method's documentation to clarify that this is what we intend?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pl.Datetime time_zone parameter has no type or value check
3 participants