-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get rid of while-loop version of parse_hms #436
Conversation
Also a couple flake8 fixups
dateutil/parser.py
Outdated
@@ -507,8 +507,9 @@ def resolve_ymd(self, yearfirst, dayfirst): | |||
year, day, month = self | |||
|
|||
else: | |||
split_tzstr = _timelex.split(self.tzstr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just moved out here to comply with PEP8?
If so, I think it's better to leave it as is, because this has a presumably minor but real performance impact, since in the current version, the _timelex.split(self.tzstr)
call happens in a short-circuit branch and is not eagerly calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was just a PEP8 thing. Happy to revert it.
dateutil/parser.py
Outdated
res.hour = int(value) | ||
if value % 1: | ||
res.minute = int(60 * (value % 1)) | ||
elif (1 < idx == len(tokens)-1 and tokens[idx-1] == ' ' and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want this to be len_l-1
instead of len(tokens)-1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch.
dateutil/parser.py
Outdated
(i + 1 >= len_l or | ||
(l[i + 1] != ':' and | ||
info.hms(l[i + 1]) is None))): | ||
hms_idx = _find_hms_idx(i, l, info, allow_jump=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the allow_jump=True
when it's not used elsewhere without allow_jump
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was an option used in an earlier implementation and can now be removed.
In the loop-based implementation being replaced, only the first step of this check is allowed to look past a space for a h/m/s label. This implementation should make that irrelevant.
dateutil/parser.py
Outdated
(l[i + 1] != ':' and | ||
info.hms(l[i + 1]) is None))): | ||
hms_idx = _find_hms_idx(i, l, info, allow_jump=True) | ||
if hms_idx is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to understand the reasoning - why was the _find_hms_idx
call moved to the beginning here?
Is it because you need to get and store the results of _find_hms_idx
within the if
call, so you've moved it to the front?
I'm not entirely sure how I feel about this. It may not be hurting anything, but I don't understand the reasoning of the original loop well enough to tell why they are ordered the way they are. At the very least, it seems possible that this will induce some performance hit in some important formats like ISO-8601.
This feels horribly un-pythonic to me, but one possible work-around for "need to check and store the result" would be some sort of mutable object modified in the function, e.g.:
class MutableReturn(object):
def __init__(self, val):
self.return_value = val
def f(x, mr):
rv = x - 7
if rv > 2:
rv = None
mr.return_value = rv
return rv
if __name__ == "__main__":
mr = MutableReturn(None)
for v in [8, 10]:
if f(v, mr) is not None:
print('Value was {}'.format(mr.return_value))
Another option that is effectively the same thing but may be saner in this case is to have _find_hms_idx
cache its last return value as a mutable property, e.g.:
def f(x):
rv = x - 7
if rv > 2:
rv = None
f.last_return = rv
return rv
if __name__ == "__main__":
mr = MutableReturn(None)
for v in [8, 10]:
if f(v) is not None:
print('Value was {}'.format(f.last_return))
I think the second one's a bit more elegant, but the first one is thread safe since it doesn't manipulate any globals.
Of course, we'll make _find_hms_idx
a member function of parser
once this PR is wrapped up (or as part of it if you are making changes anyway), so we could always use the argument that you can get thread safety by spawning a new parser
object for each thread - though that's kinda thin to avoid creating a mutable return object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because you need to get and store the results of _find_hms_idx within the if call, so you've moved it to the front?
Yes. I don't especially care about the ordering, but I do really like having all of these cases be elif
s. This is easy to move back to its original location. I think the much simpler solution is to be willing to call _find_hms_idx
twice. It can be moved back down to after the elif len_li in (8, 12, 14):
block and changed to:
if _find_hms_idx(i, l, info, allow_jump=True) is not None:
hms_idx = _find_hms_idx(i, l, info, allow_jump=True)
[...]
This is a pretty cheap function call. Let's at least do some profiling before worrying about caching (much less mutability and thread-safety).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed two commits. One of them adds a lightweight testenv for parser (with profiling). The other addresses the comments/suggestions above. |
I'm not overly familiar with I could see it being very useful to have module-specific testenvs set up for rapid development, but I'd think half the value of having lighter testenvs would be so you can quickly run them on all the supported environments, but this just uses Is there anything specific about Either way, maybe we can do that as a separate PR. |
That's all this is, yes. Removing "parser" from the "envlist" will leave it available under "tox -e parser" without having it run with just "tox".
Sure. |
Most important is moving the find_hms_idx block back to its previous position
303b303
to
1656f90
Compare
This is a rebased version of (most of) #425.
Also a couple flake8 fixups.
Some pieces need docstrings.