Get rid of while-loop version of parse_hms #436

jbrockmendel · 2017-08-05T04:33:03Z

This is a rebased version of (most of) #425.

Also a couple flake8 fixups.

Some pieces need docstrings.

pganssle · 2017-08-06T16:49:54Z

dateutil/parser.py

@@ -507,8 +507,9 @@ def resolve_ymd(self, yearfirst, dayfirst):
                    year, day, month = self

            else:
+                split_tzstr = _timelex.split(self.tzstr)


Is this just moved out here to comply with PEP8?

If so, I think it's better to leave it as is. Moving it has a presumably minor but real performance cost: in the current version, the _timelex.split(self.tzstr) call sits in a short-circuit branch and is not evaluated eagerly.

Yes, this was just a PEP8 thing. Happy to revert it.
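To make the short-circuit point concrete, here is a minimal sketch. The names expensive_split, check_eager and check_lazy are hypothetical stand-ins, not dateutil code. It counts how often the costly call runs when it is hoisted above the condition versus left inside an `and` chain.

```python
calls = {"n": 0}


def expensive_split(s):
    # Hypothetical stand-in for a costly call such as _timelex.split(...)
    calls["n"] += 1
    return s.split()


def check_eager(flag, s):
    # Hoisted: the split runs on every call, even when flag is False.
    parts = expensive_split(s)
    return flag and len(parts) > 1


def check_lazy(flag, s):
    # Inline: `and` short-circuits, so the split only runs when flag is True.
    return flag and len(expensive_split(s)) > 1


calls["n"] = 0
eager_results = [check_eager(f, "UTC +3") for f in (False, False, True)]
eager_calls = calls["n"]

calls["n"] = 0
lazy_results = [check_lazy(f, "UTC +3") for f in (False, False, True)]
lazy_calls = calls["n"]

print(eager_calls, lazy_calls)
```

Both variants return the same answers; only the number of split calls differs, which is the cost being discussed here.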

pganssle · 2017-08-06T16:53:02Z

dateutil/parser.py

-                res.hour = int(value)
-                if value % 1:
-                    res.minute = int(60 * (value % 1))
+    elif (1 < idx == len(tokens)-1 and tokens[idx-1] == ' ' and


Did you want this to be len_l-1 instead of len(tokens)-1?

Yes, good catch.

pganssle · 2017-08-06T16:54:29Z

dateutil/parser.py

-                        (i + 1 >= len_l or
-                         (l[i + 1] != ':' and
-                          info.hms(l[i + 1]) is None))):
+                    hms_idx = _find_hms_idx(i, l, info, allow_jump=True)


Why pass allow_jump=True when _find_hms_idx is never called without it?

I think that was an option used in an earlier implementation and can now be removed.

In the loop-based implementation being replaced, only the first step of this check is allowed to look past a space for an h/m/s label. This implementation should make that irrelevant.

pganssle · 2017-08-06T17:24:13Z

dateutil/parser.py

-                         (l[i + 1] != ':' and
-                          info.hms(l[i + 1]) is None))):
+                    hms_idx = _find_hms_idx(i, l, info, allow_jump=True)
+                    if hms_idx is not None:


Just trying to understand the reasoning - why was the _find_hms_idx call moved to the beginning here?

Is it because you need to get and store the results of _find_hms_idx within the if call, so you've moved it to the front?

I'm not entirely sure how I feel about this. It may not be hurting anything, but I don't understand the reasoning of the original loop well enough to tell why they are ordered the way they are. At the very least, it seems possible that this will induce some performance hit in some important formats like ISO-8601.

This feels horribly un-pythonic to me, but one possible work-around for "need to check and store the result" would be some sort of mutable object modified in the function, e.g.:

    class MutableReturn(object):
        def __init__(self, val):
            self.return_value = val

    def f(x, mr):
        rv = x - 7
        if rv > 2:
            rv = None
        mr.return_value = rv
        return rv

    if __name__ == "__main__":
        mr = MutableReturn(None)
        for v in [8, 10]:
            if f(v, mr) is not None:
                print('Value was {}'.format(mr.return_value))

Another option that is effectively the same thing but may be saner in this case is to have _find_hms_idx cache its last return value as a mutable property, e.g.:

    def f(x):
        rv = x - 7
        if rv > 2:
            rv = None
        # Cache the last result on the function object itself
        f.last_return = rv
        return rv

    if __name__ == "__main__":
        for v in [8, 10]:
            if f(v) is not None:
                print('Value was {}'.format(f.last_return))

I think the second one's a bit more elegant, but the first one is thread safe since it doesn't manipulate any globals.

Of course, we'll make _find_hms_idx a member function of parser once this PR is wrapped up (or as part of it if you are making changes anyway), so we could always use the argument that you can get thread safety by spawning a new parser object for each thread - though that's kinda thin to avoid creating a mutable return object.
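A rough sketch of that last idea, assuming a hypothetical ToyParser class (not dateutil's parser) and reusing the toy logic of f() from the examples above. Storing the last result on the instance means each parser object carries its own state, so one parser per thread shares nothing.

```python
import threading


class ToyParser(object):
    """Hypothetical stand-in for a parser class; not dateutil's API."""

    def __init__(self):
        self.last_return = None

    def find(self, x):
        # Same toy logic as f() above: a result above 2 counts as "not found".
        rv = x - 7
        if rv > 2:
            rv = None
        self.last_return = rv  # cached per instance, not on a shared function
        return rv


results = {}


def worker(name, value):
    p = ToyParser()  # one parser object per thread
    if p.find(value) is not None:
        results[name] = p.last_return


threads = [threading.Thread(target=worker, args=(n, v))
           for n, v in [("a", 8), ("b", 9), ("c", 10)]]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```

The cached value lives on each ToyParser instance, so no thread can overwrite another thread's last result the way a shared function attribute could.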

> Is it because you need to get and store the results of _find_hms_idx within the if call, so you've moved it to the front?

Yes. I don't especially care about the ordering, but I do really like having all of these cases be elifs. This is easy to move back to its original location. I think the much simpler solution is to be willing to call _find_hms_idx twice. It can be moved back down to after the elif len_li in (8, 12, 14): block and changed to:

    if _find_hms_idx(i, l, info, allow_jump=True) is not None:
        hms_idx = _find_hms_idx(i, l, info, allow_jump=True)
        [...]

This is a pretty cheap function call. Let's at least do some profiling before worrying about caching (much less mutability and thread-safety).

Yeah, I'm fine with calling it twice.

Honestly, I think if we implement #125 and add format caching, we'll see some dramatic speedup in nearly all real-world use-cases, so little stuff won't really "add up" in any real sense.

I think we should definitely implement #125 before the 2.7.0 release.
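One way to check the "cheap call" claim before reaching for caching is a quick timeit comparison. The sketch below uses a hypothetical find_idx stand-in rather than the real _find_hms_idx, so the numbers say nothing about dateutil itself; it only shows the shape of the measurement.

```python
import timeit


def find_idx(i, tokens):
    # Hypothetical stand-in: index of an h/m/s label directly after position i.
    if i + 1 < len(tokens) and tokens[i + 1] in ("h", "m", "s"):
        return i + 1
    return None


def call_once(i, tokens):
    # Compute once, store, then test the stored value.
    idx = find_idx(i, tokens)
    if idx is not None:
        return idx
    return None


def call_twice(i, tokens):
    # The "just call it twice" variant discussed above.
    if find_idx(i, tokens) is not None:
        return find_idx(i, tokens)
    return None


tokens = ["12", "h", "00", "m", "04", "s"]
t_once = timeit.timeit(lambda: call_once(0, tokens), number=100000)
t_twice = timeit.timeit(lambda: call_twice(0, tokens), number=100000)
print("once:  {:.4f}s".format(t_once))
print("twice: {:.4f}s".format(t_twice))
```

Timings vary by machine and interpreter, so this is only useful as a relative comparison on the formats that matter.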

jbrockmendel · 2017-08-06T19:14:22Z

Just pushed two commits. One of them adds a lightweight testenv for parser (with profiling). The other addresses the comments/suggestions above.

pganssle · 2017-08-06T20:11:09Z

I'm not overly familiar with tox - it seems like the parser testenv runs automatically when you run tox, but the only difference is that it runs a subset of the tests, with coverage and profiling. Is there a way to make it so that it's separate from (or maybe even on top of) the standard tox run?

I could see it being very useful to have module-specific testenvs set up for rapid development, but I'd think half the value of lighter testenvs is being able to run them quickly on all the supported environments, whereas this one just uses python.

Is there anything specific about parser that justifies having a separate lighter test environment just for that? If not, it might be worth making a separate, generic helper script that you can run like tox subset parser to get one for parser, tox subset relativedelta for relativedelta dev, etc.

Either way, maybe we can do that as a separate PR.

jbrockmendel · 2017-08-06T20:26:00Z

> I could see it being very useful to have module-specific testenvs set up for rapid development

That's all this is, yes.

Removing "parser" from the "envlist" will leave it available under "tox -e parser" without having it run with just "tox".

> Either way, maybe we can do that as a separate PR.

Sure.

Get rid of while-loop version of parse_hms

0686d8c

Also a couple flake8 fixups

pganssle mentioned this pull request Aug 5, 2017

Separate out "while True" loop for e.g. 12 h00m04s #425

Closed

pganssle requested changes Aug 6, 2017

pganssle added the style label Aug 6, 2017

pganssle added this to the 2.7.0 milestone Aug 6, 2017

Fixups suggested by reviewer

1656f90

Most important is moving the find_hms_idx block back to its previous position

jbrockmendel force-pushed the parse_hms2 branch from 303b303 to 1656f90 Compare August 6, 2017 20:31

Merge branch 'master' into parse_hms2

18ddad2

pganssle merged commit 874434d into dateutil:master Aug 6, 2017

pganssle mentioned this pull request Mar 11, 2018

Release 2.7.0 #627

Closed
