Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of while-loop version of parse_hms #436

Merged
merged 3 commits into from
Aug 6, 2017

Conversation

jbrockmendel
Copy link
Contributor

This is a rebased version of (most of) #425.

Also a couple flake8 fixups.

Some pieces need docstrings.

Also a couple flake8 fixups
@@ -507,8 +507,9 @@ def resolve_ymd(self, yearfirst, dayfirst):
year, day, month = self

else:
split_tzstr = _timelex.split(self.tzstr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just moved out here to comply with PEP8?

If so, I think it's better to leave it as is, because this has a presumably minor but real performance impact, since in the current version, the _timelex.split(self.tzstr) call happens in a short-circuit branch and is not eagerly calculated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was just a PEP8 thing. Happy to revert it.

res.hour = int(value)
if value % 1:
res.minute = int(60 * (value % 1))
elif (1 < idx == len(tokens)-1 and tokens[idx-1] == ' ' and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want this to be len_l-1 instead of len(tokens)-1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

(i + 1 >= len_l or
(l[i + 1] != ':' and
info.hms(l[i + 1]) is None))):
hms_idx = _find_hms_idx(i, l, info, allow_jump=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the allow_jump=True when it's not used elsewhere without allow_jump?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was an option used in an earlier implementation and can now be removed.

In the loop-based implementation being replaced, only the first step of this check is allowed to look past a space for a h/m/s label. This implementation should make that irrelevant.

(l[i + 1] != ':' and
info.hms(l[i + 1]) is None))):
hms_idx = _find_hms_idx(i, l, info, allow_jump=True)
if hms_idx is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to understand the reasoning - why was the _find_hms_idx call moved to the beginning here?

Is it because you need to get and store the results of _find_hms_idx within the if call, so you've moved it to the front?

I'm not entirely sure how I feel about this. It may not be hurting anything, but I don't understand the reasoning of the original loop well enough to tell why they are ordered the way they are. At the very least, it seems possible that this will induce some performance hit in some important formats like ISO-8601.

This feels horribly un-pythonic to me, but one possible work-around for "need to check and store the result" would be some sort of mutable object modified in the function, e.g.:

class MutableReturn(object):
    def __init__(self, val):
        self.return_value = val

def f(x, mr):
    rv = x - 7

    if rv > 2:
        rv = None

    mr.return_value = rv

    return rv

if __name__ == "__main__":
    mr = MutableReturn(None)

    for v in [8, 10]:
        if f(v, mr) is not None:
            print('Value was {}'.format(mr.return_value))

Another option that is effectively the same thing but may be saner in this case is to have _find_hms_idx cache its last return value as a mutable property, e.g.:

def f(x):
    rv = x - 7

    if rv > 2:
        rv = None

    f.last_return = rv
    return rv

if __name__ == "__main__":
    mr = MutableReturn(None)

    for v in [8, 10]:
        if f(v) is not None:
            print('Value was {}'.format(f.last_return))

I think the second one's a bit more elegant, but the first one is thread safe since it doesn't manipulate any globals.

Of course, we'll make _find_hms_idx a member function of parser once this PR is wrapped up (or as part of it if you are making changes anyway), so we could always use the argument that you can get thread safety by spawning a new parser object for each thread - though that's kinda thin to avoid creating a mutable return object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because you need to get and store the results of _find_hms_idx within the if call, so you've moved it to the front?

Yes. I don't especially care about the ordering, but I do really like having all of these cases be elifs. This is easy to move back to its original location. I think the much simpler solution is to be willing to call _find_hms_idx twice. It can be moved back down to after the elif len_li in (8, 12, 14): block and changed to:

if _find_hms_idx(i, l, info, allow_jump=True) is not None:
    hms_idx = _find_hms_idx(i, l, info, allow_jump=True)
    [...]

This is a pretty cheap function call. Let's at least do some profiling before worrying about caching (much less mutability and thread-safety).

Copy link
Member

@pganssle pganssle Aug 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm fine with calling it twice.

Honestly, I think if we implement #125 and add format caching, we'll see some dramatic speedup in nearly all real-world use-cases, so little stuff won't really "add up" in any real sense.

I think we should definitely implement #125 before the 2.7.0 release.

@pganssle pganssle added the style label Aug 6, 2017
@pganssle pganssle added this to the 2.7.0 milestone Aug 6, 2017
@jbrockmendel
Copy link
Contributor Author

Just pushed two commits. One of them adds a lightweight testenv for parser (with profiling). The other addresses the comments/suggestions above.

@pganssle
Copy link
Member

pganssle commented Aug 6, 2017

I'm not overly familiar with tox - it seems like the parser testenv runs automatically when you run tox, but the only difference is that it runs a subset of the tests, with coverage and profiling. Is there a way to make it so that it's separate from (or maybe even on top of) the standard tox run?

I could see it being very useful to have module-specific testenvs set up for rapid development, but I'd think half the value of having lighter testenvs would be so you can quickly run them on all the supported environments, but this just uses python.

Is there anything specific about parser that justifies having a separate lighter test environment just for that? If not, it might be worth making a separate, generic helper script that you can run like tox subset parser to get one for parser, tox subset relativedelta for relativedelta dev, etc.

Either way, maybe we can do that as a separate PR.

@jbrockmendel
Copy link
Contributor Author

I could see it being very useful to have module-specific testenvs set up for rapid development

That's all this is, yes.

Removing "parser" from the "envlist" will leave it available under "tox -e parser" without having it run with just "tox".

Either way, maybe we can do that as a separate PR.

Sure.

Most important is moving the find_hms_idx block back to its previous position
@pganssle pganssle merged commit 874434d into dateutil:master Aug 6, 2017
@pganssle pganssle mentioned this pull request Mar 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants