Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dateutil.parse.parse() rejects bytearray as input #417

Closed
uckelman opened this issue Jul 14, 2017 · 3 comments
Closed

dateutil.parse.parse() rejects bytearray as input #417

uckelman opened this issue Jul 14, 2017 · 3 comments

Comments

@uckelman
Copy link

>>> import dateutil.parser
>>> dateutil.parser.parse(b'17 Jan 2017')
datetime.datetime(2017, 1, 17, 0, 0)
>>> dateutil.parser.parse(bytearray(b'17 Jan 2017'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/site-packages/dateutil/parser.py", line 1168, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python3.5/site-packages/dateutil/parser.py", line 556, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)
  File "/usr/lib/python3.5/site-packages/dateutil/parser.py", line 675, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens
  File "/usr/lib/python3.5/site-packages/dateutil/parser.py", line 192, in split
    return list(cls(s))
  File "/usr/lib/python3.5/site-packages/dateutil/parser.py", line 61, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))
TypeError: Parser must be a string or character stream, not bytearray

Any function which can operate on a bytes ought to be able to operate on a bytearray---unless something has specifically been done to prevent that, as in dateutil.parser.parse(). When bytes works, it's unexpected that bytearray would fail. It would be better to accept bytearrays as well.

@uckelman uckelman changed the title dateutil.parse.parse() rejects bytearray as input dateutil.parse.parse() rejects bytearray as input Jul 14, 2017
@uckelman
Copy link
Author

A one line change to datetime.parser._timelex.__init__() would fix this:

if isinstance(instream, (binary_type, bytearray)):

instead of

if isinstance(instream, binary_type):

However, letting duck typing work would be even better:

        if not hasattr(instream, 'read'):
            if hasattr(instream, 'decode'):
                instream = instream.decode()
            
            if isinstance(instream, text_type):
                instream = StringIO(instream)

            if not hasattr(instream, 'read'):
                raise TypeError('Parser must be a string or character stream, not '
                                '{itype}'.format(itype=instream.__class__.__name__))

@pganssle pganssle added this to the 2.7.0 milestone Jul 16, 2017
@pganssle
Copy link
Member

I wouldn't say that support for bytesarray is particularly important here, but I am generally in favor of duck typing. I think that when we did the mypy typing for typeshed, we decided that the supported inputs would be bytes, str and "anything with a .read()".

Given that we're already calling decode(), I think probably replacing bytes with "anything with a decode() that returns either str or something with a read()" is reasonable. enough. I suspect the main problem with this is that it may cause problems if people provide bytes and bytesarrays encoded in something other than UTF-8, but that's already a problem (and one that's trivially solved by the fact that the first-class support is actually for str objects anyway), so I don't think it should be a blocker.

If you (or anyone coming across this issue) would like to make a PR, I'd go with the duck typed version, plus add a test where a bytesarray is parsed. Otherwise I'll do it eventually. Thanks for the report and taking the time to work out a solution, @uckelman

@uckelman
Copy link
Author

uckelman commented Jul 17, 2017

The duck typing solution is in PR #418, along with two tests. Thanks for the quick reply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants