Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version and Specifier accept (erroneously) some non-ASCII letters in the *local version* segment #469

Open
zuo opened this issue Oct 22, 2021 · 2 comments · May be fixed by #470
Open

Version and Specifier accept (erroneously) some non-ASCII letters in the *local version* segment #469

zuo opened this issue Oct 22, 2021 · 2 comments · May be fixed by #470

Comments

@zuo
Copy link

zuo commented Oct 22, 2021

Reproducing the behavior concerning packaging.version.Version:

Python 3.9.7 (default, Oct  4 2021, 18:09:29) 
[...]
>>> import packaging.version
>>> packaging.version.Version('1.2+\u0130\u0131\u017f\u212a')
<Version('1.2+i̇ıſk')>

The cause is that packaging.version.VERSION_PATTERN makes use of a-z character ranges in conjunction with re.IGNORECASE and (implicit in Python 3.x) re.UNICODE (see the 2nd paragraph of this fragment: https://docs.python.org/3/library/re.html#re.IGNORECASE).

It can be fixed in one of the following two ways:

  • either by adding re.ASCII to flags (but then both occurrences of \s* in the actual regex will be restricted to match ASCII-only whitespace characters!);
  • or by removing re.IGNORECASE from flags and replacing (in VERSION_PATTERN) both occurrences of a-z with A-Za-z plus adding suitable upper-case alternatives in the pre_l, post_l and dev_l regex groups, e.g., [aA][lL][pP][hH][aA] in place of alpha (quite cumbersome...).
@uranusjr
Copy link
Member

The whitespace issue can probably be worked around by doing that detection separately from the actual parsing.

class Version:
    _regex = re.compile(VERSION_PATTERN, re.VERBOSE | re.IGNORECASE | re.ASCII)

    def __init__(self, version: str) -> None:
        match = self._regex.match(version.strip())
        # ... The rest is the same ...

@zuo zuo changed the title Version accepts (erroneously) some non-ASCII letters in the local part Version and Specifier accept (erroneously) some non-ASCII letters in the *local version* segment Oct 23, 2021
@zuo
Copy link
Author

zuo commented Oct 23, 2021

Reproducing the behavior concerning packaging.specifiers.Specifier:

Python 3.9.7 (default, Oct  4 2021, 18:09:29) 
[...]
>>> import packaging.specifiers
>>> packaging.specifiers.Specifier('==1.2+\u0130\u0131\u017f\u212a')
<Specifier('==1.2+İıſK')>

The cause is the same as the aforementioned Version-related one, except that here it relates to (non-public) Specifier._regex_str and Specifier._regex regular expression definitions.

In the case of these regexes, \s occurs in various places of the effective pattern (not only at its start and end), so here the solution based on re.ASCII in conjunction with the .strip()-based value preparation (proposed above by @uranusjr in regard to Version) cannot be applied without restricting matching of white space characters (including the negated character range in the case of the === operator...) to ASCII-only ones; which -- I suppose -- would be too disruptive.

Instead of that, an additional check can be performed when the operator is not '===' -- something along the lines of:

without_whitespace = ''.join(spec.split())
if not without_whitespace.isascii():
    raise InvalidSpecifier...

zuo added a commit to zuo/packaging that referenced this issue Oct 23, 2021
* Fix validation in the packaging.version.Version's constructor

* Fix validation in the packaging.specifiers.Specifier's constructor

* Fix docs of packaging.version.VERSION_PATTERN by mentioning necessity
  of the re.ASCII flag
@zuo zuo linked a pull request Oct 23, 2021 that will close this issue
zuo added a commit to zuo/packaging that referenced this issue Oct 24, 2021
* Fix validation in the packaging.version.Version's constructor

* Fix validation in the packaging.specifiers.Specifier's constructor

* Fix docs of packaging.version.VERSION_PATTERN by mentioning necessity
  of the re.ASCII flag
zuo added a commit to zuo/packaging that referenced this issue Oct 26, 2021
* Fix validation in the packaging.version.Version's constructor

* Fix validation in the packaging.specifiers.Specifier's constructor

* Fix docs of packaging.version.VERSION_PATTERN by mentioning necessity
  of the re.ASCII flag
zuo added a commit to zuo/packaging that referenced this issue Oct 26, 2021
* Fix validation in the packaging.version.Version's constructor

* Fix validation in the packaging.specifiers.Specifier's constructor

* Complement relevant tests, including adding cases related to presence
  of non-ASCII whitespace characters in input strings

* Fix docs of packaging.version.VERSION_PATTERN by mentioning necessity
  of the re.ASCII flag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants