Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 597, require encoding kwarg in open call (and other calls that delegate to io.open) #3826

Closed
graingert opened this issue Sep 10, 2020 · 14 comments · Fixed by #4753
Closed
Labels
Enhancement ✨ Improvement to a component Good first issue Friendly and approachable by new contributors Help wanted 🙏 Outside help would be appreciated, good for new contributors

Comments

@graingert
Copy link
Contributor

Is your feature request related to a problem? Please describe

there's a proposal to deprecate not passing an encoding to open(...)
https://www.python.org/dev/peps/pep-0597/#motivation

Developers using macOS or Linux may forget that the default encoding is not always UTF-8.

For example, long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users can not install the package if there is at least one non-ASCII character (e.g. emoji) in the README.md file which is encoded in UTF-8.

For example, 489 packages of the 4000 most downloaded packages from PyPI used non-ASCII characters in README. And 82 packages of them can not be installed from source package when locale encoding is ASCII. [1] They used the default encoding to read README or TOML file.

Another example is logging.basicConfig(filename="log.txt"). Some users expect UTF-8 is used by default, but locale encoding is used actually. [2]

Even Python experts assume that default encoding is UTF-8. It creates bugs that happen only on Windows. See [3] and [4].

Raising a warning when the encoding option is omitted will help to find such mistakes.

Describe the solution you'd like

raise a warning similar to subprocess-run-check

Additional context

Add any other context about the feature request here.

@graingert
Copy link
Contributor Author

# bad
with open(filename) as f:
    ...


# bad
with open(filename, encoding=None) as f:
    ...


# good
with open(filename, encoding="utf8", errors="surrogateescape") as f:
    ...

# good
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
with open(filename, encoding=locale_encoding) as f:
    ...

@hippo91
Copy link
Contributor

hippo91 commented Sep 10, 2020

@graingert thanks!
It makes sense.

@hippo91 hippo91 added Enhancement ✨ Improvement to a component Help wanted 🙏 Outside help would be appreciated, good for new contributors labels Sep 10, 2020
@graingert
Copy link
Contributor Author

graingert commented Sep 10, 2020

@hippo91 can you allocate a pylint feature id for me?

@hippo91
Copy link
Contributor

hippo91 commented Sep 14, 2020

@graingert sorry for the delay. What do you mean by feature id?

@graingert
Copy link
Contributor Author

I think I mean a message id? http://pylint.pycqa.org/en/latest/technical_reference/features.html

@hippo91
Copy link
Contributor

hippo91 commented Sep 15, 2020

@graingert i think a new message id is necessary for this case. Something around missing_open_encoding.
Maybe @Pierre-Sassoulas @AWhetter or @PCManticore will have a better idea?
subprocess-run-check should be indeed a good starting point.

@graingert
Copy link
Contributor Author

Note that it's not just open, eg os.fdopen

@Pierre-Sassoulas
Copy link
Member

I really like this one, I had the problem multiple time for windows/mac users, that would be really helpful. I'd also use an error code for that one. Regarding the message id, what about unspecified-encoding ?

@hippo91
Copy link
Contributor

hippo91 commented Sep 19, 2020

@Pierre-Sassoulas let's go for unspecified-encoding.

@Pierre-Sassoulas Pierre-Sassoulas added the Good first issue Friendly and approachable by new contributors label Mar 2, 2021
DanielNoord added a commit to DanielNoord/pylint that referenced this issue Jul 26, 2021
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes pylint-dev#3826
DanielNoord referenced this issue in DanielNoord/pylint Jul 27, 2021
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes PyCQA#3826
DanielNoord added a commit to DanielNoord/pylint that referenced this issue Jul 27, 2021
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes pylint-dev#3826
DanielNoord added a commit to DanielNoord/pylint that referenced this issue Jul 28, 2021
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes pylint-dev#3826
DanielNoord added a commit to DanielNoord/pylint that referenced this issue Jul 28, 2021
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes pylint-dev#3826
DanielNoord added a commit to DanielNoord/pylint that referenced this issue Jul 28, 2021
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes pylint-dev#3826
Pierre-Sassoulas pushed a commit that referenced this issue Jul 28, 2021
* Add unspecified-encoding checker #3826
This adds an unspecified-encoding checker that adds a warning
whenever open() is called without an explicit encoding argument.
This closes #3826

* Update tests to conform to unspecified-encoding
With addition of the unspecified-encoding checker calls of open()
need an encoding argument.
Where necessary this argument has been added, or the message has been
disabled.
This also includes small linting changes to a small number
of tests. Their test-data has been updated to reflect new line numbers.

* Update scripts to conform to unspecified-encoding
With addition of the unspecified-encoding checker calls of open()
need an encoding argument.
Where necessary this argument has been added.

* Update pylint to conform to unspecified-encoding
With addition of the unspecified-encoding checker calls of open()
need an encoding argument.
Where necessary this argument has been added.

Co-authored-by: Marc Mueller <30130371+cdce8p@users.noreply.github.com>
Co-authored-by: Marc Mueller <30130371+cdce8p@users.noreply.github.com>
MVrachev added a commit to MVrachev/tuf that referenced this issue Aug 23, 2021
Read more about the warning here:
pylint-dev/pylint#3826

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
MVrachev added a commit to MVrachev/tuf that referenced this issue Aug 23, 2021
A new warning appeared from pylint when calling "tox -e lint" on the
"develop" branch with id "unspecified-encoding",
I read about the warning and it make sense.

Read more about the warning here:
pylint-dev/pylint#3826

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@anders-kiaer
Copy link

Should also pathlib.Path.read_text in the standard library give warnings, or is it out of scope?

With pylint==2.10.2 the snippet

import pathlib

pathlib.Path("/some/file").read_text()

gives no unspecified-encoding warning.

@graingert
Copy link
Contributor Author

Should also pathlib.Path.read_text in the standard library give warnings, or is it out of scope?

With pylint==2.10.2 the snippet

import pathlib

pathlib.Path("/some/file").read_text()

gives no unspecified-encoding warning.

I forget that one the most

@Pierre-Sassoulas
Copy link
Member

I think you're right, I never used it myself so I did not think of adding it. Do you want to do the change @graingert ? :)

@anders-kiaer
Copy link

You probably see it/have seen it already: For pathlib completeness, as far as I can see there are three functions wrapping open in pathlib (at least there are three hits when I search for encoding on the page https://docs.python.org/3/library/pathlib.html):

@graingert
Copy link
Contributor Author

Sorry I should have mentioned it in the issue, read_text was where I first noticed the problem because I assumed that was utf8 but knew open wasn't!

MVrachev added a commit to MVrachev/tuf that referenced this issue Aug 24, 2021
A new warning appeared from pylint when calling "tox -e lint" on the
"develop" branch with id "unspecified-encoding",
I read about the warning and it make sense.

Read more about the warning here:
pylint-dev/pylint#3826

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
mnm678 pushed a commit to mnm678/tuf that referenced this issue Aug 24, 2021
A new warning appeared from pylint when calling "tox -e lint" on the
"develop" branch with id "unspecified-encoding",
I read about the warning and it make sense.

Read more about the warning here:
pylint-dev/pylint#3826

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
jeanralphaviles added a commit to jeanralphaviles/prometheus_speedtest that referenced this issue Aug 29, 2021
[PEP 597](https://www.python.org/dev/peps/pep-0597/) adds an additional warning
when using open() without specifying an explicit encoding. This was added to
pylint in pylint-dev/pylint#3826 and eventually broke our
build as a new warning.
return42 added a commit to searxng/searxng that referenced this issue Aug 31, 2021
Pylint 2.10 added new default checks [1]:

unspecified-encoding:
  Emitted when open() is called without specifying an encoding [2]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
[2] pylint-dev/pylint#3826

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
stanislavlevin added a commit to stanislavlevin/freeipa-healthcheck that referenced this issue Feb 15, 2022
Pylint 2.10 introduced new checker unspecified-encoding:
> Added unspecified-encoding: Emitted when open() is called without
  specifying an encoding

https://pylint.pycqa.org/en/latest/whatsnew/changelog.html#what-s-new-in-pylint-2-10-0
pylint-dev/pylint#3826
https://www.python.org/dev/peps/pep-0597/

Fixes: freeipa#244
Signed-off-by: Stanislav Levin <slev@altlinux.org>
stanislavlevin added a commit to stanislavlevin/fc-admin that referenced this issue Feb 17, 2022
Pylint 2.10 introduced new checker unspecified-encoding:
> Added unspecified-encoding: Emitted when open() is called without
  specifying an encoding

See pylint-dev/pylint#3826
See https://www.python.org/dev/peps/pep-0597/

Fixes: fleet-commander#279
Signed-off-by: Stanislav Levin <slev@altlinux.org>
stanislavlevin added a commit to stanislavlevin/fc-admin that referenced this issue Feb 17, 2022
Pylint 2.10 introduced new checker unspecified-encoding:
> Added unspecified-encoding: Emitted when open() is called without
  specifying an encoding

See pylint-dev/pylint#3826
See https://www.python.org/dev/peps/pep-0597/

Fixes: fleet-commander#279
Signed-off-by: Stanislav Levin <slev@altlinux.org>
olivergs pushed a commit to fleet-commander/fc-admin that referenced this issue Mar 3, 2022
Pylint 2.10 introduced new checker unspecified-encoding:
> Added unspecified-encoding: Emitted when open() is called without
  specifying an encoding

See pylint-dev/pylint#3826
See https://www.python.org/dev/peps/pep-0597/

Fixes: #279
Signed-off-by: Stanislav Levin <slev@altlinux.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component Good first issue Friendly and approachable by new contributors Help wanted 🙏 Outside help would be appreciated, good for new contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants