[Report] default encoding issue for other character set #227

Closed
caomingpei opened this issue May 9, 2024 · 2 comments · Fixed by #228
Labels
enhancement New feature or request

Comments

Contributor

caomingpei commented May 9, 2024

I am a developer from China, and my development environment is Windows 11. While using mkdocs with this macros plugin, I ran into an issue: the error report shows that a file is opened with encoding='cp936'. At first I thought this was the default set by CMD or PowerShell, but changing it there made no difference. I then checked the report and found the following (other messages are omitted for brevity):

C:\anaconda3\envs\mkdoc\Lib\site-packages\mkdocs_macros\plugin.py:352 in _load_yaml              │
│                                                                                                  │
│   349 │   │   │   │   with open(filename) as f:                                                  │
│   350 │   │   │   │   │   # load the yaml file                                                   │
│   351 │   │   │   │   │   # NOTE: for the SafeLoader argument, see: https://github.com/yaml/py   │
│ ❱ 352 │   │   │   │   │   content = yaml.load(f, Loader=yaml.SafeLoader)                         │
│   353 │   │   │   │   │   trace("Loading yaml file:", filename)                                  │
│   354 │   │   │   │   if key is not None:                                                        │
│   355 │   │   │   │   │   content = {key: content}                                               │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       el = {'external_links': '../en/data/external_links.yml'}                               │ │
│ │        f = <_io.TextIOWrapper name='D:\\fastapi\\docs\\zh\\../en/data/external_links.yml'    │ │
│ │            mode='r' encoding='cp936'>                                                        │ │
│ │ filename = 'D:\\fastapi\\docs\\zh\\../en/data/external_links.yml'                            │ │
│ │      key = 'external_links'                                                                  │ │
│ │     self = <mkdocs_macros.plugin.MacrosPlugin object at 0x00000122F53C20C0>                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\anaconda3\envs\mkdoc\Lib\site-packages\yaml\__init__.py:79 in load                            │
│                                                                                                  │
│    76 │   Parse the first YAML document in a stream                                              │
│    77 │   and produce the corresponding Python object.                                           │
│    78 │   """                                                                                    │
│ ❱  79 │   loader = Loader(stream)                                                                │
│    80 │   try:                                                                                   │
│    81 │   │   return loader.get_single_data()                                                    │
│    82 │   finally:                                                                               │
│                                                                                                  │
....................................... OTHER ERROR MESSAGE.........................................
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'gbk' codec can't decode byte 0x94 in position 1165: illegal multibyte sequence

Traceback (most recent call last):
  File "D:\mkdocs-macros-plugin\setup.py", line 36, in <module>
    long_description=read_file('README.md'),
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mkdocs-macros-plugin\setup.py", line 29, in read_file
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa5 in position 667: illegal multibyte sequence

The immediate cause is the 'gbk' codec. However, the first trace shows why it is used at all: mkdocs-macros-plugin opens the file without specifying an encoding, so Python falls back to the locale default (cp936 here) instead of UTF-8.
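To illustrate the failure mode, here is a minimal sketch using only the standard library. It is not the plugin's code: `read_text`, `demo`, and the sample content are hypothetical names chosen for this example, and passing `encoding="gbk"` stands in for the cp936 default that `open(filename)` picks up on a Chinese-locale Windows machine.

```python
import os
import tempfile

# Hypothetical sample content. "\u2014" is an em dash, which UTF-8 encodes
# as the bytes e2 80 94 (note the 0x94 byte, as in the report above).
SAMPLE = "title: FastAPI \u2014 external links\n"


def read_text(path, encoding):
    """Read a whole text file using an explicitly chosen encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Return (text decoded as UTF-8, whether decoding as GBK failed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.yml")
        # Write the file as UTF-8, the way the YAML data file is assumed to be stored.
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        # 'gbk' simulates the locale default used when no encoding is given.
        try:
            read_text(path, "gbk")
            gbk_failed = False
        except UnicodeDecodeError:
            gbk_failed = True
        # An explicit encoding makes the result independent of the locale.
        return read_text(path, "utf-8"), gbk_failed
```

Calling `demo()` reads the same file twice: the GBK attempt raises `UnicodeDecodeError`, while the explicit UTF-8 read returns the original text.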


github-actions bot commented May 9, 2024

Welcome to this project and thank you!

@caomingpei caomingpei changed the title [Report] default encoding issue for other character set May 9, 2024
caomingpei added a commit to caomingpei/mkdocs-macros-plugin that referenced this issue May 9, 2024
Fix this fralau#227
This commit is to set the encoding when open a file. It's useful for
other characters set users.
@fralau fralau added the enhancement New feature or request label May 13, 2024
Contributor

kchr commented Jun 1, 2024

I did a quick test by opening a GBK encoded text file and forcing the encoding to UTF-8 like so:

>>> f = open('gbk.crlf.txt', 'r', encoding='utf-8')
>>> f
<_io.TextIOWrapper name='gbk.crlf.txt' mode='r' encoding='utf-8'>
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 0: invalid start byte

Here is the sample file I used, with characters from the GBK set:

https://raw.githubusercontent.com/x1angli/cvt2utf/master/sample_data/gbk.crlf.txt

PR #228 proposes a fix that forces the encoding to UTF-8 when opening files, the same way as in the snippet above. But is that really a viable solution, given the error raised when reading the GBK-encoded sample file? It looks like a workaround that works for this particular user and may cause unpredictable issues for others.

I am using Python 3.12.3.
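One possible way to cover both cases is to try UTF-8 first and fall back to other encodings only if decoding fails. The sketch below is an assumption for discussion, not what PR #228 does; `read_text_with_fallback` and `demo` are hypothetical helpers, and the default fallback to the locale encoding is just one reasonable choice.

```python
import locale
import os
import tempfile


def read_text_with_fallback(path, encodings=None):
    """Try each encoding in order and return (text, encoding_used)."""
    if encodings is None:
        # Assumed default: UTF-8 first, then whatever the locale prefers.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error


def demo():
    """Write the same text as UTF-8 and as GBK, then read both back."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for enc in ("utf-8", "gbk"):
            path = os.path.join(tmp, f"sample.{enc}.txt")
            with open(path, "w", encoding=enc) as f:
                f.write("中文\n")
            results[enc] = read_text_with_fallback(path, ("utf-8", "gbk"))
    return results
```

The order matters: UTF-8 is strict enough that GBK-encoded Chinese text normally fails to decode as UTF-8, whereas trying GBK first could silently turn some UTF-8 files into mojibake. Whether a fallback is desirable at all, compared with requiring UTF-8 and documenting it, is a design decision for the maintainers.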
