
Please support Chinese encoding #46

Closed
lvzhenbo opened this issue May 10, 2023 · 12 comments
Labels
Type: Bug Something is not working

Comments

@lvzhenbo

Chrome output
image
static-marks output
image
image

@darekkay added the "Type: Bug" (Something is not working) label May 12, 2023
@ha-na-bi

After installing static-marks with npm install -g static-marks in a Chinese-locale environment, I encountered the same error. Surprisingly, after cloning the project and building it locally, bookmarks containing Chinese characters are handled correctly.

image

@darekkay
Owner

Unfortunately, I can't reproduce it, either in the local project or when using static-marks as a CLI tool. I tested a bookmark named 花火 and it works fine.

The error from the screenshot states that "null byte is not allowed in input". I can also see that there are some invisible characters in the YAML file for each line, including the first Imported: line. Can you please remove those characters and try again?
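
As a rough way to look for the invisible characters mentioned above, the sketch below inspects the raw bytes of a file for byte order marks and null bytes. The helper name describe_bytes and the sample input are hypothetical, and the idea that the YAML file was saved as UTF-16 is only an assumption at this point in the thread.

```python
import codecs


def describe_bytes(data: bytes) -> dict:
    """Report byte-level hints about how a text file was encoded."""
    return {
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
        "utf16_le_bom": data.startswith(codecs.BOM_UTF16_LE),
        "utf16_be_bom": data.startswith(codecs.BOM_UTF16_BE),
        "null_bytes": data.count(b"\x00"),
    }


# Hypothetical sample: the first YAML line stored as UTF-16LE with a BOM.
# Every ASCII character then carries a 0x00 byte that a UTF-8 reader
# would see as a null byte.
sample = codecs.BOM_UTF16_LE + "Imported:\n".encode("utf-16-le")
print(describe_bytes(sample))
```

For a real file, the bytes could come from pathlib.Path("bookmarks.yml").read_bytes(); the file name here is only an example.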

To help further, I would need a reproducible example.

@lvzhenbo
Author

The problem is in the YAML that static-marks generates from a bookmark HTML file exported by Chrome.
Example files: https://github.com/lvzhenbo/bookmarks

@darekkay
Owner

Thanks for providing the repository. I did the following:

npx static-marks import bookmarks_2023_6_27.html -o bookmarks.yml
npx static-marks build bookmarks.yml -o bookmarks.html

This worked for me without any problems. The interesting part is the difference between my generated bookmarks.yml (please check my gist) and your bookmarks_2023_6_27. I think there is an issue with the encoding:

  • mine:
Imported:
  - 办公:
      - 我的仪表盘 - TAPD平台: https://www.tapd.cn/my_dashboard/index
      - Projects · Dashboard · GitLab: http://git.kongque510.com/
  • yours:
Imported:
  - 鍔炲叕:
      - 鎴戠殑浠〃鐩?- TAPD骞冲彴: https://www.tapd.cn/my_dashboard/index
      - Projects 路 Dashboard 路 GitLab: http://git.kongque510.com/

If you compare your YML file with the source, you can spot the difference in the characters.
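
The character differences above look like what happens when UTF-8 bytes are decoded with a Chinese legacy code page. The sketch below reproduces that effect under the assumption that the wrong decoder was GBK (code page 936); the thread does not confirm which code page was involved, and the helper name misdecode is hypothetical.

```python
def misdecode(text: str, actual: str = "utf-8", assumed: str = "gbk") -> str:
    """Encode text with its real encoding, then decode the bytes with the wrong one."""
    return text.encode(actual).decode(assumed, errors="replace")


# Strings taken from the "mine" listing above.
for original in ["办公", "·"]:
    garbled = misdecode(original)
    # Reversing the two steps recovers the original when no byte was lost.
    recovered = garbled.encode("gbk").decode("utf-8")
    print(original, "->", garbled, "->", recovered)
```

A name with an odd number of UTF-8 bytes leaves one byte unpaired, which would be consistent with the stray ? after 鎴戠殑浠〃鐩 in the "yours" listing; that byte cannot be recovered afterwards.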

I can think of two places that might be causing the difference:

  1. I am assuming a UTF-8 encoding of the source HTML file. Can you please check whether the encoding matches on your system?
  2. I am using bookmarks-parser to convert the HTML file into YML. It's possible that the library has encoding problems.
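
One way to check the first point is a strict UTF-8 decode of the raw bytes. This is only a minimal sketch: the helper name is hypothetical, and a successful decode does not prove the file was written as UTF-8, since some byte sequences from other encodings also happen to be valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same folder name in two encodings: only the first is valid UTF-8.
print(is_valid_utf8("办公".encode("utf-8")))
print(is_valid_utf8("办公".encode("gbk")))
```

For the exported bookmarks, the input would be the bytes of the HTML file, for example pathlib.Path("bookmarks_2023_6_27.html").read_bytes().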

@lvzhenbo
Author

lvzhenbo commented Jun 28, 2023

The cause is the command. I was running static-marks import .\bookmarks_2023_6_27.html > .\bookmarks_2023_6_27.yml, which uses the right angle bracket (>) shown in the README rather than -o.

@darekkay
Owner

When I use > instead of -o, everything still works fine for me, both in the Windows command line and in Git Bash on Windows. I assume the result depends on the system language or encoding. Unfortunately, I would have to rely on help (debugging or a pull request) from someone who can reproduce the problem.

@lvzhenbo
Author

My system environment uses a Chinese encoding, but it also supports UTF-8, and in most cases UTF-8 is what gets used. I am not sure what the difference is between the angle bracket and -o, but if the two do the same thing, then it is fine to use -o directly.

@lvzhenbo
Author

Also, according to @ha-na-bi, running from a clone of the repository works without problems; only the npm package is problematic.

@darekkay
Owner

Just to clarify: does using -o work for you without issues? If yes, then I would only adjust the documentation and mark -o as the "correct" way.

@lvzhenbo
Author

Yes, I ran it again and confirmed that -o produces no encoding problems.

@Jiehong

Jiehong commented Mar 8, 2024

I think the issue comes from the >! You are using PowerShell, where > behaves like PowerShell's Out-File. Depending on the PowerShell version, it does not always default to UTF-8 encoding, so it can produce incorrect output.

This wouldn't happen in bash on Linux, since > works differently there, nor should it be an issue on modern versions of PowerShell (like v7.x).

But on older Windows versions, the default PowerShell (aka Windows PowerShell) is v5, which works differently and is not really oriented toward "UTF-8 by default".

So perhaps the recommendation should be "don't use PowerShell, or use only PowerShell v7.x+" (or use the -o option, so that the shell's redirection is bypassed entirely).
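
To illustrate the explanation above, here is a small model of what redirecting with > may do in Windows PowerShell v5. It assumes the shell first decodes the program's UTF-8 output using the console code page (GBK is assumed here for a Chinese system) and then writes the text as UTF-16LE with a byte order mark. The function name and the sample line are hypothetical, and this is a simulation in Python, not PowerShell itself.

```python
import codecs


def simulate_redirect(stdout_bytes: bytes, console_encoding: str = "gbk") -> bytes:
    """Model `program > file` in Windows PowerShell v5 (assumed behavior)."""
    # Step 1 (assumed): the shell decodes the output with the console encoding.
    text = stdout_bytes.decode(console_encoding, errors="replace")
    # Step 2 (assumed): the shell writes the text as UTF-16LE with a BOM.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# What the tool would emit for one YAML line, as UTF-8 bytes.
utf8_output = "- 办公:\n".encode("utf-8")
redirected = simulate_redirect(utf8_output)

print(redirected.decode("utf-16"))  # the text an editor would show
print(b"\x00" in redirected)        # whether the file contains null bytes
```

Under these assumptions, one redirect would account for both symptoms in this thread: the garbled folder names and the "null byte is not allowed in input" error. Writing the file with -o skips both steps because the tool writes its own bytes to disk.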

darekkay added a commit that referenced this issue May 23, 2024
@darekkay
Owner

I've updated the documentation and referenced this issue.


4 participants