Output HTML contains NULL characters in at least CJK languages #9985

Open
2 of 7 tasks
tats-u opened this issue Mar 26, 2024 · 11 comments
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: markdown Related to Markdown parsing or syntax status: needs more information There is not enough information to take action on the issue.

Comments

@tats-u
Contributor

tats-u commented Mar 26, 2024

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Docusaurus sometimes contaminates its output HTML with NULL characters.
NULL characters confuse the HTML parsers used by some documentation scrapers, such as https://github.com/meilisearch/docs-scraper (which uses lxml, a Python library).
They also prevent Windows' copy-and-paste feature from copying the complete source code.
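
As a stopgap on the consumer side (not something proposed in this thread), a scraper could drop NUL bytes before handing the page to its HTML parser. The sketch below assumes the page is UTF-8 encoded, where the byte 0x00 can only ever represent U+0000, so removing it cannot damage other characters. The function name `strip_nul` is hypothetical.

```python
def strip_nul(html_bytes: bytes) -> tuple[bytes, int]:
    """Return the input with every NUL byte removed, plus the number removed.

    Assumes UTF-8 input: in UTF-8 the byte 0x00 never appears inside a
    multi-byte sequence, so this cannot corrupt neighbouring characters.
    """
    removed = html_bytes.count(b"\x00")
    return html_bytes.replace(b"\x00", b""), removed


# Sample modelled on the zh-CN breadcrumb shown under "Actual behavior".
page = "<span>Markdown 特\x00\x00性</span>".encode("utf-8")
cleaned, removed = strip_nul(page)
print(removed)                  # 2
print(cleaned.decode("utf-8"))  # <span>Markdown 特性</span>
```

This only hides the symptom for downstream tools; it does not explain where the NUL characters come from.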

Reproducible demo

No response

Steps to reproduce

curl -LsSf https://docusaurus.io/zh-CN/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'
curl -LsSf https://docusaurus-i18n-staging.netlify.app/ja/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'
curl -LsSf https://docusaurus.io/ko/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Note

  • rg is ripgrep.
  • Perl is used for trimming of the results.

For your own documents

Write your documents in a CJK language (or possibly another non-Latin-script language), then run:

npm run build
rg '\x00' -a -r '[[NULL]]' --color=always -t html build | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Note

Built JS files do not seem to be affected. (no NULs are found there)
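
The build-directory check can also be sketched in Python, which may be convenient in CI or on Windows without ripgrep. The function name `count_nuls_in_html` is hypothetical; the `build` directory and the restriction to `.html` files follow the command above.

```python
from pathlib import Path


def count_nuls_in_html(build_dir: str) -> dict[str, int]:
    """Map each .html file under build_dir that contains NUL bytes to its NUL count.

    Files without NUL bytes are omitted, so an empty dict means the build is clean.
    """
    counts: dict[str, int] = {}
    for path in sorted(Path(build_dir).rglob("*.html")):
        nul_count = path.read_bytes().count(b"\x00")
        if nul_count:
            counts[str(path.relative_to(build_dir))] = nul_count
    return counts


# Typical use after `npm run build`:
# for name, n in count_nuls_in_html("build").items():
#     print(f"{name}: {n} NUL byte(s)")
```

Counting raw bytes avoids any decoding step, so the result does not depend on how a file's encoding is guessed.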

Expected behavior

No output (no NULL characters are found)

Actual behavior

🇨🇳

...res"><span itemprop="name">Markdown 特[[NULL]][[NULL]]性</span></a><meta itemprop="position" content="2"></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" itemprop="name">标题和目录</span><meta itemprop="position" content="3"></li></ul></nav><span class="theme-doc-version-badge badge badge--secondary">版本:3.1.1</span><div class="tocCollapsible_BEWm theme-doc-toc-mobile tocMobile_NSfz"><button type="button" class="clean-btn tocCollapsibleButton_IbtT">本页总览</button></div><div class="theme-doc-markdown markdown"><h1>标题和目录</h1>
...ia-label="链接到 示例小节 1 a III" title="链接[[NULL]][[NULL]]到 示例小节 1 a III">​</a></h4>

🇯🇵

...ocusaurus</b></a><nav aria-label="ドキュ[[NULL]]メントのサイドバー" class="menu thin-scrollbar menu_rWGR menuWithAnnouncementBar_Pf08"><ul class="theme-doc-sidebar-menu menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-1 menu__list-item"><a class="menu__link" href="/ja/docs">はじめに</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/category/getting-started">入門編</a><button aria-label="Collapse sidebar category &#x27;入門編&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/installation">インストール</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/configuration">設定</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/playground">プレイグラウンド</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/typescript-support">TypeScript サポート</a></li></ul></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist menu__link--active" href="/ja/docs/category/guides">ガイド</a><button aria-label="Collapse sidebar category &#x27;ガイド&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link 
theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/creating-pages">Pages</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" tabindex="0" href="/ja/docs/docs-introduction">ドキュメント</a><button aria-label="Expand sidebar category &#x27;ドキュメント&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/blog">ブログ</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist menu__link--active" tabindex="0" href="/ja/docs/markdown-features">マークダウンの機能</a><button aria-label="Collapse sidebar category &#x27;マークダウンの機能&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/react">MDX and React</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/tabs">Tabs</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/code-blocks"> コードブロック</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/admonitions">注意書き</a></li><li class="theme-doc-sidebar-item-link 
theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link menu__link--active" aria-current="page" tabindex="0" href="/ja/docs/markdown-features/toc">見出しと目次</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/assets">Assets</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/links">Markdown links</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/plugins">MDX Plugins</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/math-equations">数式</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/diagrams">図</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/head-metadata">Head metadata</a></li></ul></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/styling-layout">Styling and Layout</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/swizzling">スウィズリング(Swizzling)</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/static-assets">静的アセット</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" 
href="/ja/docs/search">検索</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/browser-support">ブラウザ対応</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/seo">SEO</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/using-plugins">プラグインの利用</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/deployment">デプロイ</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" tabindex="0" href="/ja/docs/i18n/introduction">国際化 (i18n)</a><button aria-label="Expand sidebar category &#x27;国際化 (i18n)&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/guides/whats-next">What&#x27;s next?</a></li></ul></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/advanced">上級者向けガイド</a><button aria-label="Expand sidebar category &#x27;上級者向けガイド&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/migration">Upgrading</a><button 
aria-label="Expand sidebar category &#x27;Upgrading&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li></ul></nav><button type="button" title="サ イドバーを隠す" aria-label="サイドバーを隠す" class="button button--secondary button--outline collapseSidebarButton_PUyN"><svg width="20" height="20" aria-hidden="true" class="collapseSidebarButtonIcon_DI0B"><g fill="#7a7a7a"><path d="M9.992 10.023c0 .2-.062.399-.172.547l-4.996 7.492a.982.982 0 01-.828.454H1c-.55 0-1-.453-1-1 0-.2.059-.403.168-.551l4.629-6.942L.168 3.078A.939.939 0 010 2.528c0-.548.45-.997 1-.997h2.996c.352 0 .649.18.828.45L9.82 9.472c.11.148.172.347.172.55zm0 0"></path><path d="M19.98 10.023c0 .2-.058.399-.168.547l-4.996 7.492a.987.987 0 01-.828.454h-3c-.547 0-.996-.453-.996-1 0-.2.059-.403.168-.551l4.625-6.942-4.625-6.945a.939.939 0 01-.168-.55 1 1 0 01.996-.997h3c.348 0 .649.18.828.45l4.996 7.492c.11.148.168.347.168.55zm0 0"></path></g></svg></button></div></div></aside><main class="docMainContainer_EfwR"><div class="container padding-top--md padding-bottom--lg"><div class="row"><div class="col docItemCol_n6xZ"><div class="docItemContainer_RhpI"><article><nav class="theme-doc-breadcrumbs breadcrumbsContainer_Wvrh" aria-label="パンくずリスト"><ul class="breadcrumbs" itemscope="" itemtype="https://schema.org/BreadcrumbList"><li class="breadcrumbs__item"><a aria-label="ホーム画面" class="breadcrumbs__link" href="/ja/"><svg viewBox="0 0 24 24" class="breadcrumbHomeIcon_uaSn"><path d="M10 19v-5h4v5c0 .55.45 1 1 1h3c.55 0 1-.45 1-1v-7h1.7c.46 0 .68-.57.33-.87L12.67 3.6c-.38-.34-.96-.34-1.34 0l-8.36 7.53c-.34.3-.13.87.33.87H5v7c0 .55.45 1 1 1h3c.55 0 1-.45 1-1z" fill="currentColor"></path></svg></a></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item"><a class="breadcrumbs__link" itemprop="item" href="/ja/docs/category/guides"><span itemprop="name">ガイド</span></a><meta itemprop="position" content="1"></li><li itemscope="" 
itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item"><a class="breadcrumbs__link" itemprop="item" href="/ja/docs/markdown-features"><span itemprop="name">マークダウンの機能</span></a><meta itemprop="position" content="2"></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" itemprop="name">見出しと目次</span><meta itemprop="position" content="3"></li></ul></nav><div class="tocCollapsible_BEWm theme-doc-toc-mobile tocMobile_NSfz"><button type="button" class="clean-btn tocCollapsibleButton_IbtT">このページ</button></div><div class="theme-doc-markdown markdown"><h1>見出しと目次</h1>
...itle="Example subsubsection 3 b I への直[[NULL]]リンク">​</a></h4>

🇰🇷

...를 사용하는 경우에는 각 ID가 각 페이지에서 정확하게 한 번만 표[[NULL]]시되는지 확인하세요. 그렇지 않으면 같은 ID를 가진 두 개의 DOM 요소가 존재하게 됩니다. 이는 잘못된 HTML이며 제목과 적절하게 연결할 수 없게 됩니다.</p></div></div>
...iv class="admonitionContent_Knsx"><p>[[NULL]][[NULL]]아래는 현재 페이지에서 더 많은 목차 항목을 사용할 수 있는 더미 콘텐츠입니다.</p></div></div>

Note

  • Other pages are likely to be affected.
  • The same pages in Latin-script languages are not affected.

Your environment

First found in a private documentation site written in Japanese:

  • Public source code: N/A
  • Public site URL: N/A
  • Docusaurus version used: 3.1.1
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): Node 20 (latest LTS)
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): Ubuntu (GitHub Actions)

The above commands were run in Ubuntu 22.04 on WSL on Windows 11.

Self-service

  • I'd be willing to fix this bug myself.
@tats-u tats-u added bug An error in the Docusaurus core causing instability or issues with its execution status: needs triage This issue has not been triaged by maintainers labels Mar 26, 2024
@Josh-Cena
Collaborator

Have you checked if it's an MDX issue? Hard to believe Docusaurus has anything to do here. I can also test later.

@Josh-Cena Josh-Cena added status: needs more information There is not enough information to take action on the issue. domain: markdown Related to Markdown parsing or syntax and removed status: needs triage This issue has not been triaged by maintainers labels Mar 26, 2024
@tats-u
Contributor Author

tats-u commented Mar 26, 2024

I will check other CJK sites built with other software (e.g. Astro & Nextra).

@Josh-Cena
Collaborator

When I'm debugging this, I usually isolate an MDX compiler with the same setup as Docusaurus, and invoke it programmatically.

@tats-u
Contributor Author

tats-u commented Mar 27, 2024

Neither the Astro nor the Nextra sites seem to be affected.

Rspress, which also uses MDX (though it may use mdxjs-rs or markdown-rs instead), is not affected.

However, the Ant Design documentation is affected. (They do not use Docusaurus or MDX, only remark.)

Also, the demo of @easyops-cn/docusaurus-search-local is affected only when the UI language is Chinese, even though the page content is the same and is written in Chinese either way. This is strange and interesting.

@slorber
Collaborator

slorber commented Mar 28, 2024

Hey

To be honest I'm not super familiar with any of those concepts and won't have the bandwidth to investigate much 😅

I was just wondering, couldn't this be a Crowdin translation issue?

I'm not super skilled in rg and perl, can you tell me if you see anything weird in these input MD files?

zh-CN.zip

@tats-u
Contributor Author

tats-u commented Mar 28, 2024

can you tell me if you see anything weird in these input MD files?

No NULL characters are found in html, md, mdx, json, or css files in your ZIP archive.
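
A check of this kind can be sketched with Python's `zipfile` module. This is not necessarily how the archive was inspected here; the function name `nul_counts_in_zip` is hypothetical, and the suffix list simply mirrors the file types named above.

```python
import zipfile
from pathlib import PurePosixPath

# File types named in the comment above.
TEXT_SUFFIXES = {".html", ".md", ".mdx", ".json", ".css"}


def nul_counts_in_zip(archive) -> dict[str, int]:
    """Map each matching archive member that contains NUL bytes to its NUL count.

    `archive` may be a path or a binary file-like object. An empty dict means
    no NUL bytes were found in the checked file types.
    """
    counts: dict[str, int] = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if PurePosixPath(info.filename).suffix.lower() not in TEXT_SUFFIXES:
                continue
            nul_count = zf.read(info).count(b"\x00")
            if nul_count:
                counts[info.filename] = nul_count
    return counts


# Typical use:
# print(nul_counts_in_zip("zh-CN.zip"))
```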

I was just wondering, couldn't this be a Crowdin translation issue?

I found this issue on my (our) site, where i18n is not used, so I am convinced that Crowdin is not involved.

@slorber
Collaborator

slorber commented Mar 28, 2024

Thanks for investigating.

It is also worth trying this environment variable when building your site: process.env.SKIP_HTML_MINIFICATION === 'true'

@tats-u
Contributor Author

tats-u commented Mar 29, 2024

Neither $env:SKIP_HTML_MINIFICATION = "true" (I am using PowerShell) nor --no-minify helped.
The MDX part was still minified.
Changing the locale from "ja" to "en" did not help either.

@tats-u
Contributor Author

tats-u commented Mar 29, 2024

https://typescriptbook.jp/ (https://github.com/yytypescript/book)

This site uses Docusaurus 2.4.1, and NULL chars are not found there.

@Josh-Cena
Collaborator

I will check this afternoon. There's a chance that there's something environment specific.

@tats-u
Contributor Author

tats-u commented May 6, 2024

I found that both the Docusaurus and Ant Design websites have a div whose class includes markdown.
However, none of Nextra, Rspress, or Astro do.

Also, it looks like https://ant.design/docs/blog/line-ellipsis-cn no longer contains NULL characters.

4 participants
@slorber @tats-u @Josh-Cena and others