No more inserting space between Chinese or Japanese (e.g. hanzi and kana) and western characters (#11597 by @tats-u)

Prettier currently inserts whitespace (U+0020) between Chinese or Japanese characters (e.g. hanzi/kanji and kana) and western characters (e.g. alphanumerics). This behavior is not based on the official layout guidelines for Japanese or Chinese; it follows a non-standard, local convention in Chinese.

Official Japanese guideline (W3C):

3.9.1 Differences in Positioning of Characters and Symbols

The positioning of characters and symbols may vary depending on the following.

d. Are characters and symbols appearing in sequence in solid setting, or will there be a fixed size space between them? For example, sequences of ideographic characters (cl-19) and hiragana (cl-15) are set solid, and for Western characters (cl-27) following hiragana (cl-15) there will be quarter em spacing.

https://www.w3.org/TR/jlreq/#differences_in_positioning_of_characters_and_symbols

“one quarter em” means one quarter of the full-width size. (JIS Z 8125)
“one quarter em space” means amount of space that is one quarter size of em space.

https://www.w3.org/TR/jlreq/#term.quarter-em
https://www.w3.org/TR/jlreq/#term.quarter-em-space

Official Japanese guideline (JIS X 4051:2004):

4.7 和欧文混植処理

a) 横書きでは,和文と欧文との間の空き量は,四分アキを原則とする。

4.7 Mixed Japanese and Western Text Composition

a) In horizontal writing, the space between Japanese and western text should be one quarter em, as a rule.

PR Author's Note: The original text is written only in Japanese; the translation above is based on DeepL.

https://kikakurui.com/x4/X4051-2004-02.html (Japanese)

Official Chinese guideline (W3C):

3.2.2 Mixed Text Composition in Horizontal Writing Mode

In principle, there is tracking or spacing between an adjacent Han character and a Western character of up to one quarter of a Han character width, except at the line start or end.

NOTE: Another approach is to use a Western word space (U+0020 SPACE), in which case the width depends on the font in use.

https://www.w3.org/TR/clreq/#mixed_text_composition_in_horizontal_writing_mode

As quoted above, only the Chinese guideline allows whitespace (U+0020) to be substituted for one quarter em spacing, even though the two look similar. Even there, the W3C guideline does not adopt it as the rule; it only mentions it as one of the options.

Some renderers (e.g. Pandoc converting to PDF with a LaTeX backend) can automatically insert genuine one quarter em spacing. The width of whitespace (U+0020) differs from one quarter em, so inserting it takes away the option of leaving the spacing to the renderer. Adding spacing should be left to renderers and should not be done by Prettier, which is just a formatter.

Adding whitespace may also interfere with searches for text that mixes Chinese or Japanese and western characters. For example, you cannot find “第1章” (Chapter 1) in a Markdown document or its derivatives by searching for the string “第1章”; you have to search for “第 1 章” instead.
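The search problem can be reproduced with a minimal sketch. The `insertSpaces` function below is a hypothetical approximation of the old behavior, not Prettier's actual implementation, and its character classes are an assumption that covers only hiragana, katakana, and the basic CJK Unified Ideographs block.

```typescript
// Hypothetical approximation of the old behavior (NOT Prettier's real code):
// put U+0020 between a Chinese/Japanese character and an ASCII alphanumeric.
// Assumed ranges: hiragana + katakana (U+3040-U+30FF), basic Han (U+4E00-U+9FFF).
const cjThenWestern = /([\u3040-\u30ff\u4e00-\u9fff])([A-Za-z0-9])/g;
const westernThenCj = /([A-Za-z0-9])([\u3040-\u30ff\u4e00-\u9fff])/g;

function insertSpaces(text: string): string {
  return text.replace(cjThenWestern, "$1 $2").replace(westernThenCj, "$1 $2");
}

const original: string = "第1章";
const formatted: string = insertSpaces(original); // "第 1 章"

// A plain substring search for the original string no longer matches.
const foundBefore: boolean = original.indexOf("第1章") !== -1; // true
const foundAfter: boolean = formatted.indexOf("第1章") !== -1; // false
```

After formatting, only a search for the spaced form “第 1 章” finds the heading.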

To make matters worse, once whitespace is inserted, it is difficult to remove. The following sentence, for example, is perfectly valid as written.

作る means make in Japanese.

An overly simple rule that removes whitespace between Chinese or Japanese characters and alphanumerics would also remove the space between “作る” and “means”, unless you modify the sentence, for example by quoting “作る”. It is very difficult to devise a general rule that safely removes such whitespace from all documents and is worth including in Prettier.
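As a rough illustration of why removal is unsafe, the sketch below implements such a naive rule. `naiveRemoveSpaces` is a hypothetical helper written for this example only, and it assumes the same limited character ranges (hiragana, katakana, basic Han) rather than any rule Prettier has shipped.

```typescript
// Hypothetical naive cleanup rule, for illustration only:
// delete a single U+0020 between a Chinese/Japanese character and an ASCII alphanumeric.
const spaceAfterCj = /([\u3040-\u30ff\u4e00-\u9fff]) ([A-Za-z0-9])/g;
const spaceBeforeCj = /([A-Za-z0-9]) ([\u3040-\u30ff\u4e00-\u9fff])/g;

function naiveRemoveSpaces(text: string): string {
  return text.replace(spaceAfterCj, "$1$2").replace(spaceBeforeCj, "$1$2");
}

// Works as intended on a previously formatted heading.
const heading: string = naiveRemoveSpaces("第 1 章"); // "第1章"

// But it also damages a sentence whose space was intentional.
const sentence: string = "作る means make in Japanese.";
const stripped: string = naiveRemoveSpaces(sentence); // "作るmeans make in Japanese."
```

The rule cannot tell an inserted space from one the author wrote on purpose, which is the core of the problem.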

In conclusion, a mere formatter must stop imposing this non-standard rule.

<!-- Input -->
漢字Alphabetsひらがな12345カタカナ67890한글

漢字 Alphabets ひらがな 12345 カタカナ 67890 한글

<!-- Prettier stable -->
漢字 Alphabets ひらがな 12345 カタカナ 67890한글

漢字 Alphabets ひらがな 12345 カタカナ 67890 한글

<!-- Prettier main -->
漢字Alphabetsひらがな12345カタカナ67890한글

漢字 Alphabets ひらがな 12345 カタカナ 67890 한글