Unwanted space characters in Japanese language #1420
This is actually HTML adding the spaces, sort of. Asciidoctor passes the text through to HTML as you see it in the source (after removing trailing whitespace on each line), and HTML collapses all whitespace, including line breaks, into a single space. That looks normal in English text, since a line break in the source most likely falls at a word or sentence boundary. In Japanese text, however, you end up with an unwanted space.

One solution to this problem today is to create a Treeprocessor or Postprocessor extension that finds all paragraph text and removes the unwanted spaces.

For the long term, this is an intriguing question, as it affects all similar languages. Poetry-style writing (aka one sentence or phrase per line) should remain an option when writing in these languages while still producing the desired output. I think perhaps the solution is to change the behavior when the
...and we certainly want Asciidoctor to be friendly and comfortable for all languages.
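A rough way to see where the space comes from: the Ruby sketch below models the collapsing step described above, assuming the simple rule that every run of spaces, tabs, and newlines becomes one space. `collapse_whitespace` is a hypothetical helper, not part of Asciidoctor, and real browsers apply the more detailed CSS `white-space` rules rather than this one-liner.

```ruby
# Hypothetical model of HTML whitespace collapsing (simplified assumption):
# every run of spaces, tabs, and newlines becomes a single space.
def collapse_whitespace(text)
  text.gsub(/[ \t\n]+/, ' ')
end

english  = "This sentence is split\nacross two lines."
japanese = "この文は\n二行に分かれています。"

puts collapse_whitespace(english)   # the space at the old line break looks natural
puts collapse_whitespace(japanese)  # an unwanted space appears after "は"
```

In English the inserted space lands on a word boundary, so nobody notices it; in Japanese it lands in the middle of a sentence.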
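Chinese has the same problem. I wrote a Treeprocessor:

```ruby
require 'asciidoctor'
require 'asciidoctor/extensions'

# Joins the source lines of every paragraph so that line breaks in the
# AsciiDoc source are not turned into spaces between CJK characters.
class TrailingTreeprocessor < Asciidoctor::Extensions::Treeprocessor
  def process document
    return unless document.blocks?
    # find_by walks the document tree and returns every paragraph block
    document.find_by(context: :paragraph).each do |paragraph|
      # Edit the raw source lines in place instead of building a new
      # paragraph from converted content (which would apply substitutions
      # twice). Newlines after a hard line break (" +") are kept.
      paragraph.lines = [paragraph.lines.join("\n").gsub(/(?<! \+)\n/, '')]
    end
    nil # returning nil keeps the (now modified) document
  end
end

Asciidoctor::Extensions.register do
  treeprocessor TrailingTreeprocessor
end
```

Save in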
The switch that needs to be added to core is the choice of character for a prose endline. In Latin-based languages, it is a literal endline. For CJK, it would need to be empty (no character at all).
This could be controlled through the document language or language family. For now, you need to either take the approach that @chloerei suggested or avoid inserting endlines in the prose of your AsciiDoc source document.
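A per-language endline could look something like the sketch below. The `PROSE_ENDLINE` table and `join_prose_lines` are hypothetical (nothing like this exists in Asciidoctor core); the sketch only illustrates choosing the separator from the document language. Only Japanese and Chinese are mapped to an empty separator here, because Korean does separate words with spaces.

```ruby
# Hypothetical table: document language => string that replaces a prose endline.
# Languages not listed keep the literal newline.
PROSE_ENDLINE = { 'ja' => '', 'zh' => '' }.freeze

# Joins the source lines of one paragraph, using the endline for +lang+.
# Trailing whitespace is stripped first, as Asciidoctor does with source lines.
def join_prose_lines(lines, lang)
  base = lang.to_s.split('-').first # "zh-CN" -> "zh"
  lines.map(&:rstrip).join(PROSE_ENDLINE.fetch(base, "\n"))
end

join_prose_lines(["日本語の文章を", "二行で書きます。"], 'ja')
# => "日本語の文章を二行で書きます。"
join_prose_lines(["A sentence split", "over two lines."], 'en')
# => "A sentence split\nover two lines."
```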
Macros in CJK have the same problem (unwanted spaces).
is currently (ver1.5.4) converted to
, which contains unwanted space before

Proposal

I propose two (mutually exclusive) rules below.

Rule to remove spaces

These are removed in the output:
Example:
would be converted to:
Rule to preserve space(s)

These are converted to a single space in the output:
Example:
would be converted to:
Digging a bit through the issues, I found that we probably had a similar conversation in #1174.
@lo48576 Yo, man.
is converted to
And,
is converted to
If you'd like to make a visual distinction in the .adoc source, you can use
What about letting a backslash at the end of a line join the next line to the current line? Could
be converted to
without breaking backward compatibility?
AFAIK, this is the solution used in reStructuredText, which is the only markup language that supports this feature so far.
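The backslash rule proposed above could work roughly like this. `join_continued_lines` is a hypothetical helper operating on an array of source lines; it is not existing Asciidoctor syntax or API, and it ignores open questions such as how to write a literal trailing backslash.

```ruby
# Hypothetical rule: a backslash at the very end of a line joins the next
# line to it with nothing in between. Other lines are kept as they are.
def join_continued_lines(lines)
  lines.each_with_object([]) do |line, result|
    if !result.empty? && result.last.end_with?('\\')
      # Drop the backslash and append the current line directly
      result[-1] = result.last.chomp('\\') + line
    else
      result << line
    end
  end
end

join_continued_lines(["日本語の文章を\\", "二行で書きます。"])
# => ["日本語の文章を二行で書きます。"]
```

Because only lines that end in a backslash are affected, ordinary line breaks keep their current meaning.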
Hi. Related information for Markdown, as far as I know:

https://talk.commonmark.org/t/soft-line-breaks-should-not-introduce-spaces/285 (which came from TryGhost/Ghost#3893)

A plugin for markdown-it that automatically handles segment breaks: https://github.com/markdown-it/markdown-it-cjk-breaks. Its algorithm matches CSS Text Module Level 3 (which came from https://talk.commonmark.org/t/soft-line-breaks-should-not-introduce-spaces/285/9).

This plugin is said to be similar to one in pandoc:
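For comparison, the idea behind the CSS segment break rule can be approximated in a few lines of Ruby. This is a simplified sketch, not the markdown-it-cjk-breaks or pandoc code: the CSS rule is defined in terms of the Unicode East Asian Width property, which this sketch does not consult; it assumes a hand-picked set of scripts and punctuation blocks instead. The CSS rules themselves have also been revised since. `WIDE_CHAR` and `remove_cjk_segment_breaks` are hypothetical names.

```ruby
# Assumed stand-in for "East Asian wide" characters: Han, Hiragana, Katakana,
# CJK Symbols and Punctuation, and Halfwidth and Fullwidth Forms.
# Hangul is left out on purpose because Korean separates words with spaces.
WIDE_CHAR = '[\p{Han}\p{Hiragana}\p{Katakana}\u3000-\u303F\uFF00-\uFFEF]'

# Removes a newline (and spaces/tabs around it) only when the characters on
# both sides are wide; every other newline is left for the renderer.
def remove_cjk_segment_breaks(text)
  text.gsub(/(?<=#{WIDE_CHAR})[ \t]*\n[ \t]*(?=#{WIDE_CHAR})/, '')
end

remove_cjk_segment_breaks("日本語の文章を\n二行で書きます。")
# => "日本語の文章を二行で書きます。"
remove_cjk_segment_breaks("日本語の\nEnglish")
# => "日本語の\nEnglish"
```

Applied to paragraph source before conversion, this keeps mixed Japanese/English line breaks intact, which a blanket "delete every newline" approach does not.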
Related issue: #4468.
The CSS spec has been changed. (This is handled by CSS, not HTML.) https://wpt.fyi/results/css/css-text/line-breaking?label=experimental&label=master&aligned&q=segment-break-transformation-rules-
@mojavelinux I think we can close this issue, since it is a browser issue rather than an Asciidoctor issue?
Does Asciidoctor simply keep newlines as they are, rather than converting them to spaces itself?
Yes, Asciidoctor leaves the space characters as they are written (spaces remain as spaces and newlines remain as newlines). The assumption is that the renderer will normalize them, such as the browser for HTML.
Got it. Do you know where we should discuss a specification shared across the entire Asciidoctor family?
You're free to ask open-ended questions in the project chat at https://chat.asciidoctor.org.
I see. Asciidoctor supports DocBook and EPUB output, too. I don't know how they treat newlines in XML. I have two questions about them: are newlines left to the renderers? How many renderers for these formats are built on browser engines?
Please continue this discussion in the chat. The issue tracker is intended to track design decisions. It's not for open-ended discussions.
In terms of the HTML converter, it seems this issue has been resolved by CSS and thus no action is needed here.
DocBook is XML-based. Are you saying EPUB and DocBook both use CSS for styling?
Also, Asciidoctor shouldn't rely too much on the CSS implementations of today's web browsers.
If this is a behavior you need, you're welcome to extend the converter and add the logic to that extended converter. This is not something we're going to add to Asciidoctor right now.
Good Morning,
I'm inserting line breaks into the AsciiDoc source to make long sentences more readable.
For the English language this works as expected, a line break in the source translates into a space in the output.
AsciiDoc:
Output:
Japanese, however, doesn't use spaces between words, so the line breaks should be ignored.
Is there a configuration option that can influence this behavior?
AsciiDoc:
Output:
Desired output:
Thanks,
Jan