Not treating bare URL enclosed in angle brackets as unconstrained markup #4468

someth2say · 2023-06-13T17:34:29Z

As discussed here, bare URLs that are enclosed in angle brackets are intended to use the brackets as unconstrained pairs.
But it seems not to be the case.
If the closing bracket is followed by a word separator the link is delimited correctly:

❯ echo "Hello <https://asciidoctor.org/>." | asciidoctor -b html5 -s -
<div class="paragraph">
<p>Hello <a href="https://asciidoctor.org/" class="bare">https://asciidoctor.org/</a>.</p>
</div>

But when the closing bracket is followed by a non-word-separator character, the delimitation process fails:

❯ echo "Hello <https://asciidoctor.org/>/news/" | asciidoctor -b html5 -s -
<div class="paragraph">
<p>Hello &lt;<a href="https://asciidoctor.org/&gt;/news/" class="bare">https://asciidoctor.org/&gt;/news/</a></p>
</div>

In the example, you can see how:

The < character (transformed to the &lt; entity) is not discarded, and it is placed before the link.
The > character (transformed to the &gt; entity) and the text after it, until the next word separator, become part of the link href.

This is especially problematic in some Asian languages, such as Japanese, where there are no word separators (and sentence separators are not considered word separators).
As there are no word separators, using autolinks directly is not suitable:

echo "URLはhttp://www.google.com。" | asciidoctor -b html5 -s -
<div class="paragraph">
<p>URLはhttp://www.google.com。</p>
</div>

The URL is not even detected as a link.
Even if it were detected, there would be no way to delimit where the link ends and where the text starts.

The most sensible approach is using angle brackets to delimit the link.
But then we encounter the issue above:

❯ echo "URLは<http://www.google.com>。" | asciidoctor -b html5 -s -
<div class="paragraph">
<p>URLは&lt;<a href="http://www.google.com&gt;。" class="bare">http://www.google.com&gt;。</a></p>
</div>

Note the 。 character (sentence terminator) being included in the href.

FTR:

❯ asciidoctor --version
Asciidoctor 2.0.20 [https://asciidoctor.org]
Runtime Environment (ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [x86_64-linux]) (lc:UTF-8 fs:UTF-8 in:UTF-8 ex:UTF-8)

mojavelinux · 2023-06-15T07:34:53Z

Here's another example:

URLは<http://www.google.com>。

mojavelinux · 2024-02-19T23:01:09Z

The issue is the trailing 。. It's causing the processor to not see the closing > around the URL.

mojavelinux · 2024-02-19T23:39:45Z

This fix turned out to be pretty straightforward. If we find a URL that starts with <, the processor will end the URL at the next >, even if there are adjacent characters. In other words, it will treat this as unconstrained markup.
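To make the "end the URL at the next >" behavior concrete, here is a minimal sketch in Python. It is not Asciidoctor's actual matcher (which is a Ruby regular expression and, among other differences, runs after special characters have been escaped); the pattern `BRACKETED_URL` and the function `link_bracketed_urls` are hypothetical names used only for illustration.

```python
import re

# Hypothetical pattern for illustration only, not Asciidoctor's real regex.
# The URL starts right after "<" and the lazy quantifier stops it at the
# next ">", no matter which character follows the closing bracket.
BRACKETED_URL = re.compile(r"<((?:https?|ftp)://\S*?)>")


def link_bracketed_urls(text):
    """Replace each <URL> with an anchor, consuming the angle brackets."""
    return BRACKETED_URL.sub(r'<a href="\1" class="bare">\1</a>', text)


# The closing bracket is followed by a sentence terminator, not a word separator.
print(link_bracketed_urls("URLは<http://www.google.com>。"))
# The closing bracket is followed directly by more text.
print(link_bracketed_urls("Hello <https://asciidoctor.org/>/news/"))
```

In both calls the brackets are consumed and the trailing text (。 or /news/) stays outside the href, which is the unconstrained behavior described in the paragraph above.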

There's a chance that it over-matches the first occurrence if there's more than one in a line without any spaces, but there's really nothing the current parser can do about that case. You'll need to insert something like {zwsp} to tell the parser to stop looking for the URL. This is something we can address in the AsciiDoc Language.

mojavelinux · 2024-02-20T02:29:58Z

I found a way to support multiple in one line without workarounds. And it's better this way as it will be matched more precisely.
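As a rough illustration of how a matcher can handle several bracketed URLs on one line with no spaces between them, the Python sketch below contrasts two patterns. Both `GREEDY` and `PRECISE` are assumptions made for this example; the comment above does not say which expression the fix actually uses, and the real matcher is written in Ruby.

```python
import re

# Both patterns are hypothetical and exist only to illustrate the idea.
# GREEDY lets the URL body run over any non-space characters, so it can
# swallow a ">" and a following "<" when there are no spaces on the line.
GREEDY = re.compile(r"<(https?://\S+)>")
# PRECISE excludes angle brackets from the URL body, so each pair of
# brackets delimits exactly one URL.
PRECISE = re.compile(r"<(https?://[^\s<>]+)>")

line = "<http://a.example>と<http://b.example>。"

print(GREEDY.findall(line))   # one over-long match spanning both URLs
print(PRECISE.findall(line))  # two separate URLs
```

Under these assumptions, the second pattern needs no separator such as {zwsp} between the two URLs.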

mojavelinux · 2024-02-20T11:04:11Z

I think I finally found a matcher that solves this problem while also providing the best compatibility with AsciiDoc.py and having negligible impact on performance, if any at all. This is definitely an area where the syntax is very scantly defined, so we'll be revisiting it to shore it up in the AsciiDoc Language.

mojavelinux changed the title ~~Fail to delimit bare URL enclosed in angle brackets when not followed by a word separator.~~ Not treating bare URL enclosed in angle brackets as unconstrained markup Jun 15, 2023

mojavelinux added the bug label Jun 15, 2023

mojavelinux self-assigned this Jun 15, 2023

mojavelinux added this to the v2.0.x milestone Jun 15, 2023

mojavelinux mentioned this issue Oct 2, 2023

Unwanted space characters in Japanese language #1420

Open

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 19, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

d373b48

…s unconstrained syntax

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 19, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

b22a118

…s unconstrained syntax

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 20, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

c690c8c

…s unconstrained syntax

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 20, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

2d80890

…s unconstrained syntax

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 20, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

9412c6e

…s unconstrained syntax

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 20, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

9aae492

…s unconstrained syntax

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Feb 20, 2024

resolves asciidoctor#4468 treat bare URL enclosed in angle brackets a…

9b48a70

…s unconstrained syntax

mojavelinux closed this as completed in 1905d84 Feb 20, 2024

mojavelinux added the compliance and v2.0.21 (Issues resolved in the 2.0.21 release) labels Feb 20, 2024

mojavelinux added a commit that referenced this issue Feb 20, 2024

backport fix for #4468 treat bare URL enclosed in angle brackets as u…

b788020

…nconstrained syntax

mojavelinux mentioned this issue Mar 26, 2024

Parsing an inline link causes processor to crash for certain matches #4570

Closed
