Not treating bare URL enclosed in angle brackets as unconstrained markup #4468
Comments
Here's another example:
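A plausible shape for such an example (the sample input is an assumption, since the original snippet is not quoted here):

```ruby
require 'asciidoctor'

# Assumed sample: no word separators on either side of the bracketed URL,
# so the character trailing the closing bracket interferes with delimitation
puts Asciidoctor.convert 'before<https://example.org>after'
```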
The issue is the trailing …
This fix turned out to be pretty straightforward. If we find a URL that starts with `<`, we match it through to the closing `>` and discard the brackets. There's a chance that it over-matches the first occurrence if there's more than one in a line without any spaces, but there's really nothing the current parser can do about that case. You'll need to insert something like … between them.
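For illustration, here's a minimal sketch of that idea (the pattern and names are my own assumptions, not the actual Asciidoctor source). By the time the link substitution runs, `<` and `>` have already been escaped to `&lt;` and `&gt;`, so the sketch matches the escaped pair:

```ruby
# Illustrative only: a greedy match from the escaped '<' to the escaped '>'.
# If two bracketed URLs share a line with no space between them, the greedy
# quantifier runs to the last '&gt;' and over-matches, as noted above.
BRACKETED_URL_RX = %r{&lt;(https?://\S+)&gt;}

def autolink_bracketed(text)
  text.gsub(BRACKETED_URL_RX) { %(<a href="#{$1}">#{$1}</a>) }
end
```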
I found a way to support multiple in one line without workarounds. And it's better this way, as each URL will be matched more precisely.
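Under the same assumptions as the sketch above, the more precise matching could look like this: a lazy quantifier stops each match at the first closing `&gt;`, so several bracketed URLs on one line each match on their own:

```ruby
BRACKETED_URL_RX = %r{&lt;(https?://\S+?)&gt;}  # lazy: stop at the first '&gt;'

input = '&lt;https://a.example&gt;&lt;https://b.example&gt;'
puts input.gsub(BRACKETED_URL_RX) { %(<a href="#{$1}">#{$1}</a>) }
# => <a href="https://a.example">https://a.example</a><a href="https://b.example">https://b.example</a>
```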
I think I finally found a matcher that solves this problem while also providing the best compatibility with AsciiDoc.py, with negligible impact on performance, if any at all. This is definitely an area where the syntax is only scantily defined, so we'll be revisiting it to shore it up in the AsciiDoc Language.
As discussed here, bare URLs that are enclosed in angle brackets are intended to use the brackets as unconstrained pairs.
But it seems not to be the case.
If the closing bracket is followed by a word separator, the link is delimited correctly. But when the closing bracket is followed by a non-word-separator character, the delimitation process fails, as the sketch below illustrates:
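A minimal reproduction of both cases (the sample inputs are assumptions):

```ruby
require 'asciidoctor'

# Closing bracket followed by a space (a word separator): delimited correctly
puts Asciidoctor.convert '<https://example.org> and more text'

# Closing bracket followed by a non-separator character: per the report, the
# escaped brackets and the text up to the next space leak into the link
puts Asciidoctor.convert '<https://example.org>and more text'
```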
In the example, you can see how:

- the `<` character (transformed to the `&lt;` entity) is not discarded, and it is placed before the link
- the `>` character (transformed to the `&gt;` entity) and the text after it, until the next word separator, become part of the link `href`
This is especially problematic in some Asian languages, such as Japanese, where there are no word separators (and sentence separators are not considered word separators).
As there are no word separators, using autolinks directly is not suitable:
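A sketch with an assumed Japanese sample:

```ruby
require 'asciidoctor'

# The URL runs directly into the surrounding text with no word separators,
# so, per the report, it is not picked up by the autolink pattern at all
puts Asciidoctor.convert '詳細はhttps://example.org/docsを参照してください。'
```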
The URL is not even detected as a link.
Even if it were detected, there would be no way to delimit where the link ends and where the text starts.
The most sensible approach is using angle brackets to delimit the link.
But then we encounter the issue above:
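Again with an assumed sample input:

```ruby
require 'asciidoctor'

# The closing bracket is followed by '。', which is not a word separator,
# so the escaped '&gt;' and the terminator end up inside the href
puts Asciidoctor.convert '詳細は<https://example.org/docs>。次の文です。'
```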
Note the `。` character (a sentence terminator) being included in the `href`.

FTR: