Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference Style Links are Converted (and lost) #34

Open
tajmone opened this issue Dec 19, 2016 · 8 comments
Open

Reference Style Links are Converted (and lost) #34

tajmone opened this issue Dec 19, 2016 · 8 comments

Comments

@tajmone
Copy link

tajmone commented Dec 19, 2016

I wanted to carry on from issue #26 (now closed).

markdownfmt converts reference-styled links into normal inline links. Eg:

[link text][link id]  
[link id]: http://www.somelink.com      "my ref-link title

becomes:

[link text](http://www.somelink.com "my ref-link title")

I think this is quite an issue. A tidy-up tool shouldn't convert these links and remove the reference labels — tidy-markdown doesn't.

I would like to motivate this issue through real-use scenarios examples.

It is common practice when working with markdown documentation to store all links at the end of a document, in reference-link sections. This means that when a link changes, one only has to go at the bottom of all documents and change the links section and all links within the document will be «updated» (so to speak).

To give you an example: using RawGit to provide html live previews — the proper way to use RawGit for production links is to use the cdn.rawgit URL, which caches permanently the preview. So, usually links point to specific commits of the file, to prevent permanent caching of the master branch version. As files change in time, so do their RawGit preview URLs:

https://cdn.rawgit.com/username/repo/baf2857/somepath/index.html

after a new commit (hash: xyz1938) might become:

https://cdn.rawgit.com/username/repo/xyz1938/somepath/index.html

And so on with all links that use RawGit — the commit part of the URLs will have to change to accomodate the new commit hash.

And instead of going mad to search-and-replace all URLs inside the actual links, the quick way is to change it just in the links-section at the end of each document in the repo — something often automated with scripts in large documentation projects.

Basically, markdown makes life a lot easier with reference-styled links — I personally store some snippets files with all the links used in a project, carry out all changes to links in the snippet files, and then just copy-and-paste the whole links-section over the old one in all docs (yep: ALL DOCS, because unused link-ids are harmless, they just don't show up in the final preview of the document!).

In bigger projects, I use a pandoc toolchain, and invoke pandoc to always add/include the reference-links section file when processing sources — so the actual link-section is stored in a single file, used by all documents for all external links. This is a cool way to handle massive quantities of links. And it also allows to have different link-sections for different versions of a project (eg for a repo which is both on GitHub and Bitbucket) without having duplicate documents.

But Why?

What I haven't yet figure out is why blackfriday converts these links. Usually this happens in parsers which built an AST from the source (like pandoc), and in the post-processing phase they replace all references with their actual values. But I haven't yet worked my way into the source code enough to understand how it all works under the hood.

If this was the case, it might not be so easy to change this behavior. Any ideas about it?

But definitely, a clean-up tool shouldn't remove elements from the doc.

@dmitshur
Copy link
Member

Hi @tajmone, thank you for writing up this detailed issue. I will reply to various parts of your comment.

I think this is quite an issue. A tidy-up tool shouldn't convert these links and remove the reference labels — tidy-markdown doesn't.

But definitely, a clean-up tool shouldn't remove elements from the doc.

First thing to keep in mind about markdownfmt is that it's not simple a clean-up tool. Its description reads:

Like gofmt, but for Markdown.

I should elaborate, but what I mean by that is... Like gofmt, it's meant to be opinionated, and it tries to unify multiple equivalent ways of doing something into one canonical format. For example, it always uses - item for bullet points (instead of * item), --- for headers (instead of ***** or many other valid combinations), and **bold for making text bold (instead of __bold__).

When I first created markdownfmt, I really wanted it to be something everyone can run on most Markdown files, just like people run gofmt on all .go files. While not everyone likes the exact formatting decisions gofmt makes, people do like the overall consistency of having one unified tool.

Later on, I realized that Markdown is too vague of a "specification" and has too many variations and inconsistencies for this to be possible, so I know it's not an attainable goal. As I said in shurcooL-legacy/atom-markdown-format#26 (comment), If I really wanted to achieve that today, I'd begin by creating a standalone spec first.

However, this is just background information. Let me get into this specific issue.

I would like to motivate this issue through real-use scenarios examples.

Thanks for doing that, it's very helpful.

It is common practice when working with markdown documentation to store all links at the end of a document, in reference-link sections. This means that when a link changes, one only has to go at the bottom of all documents and change the links section and all links within the document will be «updated» (so to speak).

To give you an example: using RawGit to provide html live previews — the proper way to use RawGit for production links is to use the cdn.rawgit URL, which caches permanently the preview. So, usually links point to specific commits of the file, to prevent permanent caching of the master branch version. As files change in time, so do their RawGit preview URLs:

https://cdn.rawgit.com/username/repo/baf2857/somepath/index.html

after a new commit (hash: xyz1938) might become:

https://cdn.rawgit.com/username/repo/xyz1938/somepath/index.html

And so on with all links that use RawGit — the commit part of the URLs will have to change to accomodate the new commit hash.

And instead of going mad to search-and-replace all URLs inside the actual links, the quick way is to change it just in the links-section at the end of each document in the repo — something often automated with scripts in large documentation projects.

Basically, markdown makes life a lot easier with reference-styled links — I personally store some snippets files with all the links used in a project, carry out all changes to links in the snippet files, and then just copy-and-paste the whole links-section over the old one in all docs (yep: ALL DOCS, because unused link-ids are harmless, they just don't show up in the final preview of the document!).

In bigger projects, I use a pandoc toolchain, and invoke pandoc to always add/include the reference-links section file when processing sources — so the actual link-section is stored in a single file, used by all documents for all external links. This is a cool way to handle massive quantities of links. And it also allows to have different link-sections for different versions of a project (eg for a repo which is both on GitHub and Bitbucket) without having duplicate documents.

Thank you for providing that rationale. It's convincing, and I understand why it would be preferable to maintain the two type of links separately instead of merging them into the same inline kind. I agree. 👍

Now let's discuss the potential implementation side of this.

What I haven't yet figure out is why blackfriday converts these links. Usually this happens in parsers which built an AST from the source (like pandoc), and in the post-processing phase they replace all references with their actual values. But I haven't yet worked my way into the source code enough to understand how it all works under the hood.

If this was the case, it might not be so easy to change this behavior. Any ideas about it?

The current blackfriday implementation is very old, and has certain limitations that are unavoidable. One of them is that it does not preserve the original markdown document layout, and it's a very lossy parsing. It only emits enough information to produce HTML for the rendered Markdown.

So, it's actually extremely hard to implement this with that blackfriday implementation, even if I really wanted to. You can also see #6 (comment) for a detailed explanation.

However, there is a version 2 of blackfriday which resolves this limitation. If markdownfmt were to be rewritten to use blackfriday v2, then this becomes a possibility.

@tajmone
Copy link
Author

tajmone commented Dec 24, 2016

@shurcooL , thanks for the great reply.

I personally don't see any issue wheather markdownfmt uses *, - or + for bullet lists -- as long as it enforces some kind of accepted markdown standard it serves to avoid diffing nightmares and false commits. Aesthetics aside, markdown is intended as an intermediate format, so what counts is what you get when you convert it to some other format.

I wasn't aware of this blackfriday limitations, nor that a version 2 is available. Thanks for letting me know, and for having accepted this proposal as an enhancement.

Wish I could help, but right now my knowledge in Go is too weak ... and I already juggling with a few langs I'm learning.

B.R.

Tristano

@hartzell
Copy link

@shurcooL, first thanks for sharing this tool. I'd also like to see this behavior change. Is there an active effort to move to v2 of blackfriday?

@hartzell
Copy link

@shurcooL , @jsternberg

I'm not a fan of markdownfmt inlining reference links. I use them when I need use a link in multiple places in the document and I want to avoid repeating myself and creating an opportunity to update some links but not others.

@JStenberg has a fork of markdownfmt that uses blackdown.v2, which the earlier comments in this thread suggested has sufficient hooks to enable rendering reference links.

I've been reading through the blackfriday source and running some dumb tests w/ spew.Dump() and I'm afraid that blackfriday is still discarding what we'd need to preserve reference links.

Here is the bit where they're parsed. Reference links ([e.g.][this] seem to be collapsed into the same LinkData struct as the other link types.

Heavy handed use of spew.Dump() in the the blackfriday HTML renderer supports this conclusion (e.g. around line 547 of html.go):

	case Link:
		// mark it but don't link it if it is not a safe link: no smartypants
		spew.Dump(node.LinkData)
		dest := node.LinkData.Destination
		if needSkipLink(r.Flags, dest) {

I've tentatively concluded that to get the behavior I want I'd need to convince blackfriday to use two different types of Nodes and the modify the renders. Does that match your understanding?

@dmitshur
Copy link
Member

dmitshur commented Mar 2, 2018

Is there an active effort to move to v2 of blackfriday?

See issue #39 and PR #40 for the latest updates on that.

@JStenberg has a fork of markdownfmt that uses blackdown.v2, which the earlier comments in this thread suggested has sufficient hooks to enable rendering reference links.

I've been reading through the blackfriday source and running some dumb tests w/ spew.Dump() and I'm afraid that blackfriday is still discarding what we'd need to preserve reference links. [...] Does that match your understanding?

It was my understanding that blackfriday v2 would preserve the information about whether the link was inline or reference-style, as preserving more information about the parsed markdown document was a part of the design goal of v2. I hadn't actually confirmed whether that's the case for this particular case.

/cc @rtfb Do you know if v2 lets one be able to tell whether a link was inline or reference-style?

@jsternberg
Copy link

I don't know about the inline vs reference-style, but v2 of blackfriday isn't able to tell you whether a link is an auto link or not. So you can't tell the difference between these: [http://example.com](http://example.com) and http://example.com. I just opted to make it always use an auto link because I figured markdownfmt was going to make decisions on how to format a document anyway.

@hartzell
Copy link

hartzell commented Mar 3, 2018

I'm 99.9% sure that blackfriday.v2 is discarding the information about the type of link used and collapsing it into the same node for either case.

Here is the code that's doing the work.

The fastest hack to peek at what was happening was to add a call to spew.Dump() in the the blackfriday HTML renderer like so (e.g. around line 547 of html.go):

	case Link:
		// mark it but don't link it if it is not a safe link: no smartypants
		spew.Dump(node.LinkData)
		dest := node.LinkData.Destination
		if needSkipLink(r.Flags, dest) {

and then run a little test program.

package main

import (
	"fmt"

	"gopkg.in/russross/blackfriday.v2"
)

func main() {
	input := []byte(`
- This is [reference][moose]
- This is [inline](http://www.georgehartzell.com)

[moose]: http://www.georgehartzell.com
`)
	output := blackfriday.Run(input)
	fmt.Println(string(output))
}

It generates the following output (dump is called twice, explanation here):

(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) (cap=47) {
 },
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) (cap=47) {
 },
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) <nil>,
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) <nil>,
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
<ul>
<li>This is <a href="http://www.georgehartzell.com">reference</a></li>
<li>This is <a href="http://www.georgehartzell.com">inline</a></li>
</ul>

@rtfb
Copy link

rtfb commented Mar 4, 2018

Thanks for digging into this, @hartzell! You're right, currently Blackfriday does not record this information. Here's a tracking issue for that: russross/blackfriday#369 (not exactly the same issue, but close enough to clump together).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants