feat(derive): add markdown parsing of doc comments in derive #4444

Draft · wants to merge 1 commit into base: master

22 changes: 22 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions clap_derive/Cargo.toml
@@ -42,6 +42,7 @@ quote = "1.0.9"
proc-macro2 = "1.0.42"
heck = "0.4.0"
proc-macro-error = "1"
pulldown-cmark = "0.9"
Member:

> minimad doesn't parse blocks and only parses some inlines. To be able to use this we'd have to write the first half of a markdown parser anyway.

I wonder how much of a problem that will actually be, especially compared to what we do now. minimad's build times are significantly different from pulldown's:

https://github.com/rosetta-rs/md-rosetta-rs

Author:

Yep, I get that. Unfortunately, the way that it accomplishes that is by not really being a markdown parser. It's a parser for something that sorta looks like a subset of markdown.

Honestly I think that not supporting markdown is a better option than supporting minimad's version of "markdown".


There are some things that look like plain bugs we could fix, which I can only assume would be accepted, like:

Inlines aren't parsed in lists:

* some bullet item
* some _bullet_ item
* some bullet item
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Normal(Composite { style: ListItem, compounds: ["some _bullet_ item"] })
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })

There are some things that I'm not sure they would consider bugs, but that we could technically work around:

Each line of a paragraph is treated as a separate paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at
tempor commodo ullamcorper a lacus vestibulum sed arcu.  
Normal(Composite { style: Paragraph, compounds: ["Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do"] })
Normal(Composite { style: Paragraph, compounds: ["eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at"] })
Normal(Composite { style: Paragraph, compounds: ["tempor commodo ullamcorper a lacus vestibulum sed arcu."] })

Sub-lists aren't supported (bonus points for parsing a single unconnected * as the start of an italic block):

* some bullet item
  * some bullet item
* some bullet item
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Normal(Composite { style: Paragraph, compounds: ["  ", I" some bullet item"] })
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })

Only one block level is supported:

> # foobar
Normal(Composite { style: Quote, compounds: ["# foobar"] })

There's way more; these were just the most obviously wrong of the few things I tried. Honestly, I think it'd be easier and faster to write a new markdown parser than to pre-parse and re-parse around this crate to get reasonable output for any sort of basic markdown document. That would put a major dent in its lead in your stats too.


tl;dr: We should heed the warning on that crate that says:

> If you're looking for a Markdown parser, this one is probably not the one you want

What you want to use it for isn't what it's meant to be used for. It's not a general purpose markdown parser. IMO, it's not even a "general purpose (when your purpose is outputting to the terminal) markdown parser". It's good for when you want some minimal styling in terminal output of static, non-user-supplied text.
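
For contrast, here is a minimal sketch (not part of this PR) of how the nested-list example above comes out of pulldown-cmark 0.9, the parser this diff depends on: blocks and inlines are reported as separate Start/End/Text events, which is what the new derive code consumes.

use pulldown_cmark::{Event, Parser, Tag};

fn main() {
    let src = "* some bullet item\n  * some _bullet_ item\n* some bullet item\n";
    for event in Parser::new(src) {
        match event {
            // The nested list gets its own Start(List)/Start(Item) events...
            Event::Start(Tag::List(_)) => println!("start list"),
            Event::Start(Tag::Item) => println!("start item"),
            // ...and inline emphasis inside a list item is parsed too.
            Event::Start(Tag::Emphasis) => println!("start emphasis"),
            Event::Text(t) => println!("text {:?}", t),
            Event::End(tag) => println!("end {:?}", tag),
            other => println!("{:?}", other),
        }
    }
}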


[features]
default = []
13 changes: 10 additions & 3 deletions clap_derive/src/item.rs
@@ -25,7 +25,7 @@ use syn::{
};

use crate::attr::*;
use crate::utils::{inner_type, is_simple_ty, process_doc_comment, Sp, Ty};
use crate::utils::{inner_type, is_simple_ty, process_doc_comment, process_md_doc_comment, Sp, Ty};

/// Default casing style for generated arguments.
pub const DEFAULT_CASING: CasingStyle = CasingStyle::Kebab;
@@ -896,7 +896,14 @@ impl Item {
})
.collect();

let (short, long) = process_doc_comment(comment_parts, name, !self.verbatim_doc_comment);
let (short, long) = if self.verbatim_doc_comment {
process_doc_comment(comment_parts, name, false)
} else {
// TODO: gate markdown parsing behind a marker, or make it the default?
process_md_doc_comment(comment_parts, name)
};
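// For illustration (not part of this change): with this branch, a derive item whose
// doc comment uses markdown, e.g.
//
//     /// Prints *all* matching entries.
//     ///
//     /// Use `--verbose` to also show **hidden** entries.
//     #[arg(long)]
//     verbose: bool,
//
// is routed through `process_md_doc_comment`, while setting `verbatim_doc_comment`
// on the item keeps the old verbatim path.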
self.doc_comment.extend(short);
if supports_long_help {
self.doc_comment.extend(long);
@@ -1191,7 +1198,7 @@ impl Kind {
}
}

#[derive(Clone)]
#[derive(Clone, Debug)]
pub struct Method {
name: Ident,
args: TokenStream,
242 changes: 242 additions & 0 deletions clap_derive/src/utils/doc_comments.rs
@@ -5,9 +5,251 @@

use crate::item::Method;

use proc_macro2::TokenStream;
use quote::{format_ident, quote};
use std::iter;

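/// Parse the doc-comment `lines` as markdown (via `pulldown-cmark`) and build the
/// short (`{name}`) and long (`long_{name}`) help methods as token streams that
/// construct a `clap::builder::StyledStr`. Block-level constructs that aren't
/// handled yet are left as `todo!()`.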
pub fn process_md_doc_comment(lines: Vec<String>, name: &str) -> (Option<Method>, Option<Method>) {
use pulldown_cmark::{Event, Parser, Tag};

let text = lines
.iter()
.skip_while(|s| is_blank(s))
.flat_map(|s| s.split('\n'))
.map(|l| if l.starts_with(' ') { &l[1..] } else { l })
.collect::<Vec<&str>>()
.join("\n");

let mut short = TokenStream::new();
let mut long = TokenStream::new();
let mut man = TokenStream::new();

let parser = Parser::new(&text);

// ordered list of parent blocks where we're currently parsing
let mut blocking = Vec::new();
// ordered list of inline features currently active where we're parsing
let mut inliners = Vec::new();

#[derive(PartialEq)]
enum State {
Short,
Long,
Man,
}

let mut state = State::Short;

for def in parser {
let chunk = match def {
Event::Start(tag) => {
match &tag {
Tag::Paragraph => blocking.push(tag),
Tag::Heading(_level, _fragment, _classes) => todo!("heading"), //blocking.push(tag),
Tag::BlockQuote => todo!("blockquote"), //blocking.push(tag),
Tag::CodeBlock(_kind) => todo!("codeblock"), //blocking.push(tag),
Tag::List(_start) => todo!("list"), //blocking.push(tag),
Tag::Item => todo!("item"), //blocking.push(tag),
Tag::FootnoteDefinition(_label) => todo!("footnote"), //blocking.push(tag),
Tag::Table(_alignment) => todo!("table"), //blocking.push(tag),
Tag::TableHead => todo!("tablehead"), //blocking.push(tag),
Tag::TableRow => todo!("tablerow"), //blocking.push(tag),
Tag::TableCell => todo!("tablecell"), //blocking.push(tag),
Tag::Emphasis => inliners.push(tag),
Tag::Strong => inliners.push(tag),
Tag::Strikethrough => todo!("strike"), //inliners.push(tag),
Tag::Link(_type, _url, _title) => {} //todo!("link"), //inliners.push(tag),
Tag::Image(_type, _url, _title) => todo!("image"), //inliners.push(tag),
};
None
}
Event::Text(t) => {
let t = t.as_ref();
// StyledStr can only define a single style, just take last inline container
match inliners.last() {
None => Some(quote!(text.none(#t);)),
Some(Tag::Strong) => Some(quote!(text.literal(#t);)),
Some(Tag::Emphasis) => Some(quote!(text.italic(#t);)),
_ => todo!(),
}
}
Event::End(tag) => match &tag {
// only got twenty dollars in my pocket...
Tag::Paragraph => {
assert_eq!(blocking.pop(), Some(tag));
if state == State::Short {
state = State::Long;
}

Some(quote!(text.none("\n\n");))
}
Tag::Heading(_level, _fragment, _classes) => {
assert_eq!(blocking.pop(), Some(tag));
state = State::Man;

Some(quote!(text.none("\n\n");))
}
Tag::BlockQuote => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::CodeBlock(_kind) => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::List(_start) => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::Item => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::FootnoteDefinition(_label) => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::Table(_alignment) => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::TableHead => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::TableRow => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::TableCell => {
assert_eq!(blocking.pop(), Some(tag));
None
}
Tag::Emphasis => {
assert_eq!(inliners.pop(), Some(tag));
None
}
Tag::Strong => {
assert_eq!(inliners.pop(), Some(tag));
None
}
Tag::Strikethrough => {
assert_eq!(inliners.pop(), Some(tag));
None
}
Tag::Link(_type, _url, _title) => {
//assert_eq!(inliners.pop(), Some(tag));
None
}
Tag::Image(_type, _url, _title) => {
assert_eq!(inliners.pop(), Some(tag));
None
}
},
Event::Code(t) => {
let t = t.as_ref();
Some(quote!(text.code(#t);))
}
Event::Html(_) => {
todo!("drop or panic?")
}
Event::FootnoteReference(_) => {
todo!("can this be handled? just leave the markdown as-is?")
}
// single line breaks within paragraphs
Event::SoftBreak => Some(quote!(text.none(" ");)),
// double line breaks between paragraphs
// TODO: peek into the parser to check if there's more content coming to avoid adding
// blank lines at the end.
Event::HardBreak => Some(quote!(text.none("\n\n");)),
Event::Rule => {
todo!("would need terminal width for this? is there any sort of responsive way to do this?")
}
Event::TaskListMarker(checked) => {
let _marker = if checked { '☑' } else { '☐' };
None
}
};

if let Some(chunk) = chunk {
match state {
State::Short => {
short.extend(chunk.clone());
long.extend(chunk.clone());
man.extend(chunk);
}
State::Long => {
long.extend(chunk.clone());
man.extend(chunk);
}
State::Man => {
man.extend(chunk);
}
}
}
}

let short_name = format_ident!("{}", name);
let long_name = format_ident!("long_{}", name);

let short_about = if short.is_empty() {
None
} else {
let text_block = quote! {
{
let mut text = clap::builder::StyledStr::new();
#short
text
}
};
Some(Method::new(short_name, text_block))
};

let long_about = if long.is_empty() {
None
} else {
let text_block = quote! {
{
let mut text = clap::builder::StyledStr::new();
#long
text
}
};
Some(Method::new(long_name, text_block))
};

(short_about, long_about)
}

#[test]
fn md() {
let inp = r##"
This is the __short__ description.

This is the *long* description, it contains an [inline link](rust-lang.org).
"##;
let _ = r##"

It also contains a [ref link].

# Examples

```
frobulate compular
```

[ref link]: https://github.com/clap-rs/clap

Okay.

"##;

// mangle input to match how we'd normally get it
let lines: Vec<String> = inp.lines().map(|l| format!(" {}", l)).collect();

let _tokens = dbg!(process_md_doc_comment(lines, "frobulate"));
}
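
// Roughly what the generated short help block for the test input above would expand
// to (a sketch, not verified output):
//
//     {
//         let mut text = clap::builder::StyledStr::new();
//         text.none("This is the ");
//         text.literal("short");
//         text.none(" description.");
//         text
//     }
//
// The trailing "\n\n" emitted at the end of the first paragraph only reaches the
// long/man streams, because `state` has already left `State::Short` by then.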

pub fn process_doc_comment(
lines: Vec<String>,
name: &str,
2 changes: 1 addition & 1 deletion clap_derive/src/utils/mod.rs
@@ -3,7 +3,7 @@ mod spanned;
mod ty;

pub use self::{
doc_comments::process_doc_comment,
doc_comments::{process_doc_comment, process_md_doc_comment},
spanned::Sp,
ty::{inner_type, is_simple_ty, sub_type, subty_if_name, Ty},
};