feat: rewrite tocObj function #36
base: main
8f1aab3 to b8c3293
src/toc_obj.rs
Outdated
) -> Result<Vec<TocObj>> {
    let input = input.to_string()?;
    let root = Vis::load(&input).unwrap();
    let lis = root.find(r#"[id^="title"]"#);
How should the options parameter be handled?
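One possible way to handle it, sketched under assumptions: the thread never lists the fields of options, so the `min_depth` and `max_depth` names below are hypothetical, as are `TocOptions` and `filter_by_depth`. The `TocObj` struct is a stand-in with the three fields used elsewhere in this review, not the PR's actual definition.

```rust
/// Stand-in for the PR's TocObj; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct TocObj {
    text: String,
    id: String,
    level: u8,
}

/// Hypothetical options: keep only headings whose level lies in
/// min_depth..=max_depth.
#[derive(Debug, Clone, Copy)]
struct TocOptions {
    min_depth: u8,
    max_depth: u8,
}

impl Default for TocOptions {
    /// With no options given, every heading level (h1 to h6) is kept.
    fn default() -> Self {
        Self { min_depth: 1, max_depth: 6 }
    }
}

/// Drop entries whose level falls outside the configured depth range.
fn filter_by_depth(items: Vec<TocObj>, options: TocOptions) -> Vec<TocObj> {
    items
        .into_iter()
        .filter(|item| (options.min_depth..=options.max_depth).contains(&item.level))
        .collect()
}
```

Filtering after collection keeps the selector unchanged; building the selector from the depth range instead would avoid visiting headings that are discarded later.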
I think querying all the parent elements should do the job, but visdom does not seem to expose an interface for getting an element's parent...
https://github.com/fefit/visdom/blob/main/src/mesdoc/interface/element.rs
This will be much slower than the current implementation:
let headers = root.find("h1,h2,h3,h4,h5,h6");
let result = headers
    .into_iter()
    .map(|element| {
        let id = if element.has_attribute("id") {
            element
                .get_attribute("id")
                .map(|ele| ele.to_string())
                .unwrap()
        } else {
            element
                .parent()
                .map(|element| {
                    element
                        .get_attribute("id")
                        .map(|ele| ele.to_string())
                        .unwrap()
                })
                .unwrap()
        };
        TocObj {
            text: element.text(),
            id,
            level: match &*element.tag_name() {
                "H1" => 1,
                "H2" => 2,
                "H3" => 3,
                "H4" => 4,
                "H5" => 5,
                "H6" => 6,
                _ => 1,
            },
        }
    })
    .collect::<Vec<TocObj>>();
This is mainly because root.find("h1,h2,h3,h4,h5,h6") is slower than root.find(r#"[id^="title"]"#).
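Separately from the speed question, the snippet above panics when neither the heading nor its parent carries an id, and it maps any unrecognized tag to level 1. A minimal standard-library sketch of panic-free helpers follows. The names `heading_level` and `resolve_id` are hypothetical, and the attribute values are assumed to have already been read from the elements as optional strings.

```rust
/// Parse a tag name such as "H1".."H6" (either case) into 1..=6.
/// Any other tag name yields None instead of silently becoming level 1.
fn heading_level(tag_name: &str) -> Option<u8> {
    let mut chars = tag_name.chars();
    match (chars.next(), chars.next(), chars.next()) {
        // Exactly two characters: an 'h' or 'H' followed by a digit 1 to 6.
        (Some('h' | 'H'), Some(d @ '1'..='6'), None) => Some(d as u8 - b'0'),
        _ => None,
    }
}

/// Prefer the heading's own id, fall back to the parent's id,
/// and use an empty string when neither is present.
fn resolve_id(own_id: Option<&str>, parent_id: Option<&str>) -> String {
    own_id.or(parent_id).unwrap_or_default().to_string()
}
```

Returning `Option` from `heading_level` lets the caller decide whether to skip a non-heading element or to report it.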
Note: Although the test case uses id=title_1_2 for easier cheerio tracking, in real-world usage the id is not required to include title.
I see <div id="title_1_1"><h2>Title 1.1</h2></div> in the test case, with the id on the parent element of the heading tag.
Does this occur in real-world usage?
b8c3293 to ed07947
ed07947 to fccac18
dbe0ab0 to 525050b
5a8a034 to 0ac2441
08-23 17:47 Update
Tried nipper, scraper, and visdom; scraper is the best. But... 🥲
08-23 00:05