feat: rewrite tocObj function #36

Draft
wants to merge 1 commit into
base: main

liby
Owner

@liby liby commented Aug 22, 2022

08-23 17:47 Update

Tried nipper, scraper, and visdom; scraper is the fastest.

test toc_obj::tests::bench_nipper  ... bench:      21,373 ns/iter (+/- 1,685)
test toc_obj::tests::bench_scraper ... bench:      15,176 ns/iter (+/- 1,974)
test toc_obj::tests::bench_visdom ... bench:      21,570 ns/iter (+/- 1,178)

test result: ok. 0 passed; 0 failed; 1 ignored; 3 measured; 0 filtered out; finished in 10.18s

But... 🥲

Running "mini fixture" suite...
  hexo-util-rs-buffer:
    38 004 ops/s, ±1.35%   | 41.52% slower

  hexo-util-rs:
    37 912 ops/s, ±0.55%   | slowest, 41.66% slower

  hexo-util:
    64 989 ops/s, ±1.66%   | fastest

Finished 3 cases!
  Fastest: hexo-util
  Slowest: hexo-util-rs
Running "large fixture" suite...
  hexo-util-rs-buffer:
    1 877 ops/s, ±1.01%   | slowest, 39.88% slower

  hexo-util-rs:
    1 918 ops/s, ±0.45%   | 38.57% slower

  hexo-util:
    3 122 ops/s, ±0.47%   | fastest

Finished 3 cases!
  Fastest: hexo-util
  Slowest: hexo-util-rs-buffer

08-23 00:05

Running "mini fixture" suite...

  hexo-util-rs-buffer:
    39 091 ops/s, ±1.94%   | slowest, 38.15% slower

  hexo-util-rs:
    41 628 ops/s, ±0.66%   | 34.14% slower

  hexo-util:
    63 205 ops/s, ±0.86%   | fastest

Finished 3 cases!
  Fastest: hexo-util
  Slowest: hexo-util-rs-buffer

Running "large fixture" suite...

  hexo-util-rs-buffer:
    349 ops/s, ±0.71%   | fastest

  hexo-util-rs:
    312 ops/s, ±0.67%   | 10.6% slower

  hexo-util:
    178 ops/s, ±0.51%   | slowest, 49% slower

Finished 3 cases!
  Fastest: hexo-util-rs-buffer
  Slowest: hexo-util
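The "slower" percentages in the benchmark output above appear to be measured relative to the fastest case in each suite. That reading is inferred from the numbers themselves, not from the benchmark tool's documentation, so treat the formula below as an assumption. A minimal sketch (the function names `percent_slower` and `round2` are hypothetical):

```rust
/// Percentage by which `ops` trails `fastest`, both given in ops/s.
/// Assumed formula: (fastest - ops) / fastest * 100.
fn percent_slower(fastest: f64, ops: f64) -> f64 {
    (fastest - ops) / fastest * 100.0
}

/// Rounds to two decimal places, the precision shown in the logs.
fn round2(value: f64) -> f64 {
    (value * 100.0).round() / 100.0
}
```

For example, `round2(percent_slower(64989.0, 38004.0))` reproduces the 41.52% reported for hexo-util-rs-buffer in the 08-23 17:47 mini fixture run, and `round2(percent_slower(349.0, 178.0))` gives the 49% reported for hexo-util in the 08-23 00:05 large fixture run.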

src/toc_obj.rs Outdated
) -> Result<Vec<TocObj>> {
let input = input.to_string()?;
let root = Vis::load(&input).unwrap();
let lis = root.find(r#"[id^="title"]"#);
Owner Author

How to handle the options parameter?

Collaborator

I think querying all the parent elements should do the job, but visdom doesn't seem to expose an interface for getting the parent...
https://github.com/fefit/visdom/blob/main/src/mesdoc/interface/element.rs

Owner Author

@liby liby Aug 22, 2022

This will be much slower than the current implementation:

  let headers = root.find("h1,h2,h3,h4,h5,h6");
  let result = headers
    .into_iter()
    .map(|element| {
      let id = if element.has_attribute("id") {
        element
          .get_attribute("id")
          .map(|ele| ele.to_string())
          .unwrap()
      } else {
        element
          .parent()
          .map(|element| {
            element
              .get_attribute("id")
              .map(|ele| ele.to_string())
              .unwrap()
          })
          .unwrap()
      };
  
      TocObj {
        text: element.text(),
        id,
        level: match &*element.tag_name() {
          "H1" => 1,
          "H2" => 2,
          "H3" => 3,
          "H4" => 4,
          "H5" => 5,
          "H6" => 6,
          _ => 1,
        },
      }
    })
    .collect::<Vec<TocObj>>();

This is mainly because root.find("h1,h2,h3,h4,h5,h6") is slower than root.find(r#"[id^="title"]"#).
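The id fallback and the tag-to-level mapping in the snippet above can be separated from the DOM library and written without `unwrap()`. The following is a minimal std-only sketch, not the PR's implementation: `heading_level` and `resolve_id` are hypothetical helper names, and the inputs are assumed to have already been read from the element and its parent.

```rust
/// Maps a heading tag name to its level, ignoring ASCII case.
/// Returns None for non-heading tags, unlike the snippet above,
/// which falls back to level 1.
fn heading_level(tag_name: &str) -> Option<u8> {
    match tag_name.to_ascii_lowercase().as_str() {
        "h1" => Some(1),
        "h2" => Some(2),
        "h3" => Some(3),
        "h4" => Some(4),
        "h5" => Some(5),
        "h6" => Some(6),
        _ => None,
    }
}

/// Prefers the heading's own id and falls back to the parent's id.
/// Returns None when neither is present instead of panicking.
fn resolve_id(own_id: Option<&str>, parent_id: Option<&str>) -> Option<String> {
    own_id.or(parent_id).map(str::to_owned)
}
```

Returning `Option` leaves the caller to decide what to do with a heading that has no id anywhere, which the chained `unwrap()` calls would turn into a panic.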

Collaborator

Note: although the test cases use id=title_1_2 for easier cheerio tracking, in real-world usage the id is not required to include "title".
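To make the concern concrete: the attribute selector `[id^="title"]` only matches ids that start with "title", so headings with other ids would be skipped. A minimal std-only sketch of that prefix check follows; the helper names and the sample ids in the usage note are hypothetical and not taken from the test fixtures.

```rust
/// Mirrors the test performed by the selector `[id^="title"]`:
/// does the id start with "title"?
fn matches_title_prefix(id: &str) -> bool {
    id.starts_with("title")
}

/// Returns the ids that a prefix-based lookup would skip.
fn missed_ids<'a>(ids: &[&'a str]) -> Vec<&'a str> {
    ids.iter()
        .copied()
        .filter(|id| !matches_title_prefix(id))
        .collect()
}
```

Given the ids `title_1_2`, `getting-started`, and `api`, `missed_ids` returns the last two, which is why a tag-based lookup is needed for general input.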

Owner Author

@liby liby Aug 23, 2022

I see <div id="title_1_1"><h2>Title 1.1</h2></div> in the test case, where the id is on the parent element of the heading tag rather than on the heading itself.
Does this occur in real-world usage?

cloudflare/lol-html#140

Projects
Status: IN PROGRESS
3 participants