
Measured performance decrease between major releases of Marked, time for a performance pass? #2355

Open
alystair opened this issue Jan 12, 2022 · 12 comments

alystair commented Jan 12, 2022

What pain point are you perceiving?
Since MarkedJS's inception, most major versions have increased the time needed to complete parsing. The numbers shown below were captured on the second page load with a 6x CPU slowdown; only the Marked version was changed between runs, nothing else. Marked's processing time has more than doubled since v0.8.2.

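  • Version 0.8.2: [Chrome DevTools flame graph screenshot, not preserved]

  • Version 2.0.3: [Chrome DevTools flame graph screenshot, not preserved]

  • Version 4.0.9: [Chrome DevTools flame graph screenshot, not preserved]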

Describe the solution you'd like
Perhaps it's time for an optimization pass, looking for short circuits that could restore some of the previously lost performance? I feel performance should remain roughly consistent between releases...

UziTech (Member) commented Jan 12, 2022

It would be great to have something in the pipeline that measures performance. We tried that at one point, but it wasn't reliable when run in GitHub Actions.

PRs are appreciated 😁👍
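One common way to make a CI benchmark less sensitive to noisy shared runners is to compare two implementations in the same process, interleave their runs, and report medians and a ratio rather than absolute times. A minimal sketch, assuming plain synchronous functions that take a Markdown string; `compareInterleaved`, `timeOnce`, and `median` are hypothetical helpers, not part of Marked's repository:

```javascript
// Hypothetical helpers, not part of Marked: compare two functions in one process by
// interleaving their runs and reporting medians plus a ratio instead of absolute times.
// Assumes Node.js 16+ or a browser, where `performance` is a global.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function timeOnce(fn, input) {
  const t0 = performance.now();
  fn(input);
  return performance.now() - t0;
}

function compareInterleaved(fnA, fnB, input, { rounds = 50, warmup = 5 } = {}) {
  // Untimed warm-up calls give the JIT a chance to optimize both functions first.
  for (let i = 0; i < warmup; i++) {
    fnA(input);
    fnB(input);
  }
  const timesA = [];
  const timesB = [];
  for (let i = 0; i < rounds; i++) {
    // Alternate the order each round so ordering effects (GC pauses, cache state)
    // are spread across both functions instead of always hitting the same one.
    if (i % 2 === 0) {
      timesA.push(timeOnce(fnA, input));
      timesB.push(timeOnce(fnB, input));
    } else {
      timesB.push(timeOnce(fnB, input));
      timesA.push(timeOnce(fnA, input));
    }
  }
  const medianA = median(timesA);
  const medianB = median(timesB);
  // ratio > 1 means fnA is slower than fnB on this input.
  return { medianA, medianB, ratio: medianA / medianB };
}
```

For example, `compareInterleaved(newParse, oldParse, doc)` where each function wraps a different Marked build; a ratio that stays near 1 across CI runs would suggest no regression. Whether this is stable enough on GitHub Actions is untested here.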

alystair (Author) commented Jan 12, 2022

Well, it's not something that has to be done often, so this could be a manual process: generate flame graphs and try to optimize the functions that 'waste' the most time. Judging by the output above, the lexer is a good place to start.

UziTech (Member) commented Jan 12, 2022

What are you using to produce those flame graphs?

alystair (Author)

Chrome DevTools via "Performance" tab recordings
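For runs under Node.js rather than in the browser, a comparable CPU profile can be captured programmatically with Node's built-in `inspector` module and written to a `.cpuprofile` file, which should be loadable into the DevTools Performance panel as a flame chart. A sketch, assuming Node.js and a synchronous function under test; `profileToFile` and `post` are hypothetical helpers:

```javascript
// Capture a V8 CPU profile around a synchronous function using Node's inspector module.
// `profileToFile` and `post` are hypothetical helpers, not part of Marked or Node.
const inspector = require('inspector');
const fs = require('fs');

// Promise wrapper around the callback-style Session#post.
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

async function profileToFile(fn, outPath) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    fn(); // the code being profiled, e.g. a loop of parse calls
    const { profile } = await post(session, 'Profiler.stop');
    // The JSON profile can be opened in Chrome DevTools for a flame chart view.
    fs.writeFileSync(outPath, JSON.stringify(profile));
    return profile;
  } finally {
    session.disconnect();
  }
}
```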

calculuschild (Contributor) commented Jan 26, 2022

Is there perhaps a good standard Markdown test document that would give Marked.js a thorough performance stress test? I know we have our bench script, but if I recall correctly it uses only the CommonMark spec tests. Those are heavily weighted toward odd edge cases (I think almost a fourth of our tests are for em/strong, which might make em/strong look like a disproportionate bottleneck), and they include no tests at all for tables or other GFM features that might also be contributing to the slowdown.

Things like this tend to have large numbers of code blocks, etc., which is a similar problem, but at least it's a normal document that can be pasted into Chrome if we use their performance tab.
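Until a standard document turns up, one option is to generate a synthetic document that mixes ordinary block and inline constructs, including GFM tables, in fixed proportions. A sketch under the assumption that an even mix of these five section types is reasonably representative; `buildBenchDocument` and the section templates are made up for illustration and are not part of Marked's bench script:

```javascript
// Build a synthetic benchmark document from repeated, evenly weighted section templates.
// The mix of sections is an arbitrary assumption, not a recommended standard corpus.
const FENCE = '~~~'; // tilde fences are valid CommonMark and avoid triple backticks here

const sections = [
  (i) => `## Section ${i}\n\nSome *emphasis*, some **strong text**, a [link](https://example.com/${i}) and \`inline code\`.\n`,
  (i) => `- item ${i}.1\n- item ${i}.2\n  - nested item\n- item ${i}.3\n`,
  (i) => `${FENCE}js\nconst value${i} = ${i};\nconsole.log(value${i});\n${FENCE}\n`,
  (i) => `| Column A | Column B |\n| --- | --- |\n| a${i} | b${i} |\n| c${i} | d${i} |\n`, // GFM table
  (i) => `> A blockquote with ***nested emphasis*** number ${i}.\n`,
];

function buildBenchDocument(repeat = 100) {
  const parts = [];
  for (let i = 0; i < repeat; i++) {
    for (const section of sections) parts.push(section(i));
  }
  // Blank line between sections so each block is parsed independently.
  return parts.join('\n');
}
```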

alystair (Author)

Safari also has some interesting tooling, but DevTools should be sufficient. You can roll your own measurements with the Performance API, though of course that may add a few precious ns/ms to the runtime ;)
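A minimal sketch of rolling your own measurement, assuming a browser or Node.js 16+ where `performance` is a global. `timed` is a hypothetical helper; the named measure it records also appears in the Timings track of a DevTools Performance recording, so both approaches can be used together:

```javascript
// Hypothetical helper: time a synchronous call with performance.now() and also record
// User Timing marks/measure so the span is visible in a DevTools Performance recording.
function timed(label, fn) {
  const startMark = `${label}:start`;
  const endMark = `${label}:end`;
  const t0 = performance.now();
  performance.mark(startMark);
  const result = fn();
  performance.mark(endMark);
  const elapsedMs = performance.now() - t0;
  // A named measure shows up in the "Timings" track of the Performance panel.
  performance.measure(label, startMark, endMark);
  // Remove the marks so repeated calls do not keep adding entries.
  performance.clearMarks(startMark);
  performance.clearMarks(endMark);
  return { result, elapsedMs };
}
```

With marked v4 this might be called as `timed('marked.parse', () => marked.parse(markdown))`. The marks themselves add a little overhead, so very short calls are better timed in a loop.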

calculuschild (Contributor) commented Feb 26, 2022

@UziTech Since we run each spec 1000 times in our bench test, would it be possible to modify our bench tests to track the average time taken for each spec and output which cases are slowing us down the most?

UziTech (Member) commented Feb 26, 2022

Ya we could bench each test individually to see which ones are slowest relative to other packages.
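Benching each spec individually could look roughly like this. It assumes the specs are objects with `example` and `markdown` fields, as in the CommonMark spec JSON, and that `parse` is a synchronous Markdown-to-HTML function; `benchSpecs` is hypothetical and not the existing bench script:

```javascript
// Hypothetical per-spec benchmark: average time per parse for each spec, slowest first.
// Assumes Node.js 16+ or a browser, where `performance` is a global.
function benchSpecs(specs, parse, iterations = 1000) {
  const results = specs.map(({ example, markdown }) => {
    parse(markdown); // one untimed call so first-run compilation is not counted
    const t0 = performance.now();
    for (let i = 0; i < iterations; i++) parse(markdown);
    const avgMs = (performance.now() - t0) / iterations;
    return { example, avgMs };
  });
  // Slowest spec first.
  return results.sort((a, b) => b.avgMs - a.avgMs);
}
```

Running it once per library with the same iteration count gives per-spec averages that can then be divided to get relative slowdowns.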

calculuschild (Contributor) commented Feb 27, 2022

@UziTech I put together a rough version of this, but realized analyzing becomes very confusing if we are comparing 6 versions/options of Marked against Commonmark and Markdown-It. Given that, does it make sense to compare just the CJS (no GFM) version of Marked with Commonmark? If so I can make a PR.

Essentially for each spec, I found the ratio of execution times MarkedTime / CommonmarkTime and sorted those results top to bottom. The order changes a bit with each run, so I combined 4 runs in Excel to get the average performance of each spec.
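The ratio-and-rank aggregation described above can also be scripted instead of done in Excel. A sketch, assuming each run is an array of `{ spec, markedMs, commonmarkMs }` timings covering the same specs; `rankSpecs` is hypothetical, and the tie-breaking by spec number is an arbitrary choice, so its order may not match a spreadsheet-produced ranking exactly:

```javascript
// Per run: ratio = markedMs / commonmarkMs, rank 1 = slowest relative to Commonmark.
// Then average ratio and rank across runs and sort by average rank.
function rankSpecs(runs) {
  const totals = new Map(); // spec -> { ratioSum, rankSum }
  for (const run of runs) {
    const ranked = run
      .map(({ spec, markedMs, commonmarkMs }) => ({ spec, ratio: markedMs / commonmarkMs }))
      .sort((a, b) => b.ratio - a.ratio || a.spec - b.spec);
    ranked.forEach(({ spec, ratio }, index) => {
      const t = totals.get(spec) || { ratioSum: 0, rankSum: 0 };
      t.ratioSum += ratio;
      t.rankSum += index + 1;
      totals.set(spec, t);
    });
  }
  // Assumes every run contains the same set of specs.
  return [...totals.entries()]
    .map(([spec, t]) => ({ spec, avgRatio: t.ratioSum / runs.length, avgRank: t.rankSum / runs.length }))
    .sort((a, b) => a.avgRank - b.avgRank || a.spec - b.spec);
}
```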

The ranking is as follows (top 100 worst specs):


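Key

Spec # - The CommonMark spec example number.
Avg. Performance Ratio - MarkedTime / CommonmarkTime averaged over the 4 runs, i.e., how many times slower Marked (CJS) is than Commonmark
Avg. Rank - Average position in the per-run sorted order (the table is sorted by this column)
Section - CommonMark spec section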

Note that example 523 is consistently the worst performer (first place in almost every run). In general, however, Em/Strong examples appear most often near the top.

| Spec # | Avg. Performance Ratio | Avg. Rank | Section |
|---|---|---|---|
| 523 | 4.791666667 | 1.5 | Links |
| 450 | 4.1875 | 2.25 | Emphasis and strong emphasis |
| 438 | 4.125 | 2.75 | Emphasis and strong emphasis |
| 255 | 3.516666667 | 5.75 | List items |
| 279 | 3.776785714 | 5.75 | List items |
| 435 | 3.4375 | 6.75 | Emphasis and strong emphasis |
| 291 | 3.268181818 | 11.75 | List items |
| 447 | 3.25 | 12 | Emphasis and strong emphasis |
| 644 | 3.1875 | 12.5 | Hard line breaks |
| 16 | 3.25 | 12.75 | Backslash escapes |
| 66 | 3.338888889 | 13 | ATX headings |
| 645 | 3.083333333 | 13.5 | Hard line breaks |
| 629 | 3.1875 | 14.25 | Raw HTML |
| 631 | 3.05 | 16.75 | Raw HTML |
| 535 | 3.1 | 17 | Links |
| 385 | 3.066666667 | 18 | Emphasis and strong emphasis |
| 280 | 3.102678571 | 18 | List items |
| 630 | 3.0625 | 18.75 | Raw HTML |
| 627 | 3 | 20.5 | Raw HTML |
| 444 | 2.892857143 | 26.25 | Emphasis and strong emphasis |
| 342 | 2.875 | 27.25 | Code spans |
| 416 | 2.822916667 | 28.5 | Emphasis and strong emphasis |
| 650 | 2.9125 | 31.5 | Textual content |
| 611 | 2.75 | 34.25 | Autolinks |
| 614 | 2.791666667 | 35.25 | Raw HTML |
| 456 | 2.897321429 | 35.5 | Emphasis and strong emphasis |
| 256 | 2.694444444 | 38 | List items |
| 525 | 2.672222222 | 38.5 | Links |
| 59 | 2.725 | 39 | Thematic breaks |
| 322 | 2.69047619 | 39 | Lists |
| 603 | 2.647222222 | 39.5 | Autolinks |
| 387 | 2.75 | 40.25 | Emphasis and strong emphasis |
| 71 | 2.607638889 | 42.75 | ATX headings |
| 399 | 2.738095238 | 42.75 | Emphasis and strong emphasis |
| 485 | 2.625 | 44.75 | Links |
| 168 | 2.608333333 | 45.25 | HTML blocks |
| 628 | 2.8 | 48.25 | Raw HTML |
| 610 | 2.854166667 | 49.5 | Autolinks |
| 192 | 2.5625 | 50 | Link reference definitions |
| 359 | 2.55 | 52.75 | Emphasis and strong emphasis |
| 363 | 2.533333333 | 55.75 | Emphasis and strong emphasis |
| 108 | 2.565909091 | 56.75 | Indented code blocks |
| 488 | 2.553571429 | 58.25 | Links |
| 386 | 2.595238095 | 58.75 | Emphasis and strong emphasis |
| 75 | 2.5625 | 60 | ATX headings |
| 56 | 2.529761905 | 60.25 | Thematic breaks |
| 498 | 2.619047619 | 61.5 | Links |
| 298 | 2.488888889 | 61.75 | List items |
| 262 | 2.490909091 | 62 | List items |
| 391 | 2.488095238 | 63.75 | Emphasis and strong emphasis |
| 491 | 2.458333333 | 65.5 | Links |
| 616 | 2.45 | 66.25 | Raw HTML |
| 446 | 2.446428571 | 66.25 | Emphasis and strong emphasis |
| 65 | 2.6875 | 66.25 | ATX headings |
| 277 | 2.490909091 | 66.75 | List items |
| 519 | 2.4375 | 68 | Links |
| 397 | 2.517857143 | 69 | Emphasis and strong emphasis |
| 187 | 2.433333333 | 71.25 | HTML blocks |
| 276 | 2.45 | 72 | List items |
| 475 | 2.45 | 74.25 | Emphasis and strong emphasis |
| 400 | 2.4375 | 74.75 | Emphasis and strong emphasis |
| 436 | 2.441666667 | 75.25 | Emphasis and strong emphasis |
| 624 | 2.4 | 75.5 | Raw HTML |
| 453 | 2.416666667 | 76.25 | Emphasis and strong emphasis |
| 49 | 2.5 | 80.25 | Thematic breaks |
| 434 | 2.410714286 | 81 | Emphasis and strong emphasis |
| 420 | 2.410714286 | 82 | Emphasis and strong emphasis |
| 634 | 2.4 | 82.25 | Hard line breaks |
| 652 | 2.375 | 83.5 | Textual content |
| 63 | 2.4375 | 84.75 | ATX headings |
| 323 | 2.389423077 | 85.5 | Lists |
| 415 | 2.383928571 | 86.5 | Emphasis and strong emphasis |
| 325 | 2.376633987 | 87.5 | Lists |
| 580 | 2.35 | 91.5 | Images |
| 484 | 2.383333333 | 92.5 | Links |
| 258 | 2.371590909 | 94.25 | List items |
| 439 | 2.341666667 | 94.75 | Emphasis and strong emphasis |
| 476 | 2.458333333 | 95.5 | Emphasis and strong emphasis |
| 637 | 2.35 | 95.75 | Hard line breaks |
| 74 | 2.4375 | 97.25 | ATX headings |
| 109 | 2.338450292 | 97.25 | Indented code blocks |
| 266 | 2.375 | 99.25 | List items |
| 281 | 2.343406593 | 100 | List items |
| 448 | 2.333333333 | 101.25 | Emphasis and strong emphasis |
| 299 | 2.315705128 | 101.5 | List items |
| 452 | 2.366666667 | 102.5 | Emphasis and strong emphasis |
| 440 | 2.333333333 | 102.5 | Emphasis and strong emphasis |
| 518 | 2.3 | 102.75 | Links |
| 254 | 2.335742754 | 102.75 | List items |
| 458 | 2.345238095 | 103.5 | Emphasis and strong emphasis |
| 437 | 2.333333333 | 104 | Emphasis and strong emphasis |
| 284 | 2.3125 | 104 | List items |
| 216 | 2.322802198 | 104.25 | Link reference definitions |
| 301 | 2.335539216 | 105.75 | Lists |
| 270 | 2.329924242 | 106.25 | List items |
| 268 | 2.365079365 | 106.25 | List items |
| 102 | 2.357142857 | 108.25 | Setext headings |
| 307 | 2.296153846 | 110.5 | Lists |
| 390 | 2.321428571 | 111 | Emphasis and strong emphasis |
| 29 | 2.3 | 111.25 | Entity and numeric character references |

alystair (Author)

Brilliant research! Do the majority of these use regex? My vague hypothesis is that some regexes could be optimized to reduce the number of steps needed for matching. With a third-party tool such as https://regex101.com/ (although it only works for Python/PHP) you can see how many steps a match takes. There's probably an equivalent for JS somewhere?
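As far as I know, JavaScript engines do not expose a step counter like regex101's debugger, but a rough substitute is to time a pattern against inputs of growing length and watch for superlinear growth. A sketch using the textbook nested-quantifier pattern rather than any actual Marked rule; `timeRegex` and `growthTable` are hypothetical helpers:

```javascript
// Time regex matches on inputs of growing length to spot catastrophic backtracking.
// The patterns are a classic illustration, not taken from Marked's rules.
const nested = /^(a+)+$/; // nested quantifier: backtracks exponentially when the match fails
const flat = /^a+$/;      // accepts the same strings, fails in linear time

function timeRegex(re, input) {
  const t0 = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - t0 };
}

function growthTable(re, lengths) {
  // The trailing "!" forces the overall match to fail, which triggers backtracking.
  return lengths.map((n) => ({ n, ...timeRegex(re, 'a'.repeat(n) + '!') }));
}

for (const [name, re] of [['nested', nested], ['flat', flat]]) {
  const rows = growthTable(re, [12, 16, 20]);
  console.log(name, rows.map(({ n, ms }) => `${n}: ${ms.toFixed(3)}ms`).join(', '));
}
```

On a backtracking engine such as V8's, the `nested` timings should grow much faster than the `flat` ones as `n` increases, even though both patterns accept the same strings.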

calculuschild (Contributor) commented Feb 28, 2022

Regex101 works fine. Even though the debugger is "only for PHP", the PHP and JavaScript regex flavors are nearly identical, so it works well enough for tracking things down.

Regex is always a possible source of improvement, and you are right: pretty much every token uses regex at its core. I can already see a possible improvement in the rDelim rule for emStrong.

Feel free to poke around if you notice any potential improvements.

alystair (Author) commented Mar 1, 2022

Wish I had the time! My personal priority is taking Marked (and other renderers such as highlight.js) out of the client critical path in my upcoming library by caching results server side where possible.
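Server-side caching along these lines could be as simple as memoizing rendered HTML by a hash of the Markdown source, so identical documents are parsed only once. A sketch, assuming Node.js; `RenderCache`, the size limit, and the LRU eviction are illustrative assumptions, and `render` stands in for any Markdown-to-HTML function (for example a wrapper around `marked.parse`):

```javascript
// Hypothetical in-memory render cache keyed by SHA-256 of the Markdown source.
// A hash key is convenient if the cache later moves to disk or a shared store.
const crypto = require('crypto');

class RenderCache {
  constructor(render, maxEntries = 1000) {
    this.render = render;
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map insertion order doubles as least-recently-used order
  }

  get(markdown) {
    const key = crypto.createHash('sha256').update(markdown).digest('hex');
    if (this.entries.has(key)) {
      const html = this.entries.get(key);
      // Re-insert so this entry becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, html);
      return html;
    }
    const html = this.render(markdown);
    this.entries.set(key, html);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      this.entries.delete(this.entries.keys().next().value);
    }
    return html;
  }
}
```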


3 participants