Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uneven .md vs. .mdx build times #22521

Closed
johnayoung opened this issue Mar 24, 2020 · 11 comments
Closed

Uneven .md vs. .mdx build times #22521

johnayoung opened this issue Mar 24, 2020 · 11 comments
Labels
stale? Issue that may be closed soon due to the original author not responding any more. topic: performance Related to runtime & build performance topic: remark/mdx Related to Markdown, remark & MDX ecosystem type: bug An issue or pull request relating to a bug in Gatsby

Comments

@johnayoung
Copy link

johnayoung commented Mar 24, 2020

Description

We run a documentation website that has hit a performance bottleneck at around 1000 pages. This led us to test the difference between gatsby-transformer-remark and gatsby-plugin-mdx to compare .md and .mdx build times.

We realize that these are not the same plugin, but our expectations were that the build times of each would be closer in-line with one another (for the exact same files).

We used the following repo to benchmark results: https://github.com/johnatspreadstreet/gatsby-md-vs-mdx

Here were the results of our test using the auto generated files:

Source and Transform Nodes
# of Pages md mdx
100 0.17s 3.12s
1000 0.90s 23.05s
8000 5.53s 192.80s

Steps to reproduce

GitHub repo: https://github.com/johnatspreadstreet/gatsby-md-vs-mdx

For markdown files:

  1. Make sure line 54 of md.generate.js is set to create .md
  2. Make sure gatsby-transformer-remark is used (and not gatsby-plugin-mdx) in the gatsby-config.js file
  3. Run npm run bench or yarn run bench

For mdx files:

  1. Make sure line 54 of md.generate.js is set to create .mdx
  2. Make sure gatsby-plugin-mdx is used (and not gatsby-transformer-remark) in the gatsby-config.js file
  3. Run npm run bench or yarn run bench

Expected result

Results of the gatsby build process for .md and .mdx files should be within reasonable distance of one another.

Actual result

Build times of .mdx were between 18 and 34 times longer for the source and transform nodes step vs. .md files.

Environment

System:
OS: Windows 10
CPU: (16) x64 Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
Binaries:
Yarn: 1.18.0 - C:\Program Files (x86)\Yarn\bin\yarn.CMD
npm: 6.9.0 - C:\Program Files\nodejs\npm.CMD
Languages:
Python: 2.7.15 - /c/Users/JYoun/.windows-build-tools/python27/python
Browsers:
Edge: 44.18362.449.0
npmPackages:
gatsby: ^2.19.5 => 2.19.45
gatsby-image: ^2.2.39 => 2.2.44
gatsby-plugin-benchmark-reporting: * => 0.0.13
gatsby-plugin-feed: ^2.3.26 => 2.3.29
gatsby-plugin-google-analytics: ^2.1.34 => 2.1.38
gatsby-plugin-manifest: ^2.2.38 => 2.2.48
gatsby-plugin-mdx: ^1.0.83 => 1.0.83
gatsby-plugin-offline: ^3.0.32 => 3.0.41
gatsby-plugin-react-helmet: ^3.1.21 => 3.1.24
gatsby-plugin-sharp: ^2.4.0 => 2.4.13
gatsby-plugin-typography: ^2.3.21 => 2.3.25
gatsby-remark-copy-linked-files: ^2.1.36 => 2.1.40
gatsby-remark-images: ^3.1.42 => 3.1.50
gatsby-remark-prismjs: ^3.3.30 => 3.3.36
gatsby-remark-responsive-iframe: ^2.2.31 => 2.2.34
gatsby-remark-smartypants: ^2.1.20 => 2.1.23
gatsby-source-filesystem: ^2.1.46 => 2.1.56
gatsby-transformer-remark: ^2.6.48 => 2.6.59
gatsby-transformer-sharp: ^2.3.13 => 2.3.19

@johnayoung johnayoung added the type: bug An issue or pull request relating to a bug in Gatsby label Mar 24, 2020
@pvdz pvdz added the scaling label Mar 24, 2020
@pvdz pvdz self-assigned this Mar 24, 2020
@pvdz
Copy link
Contributor

pvdz commented Mar 24, 2020

Hi. Thanks for flagging it. Markdown is a concern of mine in general but this mdx hockey sticking is pretty bad. We'll have to take a look at it sooner or later.

@johnayoung
Copy link
Author

@pvdz Thank you so much for hopping on. I am happy to assist in any way that I can, as this interaction is pretty critical for us moving forward.

We currently have ~1,000 pages and another ~9,000 queued up for when we can fix the build time issue.

@pvdz
Copy link
Contributor

pvdz commented Mar 24, 2020

@johno is investigating 💜

@johnayoung
Copy link
Author

@pvdz Great, thank you so much.

@johno Let me know any way I can assist.

johno added a commit to johno/gatsby that referenced this issue Mar 25, 2020
Since getSourcePluginsAsRemarkPlugins is called on
every MDX file, the getNodes call became expensive
at scale with thousands of MDX files.

This was encountered when attempting to find
underlying issues for gatsbyjs#22521. @pvdz had a hunch that
node traversal may be a culprit. This doesn't fully
address all underlying performance issues but
is a quick win.

8k MDX pages: 440s => 350s
16k MDX pages: 1100s => 750s
gatsbybot pushed a commit that referenced this issue Mar 25, 2020
…22555)

Since getSourcePluginsAsRemarkPlugins is called on
every MDX file, the getNodes call became expensive
at scale with thousands of MDX files.

This was encountered when attempting to find
underlying issues for #22521. @pvdz had a hunch that
node traversal may be a culprit. This doesn't fully
address all underlying performance issues but
is a quick win.

8k MDX pages: 440s => 350s
16k MDX pages: 1100s => 750s
@johnayoung
Copy link
Author

@johno Thanks for the updates here. Here is the second round of testing after pulling down the new packages (benchmark repo: https://github.com/johnatspreadstreet/gatsby-md-vs-mdx):

Source and Transform Nodes
# of Pages mdx (pre-patch) mdx (post-patch) dif
100 3.12 3.7 0.58
1000 23.05 24.3 1.25
8000 192.8 183.4 -9.4

Looks like ~5% better performance in the upper bucket, with pretty much even performance in the lower buckets.

If you guys need me to test anything additional, or hunches, happy to do so.

@pvdz
Copy link
Contributor

pvdz commented Mar 25, 2020

@johnatspreadstreet If you want, you can take the code that the patch changed and follow it. It's more than likely that something is looping over those nodes, like a .map or .forEach or even just a regular for. I suspect something will be obvious in that area quite quickly. Othewise we'll get to it soon enough.

@johno
Copy link
Contributor

johno commented Mar 25, 2020

Here's where I'm at so far: yesterday I ran benchmarks and could confirm I'm seeing the same growth in time during sourcing/transforming nodes. Each time I double the number of MDX pages I see ~2x increase in time in "source and transform nodes". It appears to be continually growing, too. 🏒

Files Total build Source and transform nodes
500 30s 13s
1k 51s 23s (+1.7x)
2k 90s 47s (+2x)
4k 180s 100s (+2x)
8k 440s 230s (+2.3x)
16k 1100s 530s (+2.3x)

After some digging I found a usage of a manual node filter for type using getNodes rather than getNodesByType. This, unsurprisingly, caused a lot of unnecessary node traversal at scale. That fix (in #22555) saw a ~30% reduction in build times for 16k pages (using my benchmark site so YMMV).

My benchmark results after the change

Files Total build Source and transform nodes
8k 350s 150s
16k 750s 400s

I'll keep digging into this as I get time and will report back any additional findings and performance improvements as we get them PRed in. I suspect there's a lot more low hanging fruit.


It's also important to note that MDX will always be quite a bit slower than MD since it's doing a lot more under the covers, but it definitely shouldn't be 📈🙀

@johnayoung
Copy link
Author

@johno That is very good context, thank you much.

Is there any documentation you can point me towards that actually covers the process of converting mdx -> what is needed to display in React?

@mukul-rathi
Copy link
Contributor

mukul-rathi commented May 16, 2020

This might provide additional context: for large mdx posts, I get the following error:

 ERROR 
[BABEL] Note: The code generator has deoptimised the styling of undefined as it exceeds the max of 500KB.

warn Query takes too long:
...

This leads to a significant performance hit - the kind mentioned by the OP. I verified this on the official gatsby-starter-blog starter (after modifying it to use MDX rather than md).

@github-actions
Copy link

github-actions bot commented Mar 3, 2021

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Mar 3, 2021
@LekoArts LekoArts added topic: remark/mdx Related to Markdown, remark & MDX ecosystem topic: performance Related to runtime & build performance and removed topic: MDX labels May 28, 2021
@github-actions
Copy link

github-actions bot commented Jun 7, 2021

Hey again!

It’s been 60 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.
Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to comment on this issue or create a new one if you need anything else.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community! 💪💜

@github-actions github-actions bot closed this as completed Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale? Issue that may be closed soon due to the original author not responding any more. topic: performance Related to runtime & build performance topic: remark/mdx Related to Markdown, remark & MDX ecosystem type: bug An issue or pull request relating to a bug in Gatsby
Projects
None yet
Development

No branches or pull requests

5 participants