Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DOMParser with template + innerHTML to avoid memory leak #23

Open
4 tasks done
JounQin opened this issue Feb 24, 2024 · 11 comments
Open
4 tasks done

Replace DOMParser with template + innerHTML to avoid memory leak #23

JounQin opened this issue Feb 24, 2024 · 11 comments
Labels
🤞 phase/open Post is being triaged manually

Comments

@JounQin
Copy link
Member

JounQin commented Feb 24, 2024

Initial checklist

Affected packages and versions

rehype-dom-parse

Link to runnable example

No response

Steps to reproduce

https://stackoverflow.com/questions/56451731/dom-parser-chrome-extension-memory-leak

Expected behavior

No memory leak

Actual behavior

I just noticed this issue from https://twitter.com/DIYgod/status/1761065494185156663

Runtime

No response

Package manager

No response

OS

No response

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Feb 24, 2024
@ChristianMurphy
Copy link
Member

Thanks for raising an issue @JounQin! 🙇
Is this something you are raising because you think it may be a potential issue? Or have you seen it cause a memory leak in practice?
Would you be able to share a reproducible example of the issue in rehype-dom?
Do you believe this is a bug in rehype-dom or an issue in Chromium?


I ask because the StackOverflow issue https://stackoverflow.com/questions/56451731/dom-parser-chrome-extension-memory-leak, is all over the place.
It references Firefox (determined to be not an issue https://bugzilla.mozilla.org/show_bug.cgi?id=1646237), JSDom (resolved in jsdom/jsdom#2722), and Chromium (specific to when video is being parsed, and unconfirmed https://issues.chromium.org/issues/254330164)

The Twitter post referenced https://twitter.com/DIYgod/status/1761065494185156663

RSSHub Radar has a memory leak bug that has existed for several years. The cause has never been found. The entire warehouse code has been rewritten and it still cannot be solved. It is very painful. Today I finally caught it. The culprit is Chrome's native DOMParser.

The reference to the DOM returned by DOMParser will exist permanently in the memory and cannot be cleared manually or by GC. In the end, more and more memory will explode.

(translated to English)

Appears to come from parsing XML rather than HTML and also hasn't been reported to or confirmed by the Chromium team.

@JounQin
Copy link
Member Author

JounQin commented Feb 24, 2024

@ChristianMurphy

Is this something you are raising because you think it may be a potential issue?

Indeed.

Would you be able to share a reproducible example of the issue in rehype-dom?
Do you believe this is a bug in rehype-dom or an issue in Chromium?

I believe it should be an issue of Chrome. And we can avoid this potential issue by just replacing the usage of DOMParser included by #19.

And it's very common to use template, see also syntax-tree/hast-util-from-html-isomorphic#2

Appears to come from parsing XML rather than HTML and also hasn't been reported to or confirmed by the Chromium team.

image

It's parsing html, not xml.

@wooorm
Copy link
Member

wooorm commented Mar 6, 2024

#19 already landed. template is already used for fragments. DOMParser is used for entire documents. I don’t think there’s an alternative for that?

@JounQin
Copy link
Member Author

JounQin commented Mar 6, 2024

@wooorm

#19 is always using DOMParser, not template, sorry. 😅

Is that possible to use iframe for document instead?

I realize that we shouldn't use iframe to avoid unnecessary requests.

@wooorm
Copy link
Member

wooorm commented Mar 7, 2024

Ah, I can’t keep track.
Well, #18 explains it. Safety is more important than a bug in v8, which they should fix?

@JounQin
Copy link
Member Author

JounQin commented Mar 7, 2024

@wooorm

template will not send HTTP request at the same time? That's why I raised syntax-tree/hast-util-from-html-isomorphic#2 to discuss which API (template vs DOMParser) should be preferred, I hope we can be constant between our libraries.

@wooorm
Copy link
Member

wooorm commented Mar 7, 2024

I don’t understand what you want.
In this issue you want template.
Before you wanted DOMParser.
What do you propose?

Please keep in mind that <template> cannot parse entire documents. And DOMParser needs a small workaround for fragments. And we need to support both.

@JounQin
Copy link
Member Author

JounQin commented Mar 7, 2024

I don’t understand what you want. In this issue you want template. Before you wanted DOMParser. What do you propose?

Please keep in mind that <template> cannot parse entire documents. And DOMParser needs a small workaround for fragments. And we need to support both.

Yes, sorry for the confusion, before I know there could be a memory leak I'd prefer DOMParser, but now I'd prefer to use template for Fragment, not document.

So my proposal would be change to use template for Fragment while keep using DOMParser for document.

If you agree, I'd like to raise a PR.

@remcohaszing
Copy link
Member

Based on the last comment I’d say go for it 👍

@wooorm
Copy link
Member

wooorm commented Mar 7, 2024

I'm okay with that, if there are no HTTP requests sent for images, and scripts don't run!

@JounQin
Copy link
Member Author

JounQin commented Mar 7, 2024

Sure, that's syntax-tree/hast-util-from-html-isomorphic#2 talking about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually
Development

No branches or pull requests

4 participants