Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default configuration is unsafe, contrary to documentation #295

Closed
djramones opened this issue Jun 29, 2023 · 4 comments · Fixed by #296
Closed

Default configuration is unsafe, contrary to documentation #295

djramones opened this issue Jun 29, 2023 · 4 comments · Fixed by #296

Comments

@djramones
Copy link
Contributor

djramones commented Jun 29, 2023

I'm using markdown-it-py on a new Django project and it's perfect, but I think the claims regarding security in the docs are quite misleading.

Both the PyPI project description and readthedocs home page say that this parser-renderer is "safe by default", but it looks like those statements were just copied over verbatim from the original JS project. The dedicated page on security further says HTML is disabled by default as a blanket protection, but on closer inspection that actually describes the original JavaScript markdown-it project, not this Python port, which switches the default configuration to unsafe full CommonMark compliance (meaning, HTML parsing is enabled, potentially allowing for injection-style attacks when used in web apps).

#56 is proof of this issue, as the reporter said "I wanted to go for markdown-it for security stance" but the code snippets show usage of the Python package without any deliberate configuration for safer parsing. To be fair, that is easy: just pass "js-default" to the constructor. But the security page in the docs does not mention this is required to actually get the benefits, and all the "safe by default" claims imply that no such effort is needed at all.

This is at least more of a documentation issue than an actual vulnerability in the code, and I can submit a PR for the necessary edits, but I'm not sure if I'm missing something. Why was the default config switched to full CommonMark compliance in the first place? The docs imply that this Python port aims to minimize differences with the JS version otherwise. Perhaps the motivations for the port mean that security is less of a priority? (That seems to be the case with Executable Books which appears to be concerned more with data science tooling than public-facing web applications, but I'm not too familiar with the organization.) If so, that is okay, but these priorities and design principles should be reflected accurately in the project pages and docs.

(I'm sorry in case my tone comes across as not nice; I'm actually grateful for this project! I just think the issue needs a bit more attention, and I believe in the importance of good documentation.)

@welcome
Copy link

welcome bot commented Jun 29, 2023

Thanks for opening your first issue here! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.

If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).

Welcome to the EBP community! 🎉

@chrisjsewell
Copy link
Member

Heya, yeh it was just pragmatic really, that these days CommonMark is the standard that most people would expect to use.

But indeed happy to receive a PR to improve the documentation to note how to set "safe mode"

@rlidwka
Copy link

rlidwka commented Jul 5, 2023

Heya, yeh it was just pragmatic really, that these days CommonMark is the standard that most people would expect to use.

Do people expect to write actual HTML in their markdown?

If they do, maybe sanitizing it by default (like comrak library in rust does, or github's cmark-gfm) might be a better option.

@chrisjsewell
Copy link
Member

Do people expect to write actual HTML in their markdown?

Oh absolutely, if your use case is e.g. for writing your own documentation, blogs etc, then it is common that you want to "extend" Markdown, perhaps to improve the visual representation of your content.
In some respect this is essentially what https://mdxjs.com is and, as I mention in markdown-it/markdown-it#936, you may even essentially want to "embed" Markdown in HTML.

In this use-case, when you have a trusted source for the Markdown (yourself), then you don't really care about security.

I would think the use-case for security is when you are rendering Markdown from potentially "untrusted" sources.

Indeed, for GFM, part of the spec is https://github.github.com/gfm/#disallowed-raw-html-extension-, which is a santizer.
It's on the TODO to add for markdown-it.rs/markdown-it-pyrs (markdown-it-rust/markdown-it-plugins.rs#15), and yeh should probably add it here as well to the "gfm-like" config preset.

I think "commonmark" should remain exactly matching its spec, and then obviously the great thing of markdown-it, is that people can add their own plugins/rules to turn-off html parsing entirely, or perform sanitization as they please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants