Inconsistent behavior of multiple backticks #166

Ekultek · 2018-12-05T15:18:55Z

Issue

There is a reflected and/or stored xss vulnerability (depending on how the markdown is parsed from user input or from a user uploaded file) from a crafted use of backticks, in all of the following parsers:

GithubMarkdown
Markdown
MarkdownExtra

How?

The vulnerability occurs when a user crafts a malicious payload with characters before a 3 backtick wrapped payload, thus bypassing the parser escape. For example, here is an image of the payloads crafted with single, double, and triple backticks:

And here is an image of the payloads rendered:

As you can see when the payload is crafted correctly using three backticks, the parser will render it as a script, this can allow malicious individuals to render scripts within a .md file or within a text box on any platform that is using this as the markdown parser. An example of a ran script:

Impact

Doing a quick search on Github for the code that enables your parser: new \cebe\markdown\. I get this many results:

The vulnerability can be either stored using an .md file (README for example), or reflected if the markdown parser is just parsing the user input text. Malicious attackers can use this method to steal sensitive user data. For example to steal a users cookies:

This can allow serious impacts on not only the end users using the site, but the reputation of the website as well.

Proof of Concept

User input

You can use the following code for a PoC on user entered text:

<?php
/*
 * to run this PoC do the following:
 * composer require cebe/markdown "~1.2.0"
 *
 * For proof that the markdown is not sent as a script do:
 * `<script>alert(1);</script>`
 * this will output the script in a safe way.
 *
 * To make an alert do:
 * L: ```<script>alert(1);</script>```
 * this will display a rendered alert
 * */
include 'vendor/autoload.php';

function parseData($data) {
    $parserGithub = new cebe\markdown\GithubMarkdown();
    $parserMarkdown = new cebe\markdown\Markdown();
    $parserMarkdownExtra = new cebe\markdown\MarkdownExtra();
    return [
        "<div class='parsed-github'>".$parserGithub->parse($data)."</div>",
        "<div class='parsed-markdown'>".$parserMarkdown->parse($data)."</div>",
        "<div class='parsed-markdown-extra'>".$parserMarkdownExtra->parse($data)."</div>"
    ];
}

if (isset($_GET['poc'])) {
    $parsed = parseData($_GET['poc']);
    echo "<!doctype html>
<title>PoC</title>
<body>
{$parsed[0]}
{$parsed[1]}
{$parsed[2]}
</body>";
} else {
    echo "<!doctype html>
<head>
<title>PoC</title>
</head>
<body>
<form action='#'>
<label for='markdown-poc'>Markdown: </label>
<input id='markdown-poc' type='text' name='poc'>
<input type='submit' name='Submit'>
</form>
</body>";
}

MD file

And you can use the following code for a PoC on text read from an MD file:

<?php
/*
 * to use this PoC you will need composer to require the library:
 * `composer require cebe/markdown "~1.2.0"`
 *
 * After this has been done you can create an MD file anywhere on your system,
 * to verify that the parser to render the data in a safe way use `<script>alert(1);</script>`,
 * or whatever script you decide to use
 *
 * In order to get the data rendered as javascript:
 * L: ```<script>alert();</script>``` or whatever script you decide to render
 * */
include 'vendor/autoload.php';

function renderFileContent($fname) {
    return file_get_contents($fname);
}
if (isset($_POST['upload'])) {
    $tmpName = $_FILES['poc']['tmp_name'];
    $contents = renderFileContent($tmpName);
    $parserGithub = new cebe\markdown\GithubMarkdown();
    $parserMarkdown = new cebe\markdown\Markdown();
    $parserMarkdownExtra = new cebe\markdown\MarkdownExtra();
    $dataGithub = $parserGithub->parse($contents);
    $dataMarkdown = $parserMarkdown->parse($contents);
    $dataMarkdownExtra = $parserMarkdownExtra->parse($contents);
    $parsed = [
        "<div class='parsed-github'>" . $parserGithub->parse($dataGithub) . "</div>",
        "<div class='parsed-markdown'>" . $parserMarkdown->parse($dataMarkdown) . "</div>",
        "<div class='parsed-markdown-extra'>" . $parserMarkdownExtra->parse($dataMarkdownExtra) . "</div>"
    ];
    echo "<!doctype html>
<head>
<title>PoC</title>
</head>
<body>
{$parsed[0]}
{$parsed[1]}
{$parsed[2]}
</body>";
} else {
    echo "<!doctype html>
<head>
<title>PoC</title>
</head>
<body>
<form action='#' method='post' enctype='multipart/form-data'>
<span>Upload file:</span>
<input type='file' name='poc'>
<input type='submit' name='upload' value='Upload'>
</form>
</body>
";
}

The text was updated successfully, but these errors were encountered:

samdark · 2018-12-05T16:19:58Z

Markdown doesn't ensure output is secure in any way by design. It is allowing HTML so you don't need to craft it like that, just use <script directly.

samdark · 2018-12-05T16:20:31Z

I don't think escaping HTML is the job of markdown processing library.

Ekultek · 2018-12-05T16:35:38Z

@samdark then why does it escape it if the payload isnt crafted?

samdark · 2018-12-05T17:01:11Z

It's by design of markdown: https://daringfireball.net/projects/markdown/syntax#autoescape

samdark · 2018-12-05T17:03:57Z

That's expected. You should process result of markdown conversion with something like http://htmlpurifier.org/ if you want to allow users to enter text.

samdark · 2018-12-05T17:05:27Z

For example, see https://github.com/yiisoft-contrib/yiiframework.com/blob/master/widgets/views/comments.php#L44 and https://github.com/yiisoft-contrib/yiiframework.com/blob/a88510f17e6fdf225248b822022798baa679d78b/components/Formatter.php#L100

Ekultek · 2018-12-05T17:05:50Z

@samdark I shouldn't need to rely on a second library because one of them doesn't do its job properly

samdark · 2018-12-05T17:07:44Z

Please re-read markdown specification. It says explicitly that it allows any HTML by design and it's unsafe by definition to allow users to enter markdown w/o further escaping/cleanup.

samdark · 2018-12-05T17:08:13Z

It's not the job of the markdown parser to escape/cleanup output.

samdark · 2018-12-05T17:11:38Z

https://michelf.ca/blog/2010/markdown-and-xss/

Ekultek · 2018-12-05T17:16:53Z

The point of a parser is to render the data in a safe way. You're saying I should rely on a second library to render the data in a safe way, why should I have to do that when this parser is already here and should do that for me?

samdark · 2018-12-05T18:24:08Z

The point of a parser is to render the data in a safe way.

No since it's a markdown parser and markdown wasn't meant to be safe.

You're saying I should rely on a second library to render the data in a safe way

Yes.

why should I have to do that when this parser is already here and should do that for me?

This parser converts markdown to HTML. Markdown, by definition, allows any HTML and that is unsafe if you allow users to enter data you render. There are valid use cases for non-filtered HTML. For example, you can allow full markdown for admin only and that's huge flexibility. You can do things like this by embedding HTML and CSS into blog post.

samdark · 2018-12-05T18:24:50Z

@cebe I think security topic should be emphasized in readme pointing to HTMLPurifier.

Ekultek · 2018-12-05T18:46:04Z

@samdark I think you should incorporate HTMLPurifier into the code itself if that is what would be needed to fix the XSS issues in it. That would probably save you guys a lot of headache along with people like me who find bugs in stuff that they use sometimes. Thank you for your input I'll wait for @cebe

cebe · 2018-12-06T11:10:02Z

Thanks for the detailed report. As pointed out by @samdark this is due to the nature of markdown and not a bug. You may remove HTML support by creating your own Markdown flavor class, which does not render the HTML, but the safest solution is to use HTML Purifier or similar tools.

I added a section in the README about this: https://github.com/cebe/markdown/blob/master/README.md#security

Ekultek · 2018-12-06T14:10:59Z

@cebe This still does not explain why it mitigates simplistic attacks such as:

`<script>alert(1);</script>`

into:

<code>&lt;script&gt;alert(1);&lt;/script&gt;</code>

If you are saying that the design of the parser allows this, then the simplistic attacks as stated would not work.

cebe · 2018-12-06T15:16:29Z

Code tags are expected to display the raw text inside them, that means every HTML inside code tags is escaped properly. In your case above you had multiple code tags: ``` was converted to <code>`</code>, so the script tag was not inside the code tag in that case.

Ekultek · 2018-12-06T15:32:34Z

So then if that is the case, there is a bug in your code with multiple backticks.

cebe · 2018-12-06T15:59:46Z

Well, the rendering is technically correct according to the markdown spec. If you use multiple backticks you need to leave space before and after the code.

But it seems other parsers are doing it differently....

https://johnmacfarlane.net/babelmark2/?normalize=1&text=something+%60%60%60%3Cscript%3E%60%60%60

cebe · 2018-12-06T16:00:47Z

related to #99

Ekultek · 2018-12-06T16:10:25Z

I got the bug tag, never been more happy in my life lol

nunofernandes · 2019-03-20T10:07:12Z

Any idea when we will have the 1.2.1?

Thanks,

tyler-ham · 2019-07-03T19:43:59Z

I just stumbled across CVE-2018-1000874 referencing this issue and wanted to leave a note with my thoughts. I disagree with the CVE-2018-1000874 being assigned as an XSS vulnerability in cebe/markdown.

Dispute of Vulnerability CVE-2018-1000874

I agree with @samdark that it is not the responsibility of the markdown processor to perform any escaping, cleanup, or sanitization outside of anything that might be required by the markdown specification of each respective markdown flavor.

Such additional escaping, cleanup, and sanitization would be no different than expecting json_decode() to strip out <script> tags. Suppose I have stored a user's comments as {"commented_at":"2019-07-03 12:45:00","comment":"Payload: <script>alert(1);</script>"}. Nobody would expect json_decode() to prevent the XSS vulnerability that is exposed when I save unsanitized user payload and then render it out to the browser as HTML. The vulnerability lies not within the decoder/parser, but in the code that invokes the decoder/parser and then presents the result to the browser as HTML.

Asking for the markdown parser to also perform sanitization isn't necessarily an invalid request, but it is a feature request, not a vulnerability bug to fix. However, since good sanitization libraries already exist, are easy to integrate, and have no coupling with the act of parsing markdown, it seems unnecessary and unwise to build such functionality into the parser.

Backtick Parsing

I readily concede that the Traditional Markdown specifications do not unambiguously describe the case demonstrated in this issue with respect to multiple backticks, but I think the current behavior is incorrect, and I agree with @cebe's renaming of the issue from "Cross site scripting vulnerability" to "Inconsistent behavior of multiple backticks".

Traditional Markdown Specification

The relevant portion of the Traditional Markdown specification says:

To include a literal backtick character within a code span, you can use multiple backticks as the opening and closing delimiters...

It goes on to show an example that begins and ends with two backticks with a single backtick contained inside:

``There is a literal backtick (`) here.``

produces:

<p><code>There is a literal backtick (`) here.</code></p>

Interpretation

My interpretation is that "you can use multiple backticks" seems to allow any number N of backticks to be selected as the open/close delimiter and that you are not required to have a string of N-1 backticks contained within them for the N backticks to be considered open/close delimiters.

CommonMark Specification for Comparison

This interpretation is consistent with the CommonMark specification, which attempts to provide a "standard, unambiguous syntax specification for Markdown." From the Code spans section of the specification:

A backtick string is a string of one or more backtick characters (`) that is neither preceded nor followed by a backtick.

A code span begins with a backtick string and ends with a backtick string of equal length. The contents of the code span are the characters between the two backtick strings...

Result of Interpretation

The result of this interpretation is that on the respective Payload lines, `, ``, ``` in the case demonstrated in this issue should be considered open/close delimiters so that:

Payload: `<script>alert(1);</script>`

Payload: ``<script>alert(1);</script>``

Payload: ```<script>alert(1);</script>```

produces:

<p>Payload: <code>&lt;script&gt;alert(1);&lt;/script&gt;</code></p>
<p>Payload: <code>&lt;script&gt;alert(1);&lt;/script&gt;</code></p>
<p>Payload: <code>&lt;script&gt;alert(1);&lt;/script&gt;</code></p>

cebe · 2019-07-08T08:16:15Z

Hi, thanks for the detailed post. I was not aware of a CVE being assigned to this issue and I agree that this is not a security issue. The documentation in the README explains this situation in detail: https://github.com/cebe/markdown#security-considerations-

I have requested the CVE to be rejected.

tyler-ham · 2019-07-08T18:50:21Z

No problem. I also put in a request and just received this response from Mitre:

Thank you for your submission. CVE-2018-1000874 has been updated to a status of DISPUTED, with the rationale you have provided. These changes should be made public on cve.mitre.org within 1-2 hours.

billythekid · 2020-01-30T13:47:00Z

This is currently being reported by snyk It does mention it's "fixed" by way of the readme edit but I guess because these commits aren't published as a version yet it's still flagging it.

Might this get a push to a 1.2.1.1 / 1.2.2 so we can clear this up too?

Ekultek · 2020-02-01T00:46:57Z

I agree with the dispute, anyone coming here because of the CVE use a purification system before allowing anything from end users. They tend to do stupid stuff.

cebe · 2020-02-26T01:49:38Z

I have contacted snyk and asked them to remove the report.

cebe · 2020-02-26T22:00:38Z

FYI: The issue is not reported by snyk anymore.

cebe closed this as completed in a91a7cb Dec 6, 2018

cebe reopened this Dec 6, 2018

cebe changed the title ~~Cross site scripting vulnerability~~ Inconsistent behavior of multiple backticks Dec 6, 2018

cebe added the bug label Dec 6, 2018

cebe added this to the 1.2.1 milestone Dec 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent behavior of multiple backticks #166

Inconsistent behavior of multiple backticks #166

Ekultek commented Dec 5, 2018 •

edited

samdark commented Dec 5, 2018 •

edited

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

samdark commented Dec 5, 2018 •

edited

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

cebe commented Dec 6, 2018

Ekultek commented Dec 6, 2018 •

edited

cebe commented Dec 6, 2018 •

edited

Ekultek commented Dec 6, 2018 •

edited

cebe commented Dec 6, 2018

cebe commented Dec 6, 2018

Ekultek commented Dec 6, 2018

nunofernandes commented Mar 20, 2019

tyler-ham commented Jul 3, 2019

cebe commented Jul 8, 2019

tyler-ham commented Jul 8, 2019

billythekid commented Jan 30, 2020

Ekultek commented Feb 1, 2020 •

edited

cebe commented Feb 26, 2020

cebe commented Feb 26, 2020 •

edited

Inconsistent behavior of multiple backticks #166

Inconsistent behavior of multiple backticks #166

Comments

Ekultek commented Dec 5, 2018 • edited

Issue

How?

Impact

Proof of Concept

User input

MD file

samdark commented Dec 5, 2018 • edited

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

samdark commented Dec 5, 2018 • edited

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

samdark commented Dec 5, 2018

samdark commented Dec 5, 2018

Ekultek commented Dec 5, 2018

cebe commented Dec 6, 2018

Ekultek commented Dec 6, 2018 • edited

cebe commented Dec 6, 2018 • edited

Ekultek commented Dec 6, 2018 • edited

cebe commented Dec 6, 2018

cebe commented Dec 6, 2018

Ekultek commented Dec 6, 2018

nunofernandes commented Mar 20, 2019

tyler-ham commented Jul 3, 2019

Dispute of Vulnerability CVE-2018-1000874

Backtick Parsing

Traditional Markdown Specification

Interpretation

CommonMark Specification for Comparison

Result of Interpretation

cebe commented Jul 8, 2019

tyler-ham commented Jul 8, 2019

billythekid commented Jan 30, 2020

Ekultek commented Feb 1, 2020 • edited

cebe commented Feb 26, 2020

cebe commented Feb 26, 2020 • edited

Ekultek commented Dec 5, 2018 •

edited

samdark commented Dec 5, 2018 •

edited

samdark commented Dec 5, 2018 •

edited

Ekultek commented Dec 6, 2018 •

edited

cebe commented Dec 6, 2018 •

edited

Ekultek commented Dec 6, 2018 •

edited

Ekultek commented Feb 1, 2020 •

edited

cebe commented Feb 26, 2020 •

edited