Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve HTML escaping performance #5813

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jlennox
Copy link

@jlennox jlennox commented Feb 4, 2024

I noticed escape.js used an object lookup. Since the object is static and small it would in theory be notably faster to unroll this.

https://jsperf.app/gunane/1/preview

image

underscoreEscape is the code as it is now in main. ifEscapeWithTest is as it is in this PR. Per that test run, that should make this implementation ~14% faster in my setup. The improvements were more notable on much shorter input.

I left in the .test call but I am not sure why it's present. I assume it's to reduce allocations? It's not free but in my testing it only adds an additional "1%" (ifEscape is ifEscapeWithTest but without that) even though its positive path is not hit in my benchmarking.

I noticed my random sampling of files showed they used 4 space indentation, but I followed https://github.com/lodash/lodash/blob/main/.github/CONTRIBUTING.md which said to use 2 space indentation.

Code from benchmark:

/** Used to map characters to HTML entities. */
const htmlEscapes = {
  '&': '&amp',
  '<': '&lt',
  '>': '&gt',
  '"': '&quot',
  "'": '&#39'
};

const reUnescapedHtml = /[&<>"']/g
const reHasUnescapedHtml = RegExp(reUnescapedHtml.source)


function underscoreEscape(string) {
    return (string && reHasUnescapedHtml.test(string))
        ? string.replace(reUnescapedHtml, (chr) => htmlEscapes[chr])
        : string
}

function switchEscape(string) {
  if (!string) return string;

    return string.replace(reUnescapedHtml, (chr) => {
        switch (chr) {
            case '&': return '&amp';
            case '<': return '&lt';
            case '>': return '&gt';
            case '"': return '&quot';
            case "'": return '&#39';
        }
        return chr;
    });
}

function ifEscape(string) {
    if (!string) return string;

    return string.replace(reUnescapedHtml, (chr) => {
        if (chr === '&') return '&amp';
        if (chr === '<') return '&lt';
        if (chr === '>') return '&gt';
        if (chr === '"') return '&quot';
        if (chr === "'") return '&#39';
        return chr;
    });
}

function ifEscapeWithTest(string) {
    if (!string || !reHasUnescapedHtml.test(string)) return string;

    return string.replace(reUnescapedHtml, (chr) => {
        if (chr === '&') return '&amp';
        if (chr === '<') return '&lt';
        if (chr === '>') return '&gt';
        if (chr === '"') return '&quot';
        if (chr === "'") return '&#39';
        return chr;
    });
}


function manualConcatEscape(string) {
    if (!string) return string;

    let newstring = "";
    for (let i = 0; i < string.length; ++i) {
    const chr = string[i];
        if (chr === '&') { newstring += '&amp'; continue; }
        if (chr === '<') { newstring += '&lt'; continue; }
        if (chr === '>') { newstring += '&gt'; continue; }
        if (chr === '"') { newstring += '&quot'; continue; }
        if (chr === "'") { newstring += '&#39'; continue; }
  
        newstring += chr;
    }
    return newstring;
}

const input = `<code><span class="hljs-keyword">if</span> (<span class="hljs-title function_">ifEscape</span>(input) !== expected) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-title class_">Error</span>(<span class="hljs-string">"ifEscape failed"</span>);</code>`;

const expected = "&ltcode&gt&ltspan class=&quothljs-keyword&quot&gtif&lt/span&gt (&ltspan class=&quothljs-title function_&quot&gtifEscape&lt/span&gt(input) !== expected) &ltspan class=&quothljs-keyword&quot&gtthrow&lt/span&gt &ltspan class=&quothljs-keyword&quot&gtnew&lt/span&gt &ltspan class=&quothljs-title class_&quot&gtError&lt/span&gt(&ltspan class=&quothljs-string&quot&gt&quotifEscape failed&quot&lt/span&gt);&lt/code&gt";


if (underscoreEscape(input) !== expected) throw new Error("underscoreEscape failed");
if (switchEscape(input) !== expected) throw new Error("switchEscape failed");
if (ifEscape(input) !== expected) throw new Error("ifEscape failed");
if (manualConcatEscape(input) !== expected) throw new Error("manualConcatEscape failed");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant