Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(labs/ssr): use RegExp.exec to escape HTML #4627

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

43081j
Copy link
Collaborator

@43081j 43081j commented Apr 17, 2024

Switches from using replace(pattern, fn) to using the exec method of a RegExp instead.

This is draft until someone can decide if its a sensible thing to do 馃槵

basically, i noticed one of the slow downs compared to other libraries in SSR was this function. it seems replace(pattern, fn) is much slower than manually iterating over matches.

I've tried 3 different algorithms and loosely benchmarked them.

Algorithm 1 (current)

  1. Return str.replace(pattern, fn) where fn returns the replacement character

Algorithm 2 (this PR)

  1. Match against the string with pattern.exec(str)
  2. If no match, return early (no special chars anywhere)
  3. Otherwise, consume the preceding text and the escaped character
  4. exec the pattern again and repeat this process until there are no matches
  5. If the last match wasn't the end of the string, consume the remaining text of the string
  6. Return the resulting new string

Algorithm 3 (React)

  1. Match against the string with pattern.exec(str)
  2. If no match, return early (no special chars anywhere)
  3. Otherwise, consume the preceding text and the escaped character
  4. Iterate through the remaining characters until we reach another special character (not using regex)
  5. Repeat from step 3
  6. Return the resulting new string once we have reached the end

Performance

I loosely benchmarked it using 3 strings:

  • "foo <div> bar bar </div> baz"
  • "foo bar baz"
  • "foo <div> some very long string which does not contain any more special chars ... "

Algorithm 3:

  • Particularly terrible with the 3rd string, since it has to iterate through all the remaining characters for no reason whereas algo 2 would've already stopped long ago (since exec would've returned null)
  • Fastest at shorter strings which contain many special chars

Algorithm 2:

  • Fastest overall

Algorithm 1:

  • Slowest overall

The early return probably helps a lot since many strings do not contain special chars.

Switches from using `replace(pattern, fn)` to using the `exec` method of
a `RegExp` instead.
Copy link

changeset-bot bot commented Apr 17, 2024

鈿狅笍 No Changeset found

Latest commit: faab271

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@justinfagnani
Copy link
Collaborator

Thanks for looking into this James!

I wonder how much of the advantage is in having the regex constant, so maybe compiled once (I thought a regex literal was, but could be wrong about that), not creating a closure for the replace call, and the object lookup for the replacement.

Maybe this will do well and still be simple:

const escapeRegex = /[&<>"']/g;
const replacer = (char) => {
 switch (char) {
  // Character: "
  case 34:
    return '&quot;';
  // Character: &
  case 38:
    return '&amp;';
  // Character: '
  case 39:
    return '&apos';
  // Character: <
  case 60:
    return '&lt;';
  // Character: >
  case 62:
    return '&gt;';
  }
}

export const escapeHtml = (str: string) => str.replace(escapeRegex, replacer);

It'd be nice to have a benchmark checked in with the various options.

@43081j
Copy link
Collaborator Author

43081j commented Apr 18, 2024

That's actually the code I benchmarked as algo 1 (not what's in source here) 馃槄 I had already moved the function and regex out after seeing the garbage collection happening

Surprisingly it was still a lot slower 馃

I'll throw together a more proper benchmark soon so you can run it too

a few more things i came across:

  • keying into an object seems slower than a switch (i.e. char = replacements[charCode])
  • iterating through the string without any regex is very slow (as in just building a string up character by character and replacing specials when you see them)
  • i haven't tried replaceAll or replacing with a string, but i assume that means we'd just be iterating over the string many times end-to-end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants