Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stringify_string performance #67

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sapphi-red
Copy link

This PR improves stringify_string performance.

Description

The ECMAScript spec specifies that JSON.stringify will escape ", \\, \n, \r, \t, \b, \f, and chars less than space.
https://tc39.es/ecma262/multipage/structured-data.html#sec-quotejsonstring
This is also specified in the ES5.1 so the compatibility should be fine.

benchmark numbers

test case This PR Alternative (1) Alternative (2) Original Diff
(This PR / Original)
'long'.repeat(1000) 290k 100k 290k 68k 4.26x
'long2\n'.repeat(1000) 59k 59k 25k 33k 1.79x
'"\n"'.repeat(1000) 68k 68k 12k 56k 1.21x
'<'.repeat(1000) 33k 33k 36k 170k 0.19x
'long'.repeat(1000) + '\n' 150k 100k 140k 68k 2.21x

(ops/s)

https://jsbench.me/25lm0vndix/1

This PR degrades performance on the '<'.repeat(1000) case but I assume a string like that won't appear frequently.
I guess this PR would only be slow if a string contained </\u2028/\u2029 many times.

Comment on lines +56 to 58
if (!escape_chars.test(str)){
return `"${str}"`;
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this part, some test fails. But the reason is because JSON.stringify escapes more characters and uses lower case escapes (e.g. \udc00 instead of \uDC00). There should be no essential problem.
The generated string would be slightly longer but I guess that's fine because in the previous version of the tests, it was expecting the escaped ones (changed by 5530e84#diff-a561630bb56b82342bc66697aee2ad96efddcbc9d150665abd6fb7ecb7c0ab2f).

@Rich-Harris
Copy link
Owner

Thank you! The performance is very dependent on the inputs. If I try shorter strings, this PR makes things much slower: https://jsbench.me/9zlv6vfa6c/2. I'm not sure if there's a way to get the best of both worlds (without introducing a ton of complexity) but I'm wary of accidentally making things slower for a lot of people.

On the flip side of course we probably need to care about making existing slower cases faster more than we need to care about regressions in existing fast cases. But it would be good to have a more concrete grasp of the trade-offs as they relate to typical workloads.

@sapphi-red
Copy link
Author

That's a good point. I've took some benchmarks with different input lengths. If the input length is longer than 100, the new algorithm is faster than the original one (except for the input that only contains <).

benchmark results
┌─────────┬────────────────────────┬──────────────┬───────────────┬────────────────┐
│ (index) │       Task Name        │   Original   │      New      │ New / Original │
├─────────┼────────────────────────┼──────────────┼───────────────┼────────────────┤
│    0    │  '[0] 10 characters'   │ '9535319.54' │ '12299074.83' │     '1.29'     │
│    1    │  '[0] 100 characters'  │ '2799097.47' │ '6668595.40'  │     '2.38'     │
│    2    │ '[0] 1000 characters'  │ '341226.91'  │ '1259122.50'  │     '3.69'     │
│    3    │ '[0] 10000 characters' │  '32842.57'  │  '136339.86'  │     '4.15'     │
│    4    │  '[1] 10 characters'   │ '7287771.03' │ '4057035.29'  │     '0.56'     │
│    5    │  '[1] 100 characters'  │ '854148.41'  │ '1775371.66'  │     '2.08'     │
│    6    │ '[1] 1000 characters'  │ '105528.82'  │  '433440.77'  │     '4.11'     │
│    7    │ '[1] 10000 characters' │  '10819.11'  │  '46874.87'   │     '4.33'     │
│    8    │  '[2] 10 characters'   │ '6486978.52' │ '3936226.66'  │     '0.61'     │
│    9    │  '[2] 100 characters'  │ '907248.40'  │ '1639740.08'  │     '1.81'     │
│   10    │ '[2] 1000 characters'  │  '97387.63'  │  '379195.26'  │     '3.89'     │
│   11    │ '[2] 10000 characters' │  '9879.71'   │  '40203.89'   │     '4.07'     │
│   12    │  '[3] 10 characters'   │ '8355023.75' │ '2538670.52'  │     '0.30'     │
│   13    │  '[3] 100 characters'  │ '1021730.43' │  '631688.51'  │     '0.62'     │
│   14    │ '[3] 1000 characters'  │ '103162.00'  │  '91531.07'   │     '0.89'     │
│   15    │ '[3] 10000 characters' │  '10626.92'  │   '9776.62'   │     '0.92'     │
│   16    │  '[4] 10 characters'   │ '7364233.08' │ '1620777.40'  │     '0.22'     │
│   17    │  '[4] 100 characters'  │ '1617177.35' │  '310815.69'  │     '0.19'     │
│   18    │ '[4] 1000 characters'  │ '171385.70'  │  '36995.96'   │     '0.22'     │
│   19    │ '[4] 10000 characters' │  '16868.77'  │   '3176.59'   │     '0.19'     │
└─────────┴────────────────────────┴──────────────┴───────────────┴────────────────┘

In my workload, the input string doesn't contain <, so I'm fine with using the original algorithm even if the input string only contains a single <.

const re = /[<\u2028\u2029]/

export function stringify_string(str) {
  if (str.length < 100 || re.test(str)) {
    return original_stringify_string(str)
  }
  return new_stringify_string(str)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants