Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance - Drastically improve worst case regex performance #51

Merged
merged 3 commits into from Oct 28, 2023

Conversation

adam-arthur
Copy link
Contributor

@adam-arthur adam-arthur commented Jan 16, 2023

Problem:
Very poor performance when testing certain regexes due to unnecessary conversion to/from string/array.

Impact:
Some of the reporters in Vitest that depend on this code are extremely slow. (See: vitest-dev/vitest#2602)

Notes:
While time complexity of both operations is theoretically the same, seems the VM is optimizing string specific operations much better than the conversion back/forth.

Future:
In this case can run a single global regex on the entire string up front rather than re-allocating a new substring at each index. But for now this is simple and solves the issue.

@adam-arthur
Copy link
Contributor Author

adam-arthur commented Jan 16, 2023

Please see for context:
vitest-dev/vitest#2602

Pasted Graphic

isInsideEscape = false;

@adam-arthur
Copy link
Contributor Author

CC @sindresorhus

index.js Outdated
@@ -42,7 +42,7 @@ const wrapWord = (rows, word, columns) => {

if (ESCAPES.has(character)) {
isInsideEscape = true;
isInsideLinkEscape = characters.slice(index + 1).join('').startsWith(ANSI_ESCAPE_LINK);
isInsideLinkEscape = word.slice(index + 1).startsWith(ANSI_ESCAPE_LINK);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.slice() operates on code units while the [...word] splits on codepoints. That's a subtle difference.

> '🌍'.slice(0, 2).length
< 2
> [...'🌍'].length
< 1

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, not too familiar with unicode/string particulars.

Will add a test to this effect and update PR

Converting from array to string repeatedly in a loop leads to
very poor performance in some contexts.
@adam-arthur
Copy link
Contributor Author

adam-arthur commented Jan 18, 2023

Updated PR to respect unicode character lengths

@sindresorhus
Copy link
Member

Will add a test to this effect and update PR

Missing the test

@moosemanf
Copy link

I would argue the test is "missing" bc basically every other test confirms the correct behaviour, isn't it? @sindresorhus

@gtm-nayan
Copy link

gtm-nayan commented May 27, 2023

My profiler shows me that a significant amount of time is also spent in the ansiRegex() call inside strip-ansi. stripAnsi is being called for every "word" and each time it creates a new regex object. Might want to looks there as well, creating the regex in the module scope and then resetting its lastIndex before each use sounds like a fairly low hanging fruit.

@gtm-nayan
Copy link

Completely tangential but @adam-arthur can you please tell me what tool you're using for the per-line timings in #51 (comment) ?

@sindresorhus
Copy link
Member

Bump :)

@sindresorhus sindresorhus merged commit d989bc4 into chalk:main Oct 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants