Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): don't parse 'requires' from strings or comments or regexes #2393

Draft
wants to merge 9 commits into
base: 0.21
Choose a base branch
from

Conversation

markwhitfeld
Copy link

@markwhitfeld markwhitfeld commented Apr 29, 2022

This resolves issue #2392.
During this PR I push the commit with the failing tests to showcase the issue, and then I push the commit to fix the issue. The initial issue was specific to minified files as described in the original issue, but as I dug deeper I found many edge cases that have been plaguing this feature previously. For example:

The initial code that excluded these checks from minified files, just for comments is here, but in a later commit the code checking strings was moved into this block as well. This was potentially done erroneously because comments are usually removed from minified files, but strings still remain in the code. My initial attempt to fix my issue just moved the string check out of the block which was skipped for minified files, but later my solution evolved as described below.

Update:
I eventually realised that the existing approach could be simplified by using a regex that would handle the mutual exclusivity of the different ways that JS code can have text that should not be parsed. This new approach has resolved the original issue as well as a number of known issues documented in the tests.
This approach covers what was attempted (and abandoned) with PR#326 for tokenisation, but leverages the mutual exclusivity inside the regex to optimise the matching.

I updated the benchmark code and found that the new way is more than twice as fast!

  • Old way: 144ms for 1000 runs (on my machine)
  • New way: 64ms for 1000 runs (on my machine)

@markwhitfeld markwhitfeld force-pushed the fix/dont-parse-requires-in-strings branch 3 times, most recently from 21a6cde to f6eb7b4 Compare April 29, 2022 15:24
@markwhitfeld
Copy link
Author

NodeJS 4 tests are making me sad 😢. Sorry about the multiple force pushes.

This is a simple bump to check if the node 4 run is currently broken through deps packages that don't support node 4
@markwhitfeld markwhitfeld force-pushed the fix/dont-parse-requires-in-strings branch from 1e02af2 to 7e49ff5 Compare April 29, 2022 16:22
@markwhitfeld
Copy link
Author

Ok, looks like node 4 is actually broken at the v0.21.6 head. Removing node 4 from CI build.

Node 4 is broken due to deps that no longer support it.
It is also way out of LTS.
@markwhitfeld
Copy link
Author

Bug demonstrated here as can be seen in the failing tests added in the commit above:
https://app.travis-ci.com/github/systemjs/systemjs/jobs/568734711#L386
image

@markwhitfeld
Copy link
Author

markwhitfeld commented Apr 29, 2022

The commit above adds tests to prove that string literals are also not handled correctly and then that is is resolved by the improved regex approach.
See test output: https://app.travis-ci.com/github/systemjs/systemjs/jobs/568746162#L386
image

@markwhitfeld markwhitfeld marked this pull request as ready for review April 29, 2022 23:01
@markwhitfeld
Copy link
Author

Ok, I'm sure happy with this now.
Better regex, simpler code, handles more cases as well as previous know issues.

@markwhitfeld
Copy link
Author

@joeldenning I know that the 0.21.x branch is real legacy stuff, but hopefully this PR is clear enough for you to easily review and release a new 0.21.7 version with this fix.

@markwhitfeld markwhitfeld changed the title fix(core): don't parse requires in strings in minified files fix(core): don't parse 'requires' from strings or regexes Apr 30, 2022
@markwhitfeld
Copy link
Author

Hooray! New approach is also more than twice as fast too!
image

@markwhitfeld markwhitfeld changed the title fix(core): don't parse 'requires' from strings or regexes fix(core): don't parse 'requires' from strings or comments or regexes Apr 30, 2022
@markwhitfeld markwhitfeld marked this pull request as draft May 2, 2022 19:42
@markwhitfeld
Copy link
Author

markwhitfeld commented May 2, 2022

I have picked up an issue with the regex detection regex.
This article details why this is hard:
https://localcoder.org/when-parsing-javascript-what-determines-the-meaning-of-a-slash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant