Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(resolution): Fix resolution of querystringed imports in ESM files #336

Closed

Conversation

lforst
Copy link

@lforst lforst commented Dec 21, 2022

Note: This PR is built on top of #322 which I took over because @lobsterkatie is currently out.


Currently, there are two problems with the way @vercel/nft handles ESM imports with querystrings (import * as something from './someFile?someQuerystring';):

  • It treats the ? as it would any other character. As a result, it fails to resolve such imports to the non-querystringed files to which they correspond. (In the above example, it looks for a file named someFile?someQuerystring.js and, as one would expect, doesn't find it.) While this is the correct behavior for require calls in CJS files, it doesn't match the way that Node handles ESM imports (which is to strip the querystring off before attempting to resolve the file).
  • In cases where a custom resolve method is provided, and that custom method does strip querystrings from ESM imports, the querystrings are then not put back on before the imported module is processed. This matters, for example, in cases where readFile is also overridden, with a method whose output is affected by the querystring's value. (Webpack is such a system, so one place this shows up is in Next.js's NextTraceEntryPointsPlugin.)

This fixes both of those problems by selectively stripping and re-adding the querystring in ESM imports at various points in the nodeFileTrace lifecycle. More specifically, in resolveDependecy and in emitDependency and its helpers, we strip the querystring (and/or don't add it back), with the exception of:

  • When we're marking a path as seen, we keep the querystring so that having a different querystring makes the process run again.
  • When we're calling readFile, we keep the querystring, since some implementations of readFile, including the one in the nft webpack plugin, read different modules out of memory with different querystrings).
  • When we're calling resolve, we add the querystring back in, since this is also something which can be supplied by the user, and and the user-supplied version might care about query strings.
  • When we're caching analysis results, we keep the querystring, since again here, we may have different code to analyze if we have a different querystring.
  • When we're making the recursive call to emitDependency, we add the querystring back in, since we need it there to be able to start the whole cycle over.

Notes:

  • Because the behavior here depends on whether or not a module is ESM, I needed to pass cjsResolve to the recursive call of emitDependency. Rather than change emitDependency's signature, I opted to make emitDependency a wrapper for an inner function (analyzeAndEmitDependency) with the updated signature. Recursive calls now go to the inner function.
  • For context, we on the @sentry/nextjs team are interested in this because we use a webpack loader to insert in the dependency tree a proxy module for each page, which allows us to automatically wrap users' exported functions (getServerSideProps and friends, API route handlers, etc). To do this, we replace the code from /path/to/pages/somePage.js with our proxy code, which includes an import from './somePage?__sentry_wrapped__'. When webpack then crawls our proxy module looking for dependencies, the querystring tricks it into thinking it hasn't yet dealt with that page, so it forwards it to our loader a second time rather than skipping it. When our loader sees the querystring, it knows the code replacement has already been done, and returns the original page code untouched, thereby preserving the continuity of the dependency tree. This fix ensures that @vercel/nft successfully traces the modified tree.
  • The unit test testing that this doesn't break the (already correct) behavior in CJS when dealing with files with question marks in their names (cjs-querystring) required a little gymnastics in order to not break CI on Windows. It turns out that skipping the test on Windows wasn't enough - the mere fact that a file exists with a ? in its name is enough to cause Windows not to even be able to check out the commit. Therefore, the file is stored without any punctuation in its name, and before the tests run on non-Windows platforms, it is copied to a version which does have the punctuation, and which is git-ignored to enforce it not ending up in the repo in the future by accident. The dummy, no-punctuation version is stored in a folder in order to differentiate its path from that of the punctuated version by more than just the punctuation. That way, if the punctuated version is found, it's clear it's not just because the punctuation is somehow being ignored.

Fixes getsentry/sentry-javascript#5998.
Fixes #319.

@lforst lforst closed this Dec 21, 2022
@lforst lforst reopened this Dec 21, 2022
@lforst
Copy link
Author

lforst commented Dec 21, 2022

@styfle New PR because @lobsterkatie is out over the holidays. You mind taking a look? It's the same changes but with the feedback on url.parse you wrote.

Btw, feel free to push any changes you want to make directly to this PR!

@styfle
Copy link
Member

styfle commented Dec 23, 2022

@lforst There are merge conflicts in the original PR as well as this PR

queryString = specifierUrl.search;

if (specifierUrl.search) {
path = specifier.replace(specifierUrl.search, '');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could replace the wrong part of the URL if it had repeating content in the string.

How about using the parsed result instead?

Suggested change
path = specifier.replace(specifierUrl.search, '');
path = specifierUrl.pathname;

Copy link
Author

@lforst lforst Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use pathname because we want to exclusively strip the query string off the specifier. If we use pathname we would also strip away suff like hash, protocol, host etc. (I know it is a bit edge-casey but still, let's do as little as possible to the specifier here)

I will rename the function to something like splitQueryStringFromSpecifier and change the return object signature to { queryString, remainingSpecifier } to make its purpose more clear.

9eeed58

Copy link
Author

@lforst lforst Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the repeating content, do you have examples of where this might go wrong?

Afaik, as per URL standard, the first ? will denote the start of the query/search part of the URL. search should always start with a ? and since .replace() will only replace the first occurrence of a match (unless something like the regex g flag is used) this should just work IMO.

Another reason not to use pathname is when used via new URL it strips off the relative url segments:

new URL("../../src/mymodule", "file://").pathname === "/src/mymodule" // instead of "../../mymodule"

Copy link
Member

@styfle styfle Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would also strip away stuff like hash, protocol, host etc.

Did you add tests for those cases? How does Node.js handle those cases?

since .replace() will only replace the first occurrence of a match

Ah yes, you're right.

new URL("../../src/mymodule", "file://").pathname === "/src/mymodule")

Oh good catch! CI is failing though so something is still wrong.

let queryString = null;

if (!cjsResolve) {
const specifierUrl = url.parse(specifier);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just learned url.parse() was deprecated (again) so lets use new URL() instead

Copy link
Author

@lforst lforst Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestion on how to deal with URL not being able to handle relative paths? Would it be ok for us to use some random base url (eg http://example.com, file://, or placeholder-protocol://)?

Implementation based on best intentions: 44f3d3f

Copy link
Member

@styfle styfle Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a job for pathToFileURL() or maybe fileURLToPath()

@lforst lforst requested review from styfle and removed request for ijjk January 3, 2023 12:17
@lforst
Copy link
Author

lforst commented Feb 24, 2023

We don't need this anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resolving files with querystring @sentry/nextjs 7.15 and 7.16 have issues with @vercel/nft
3 participants