Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: add hook for global preload code #32068

Closed
wants to merge 1 commit into from

Conversation

jkrems
Copy link
Contributor

@jkrems jkrems commented Mar 3, 2020

This work is meant to unblock progress on moving hook execution out of the main thread (and global scope). It creates an official place where hooks may interact with the global scope, hopefully allowing us to remove the need for random global side effects elsewhere.

So far this only applies the preload once before the initial root module is executed. It felt a bit too opinionated to also apply it to all vm-created contexts etc.. I assume that if somebody wants that for their hook, they could patch vm.createContext and friends to do it. This matches the power of the top-level code in today's loaders that also won't run for every newly created context.

  • Add test and/or document support in worker threads.
  • Remove unused context argument.
  • Change preload code to run in an implicit function scope instead using the last expression.

See: #31229

/cc @nodejs/modules-active-members

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@jkrems jkrems requested review from bmeck and guybedford March 3, 2020 17:16
@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. process Issues and PRs related to the process subsystem. labels Mar 3, 2020
@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

this seems very hacky. can you go more into detail about use cases and why we need to pass around stings of code that need to end with functions? As far as I can tell, one could, for example, just import VM and run code in the top level of their loader, and that would do the same thing as this, without node having to expose such an odd API.

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

One fundamental design decision implied here is: there are multiple people who want to move loader hooks to run on a separate thread. See the linked PR for the progress on that front. In that context I’m not sure how your suggestion of using the vm module inside of the loader would work. It would be a vm module in a different isolate without clear access to the global scope that the app is running in.

There are two known alternatives for the current approach:

  1. Instead of providing a source string, provide a file name or URL. It gets a bit wonky with how that would interact with the require cache and/or module map.
  2. Make the function wrapper implicit, just like with CommonJS. It would remove the ability to declare certain kinds of globals but that may not be a deal breaker.

The use cases are instrumenting core modules and setting up any kind of global state required by code generated by other hooks. E.g. if transformSource assumes the existence on certain global runtime functions. Another use case is global attenuation / locking down the runtime before user code runs.

Right now loader hooks may just rely on the fact that they happen to run in the same global scope as all other code but when moving to threads, that is no longer true.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

could we just allow hooks to explicitly load a module? we couldn't return the namespace in the threaded design but it would let them evaluate code in the main context in a much more natural way.

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

could we just allow hooks to explicitly load a module? we couldn't return the namespace in the threaded design but it would let them evaluate code in the main context in a much more natural way.

Let's say I have a loader that wants to remove globalThis.process (for whatever reason). How would that work? Would it add a noop resolve hook that then as a side effect loads a module to delete the property? If the hook thread is shared across isolates, would it keep track of which thread/context the resolve request is part of to not run it multiple times (and block resolution)? I don't think that's really natural for a lot of these cases.

The point here is to allow running code before any normal application code runs. And in many cases in a way that doesn't leak into what the application sees. So an import is very unlikely to work because it would also leak into the import map. Unless it's a magical import. Similar for a require because it shouldn't be the default that the author needs to remember to clean up the cache.

Yes, the code could be loaded from a file instead of from a string. But that wouldn't change the fact that it would have to be reliably triggered once before the user code runs.

Another alternative that was discussed at some point was to say that we'll expect any loader that requires access to the global scope to use two separate flags: --require preload.js --loader hooks.mjs. But that would be pretty unfortunate from a usability perspective.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

@jkrems with TLA its just await MainContext.import('delete-process') in the top of your loader (missed shared isolate case)

export async function init(main) {
  await main.import('delete-process');
}

So an import is very unlikely to work because it would also leak into the import map.

this shouldn't ever be explicitly observable. only export const firstImportedAt = Date.now() or things in that vein should be able to expose that.

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

with TLA its just await MainContext.import('delete-process') in the top of your loader

Not sure what MainContext is and also not sure why it even has to be async and require TLA. And I assume it wouldn't actually import (as in: use the module map etc.) because then we'd be back to leaking information into user code.

So, your suggestion would be:

  1. Add a new API that can be imported only by loaders, e.g.import MainContext from '<magic loader string>';. This would work because loaders already run in their own isolated module map.
  2. Add a method to it that can be used to run an arbitrary file at an arbitrary point in time. Hopefully sync or we are blocked on TLA before we can actually ship loaders in any version of node.
  3. The file itself can't be a module because otherwise it would be forced into strict mode, making it a lot harder to write instrumentation code. Also, making it a module risks messing with bootstrap timing of the app.

So it would be something like this?

import MainContext from 'node-loader:main-context';
import { fileURLToPath } from 'url';

MainContext.runFileSync(fileURLToPath(new URL(import.meta.url, './preload.js')));

Where preload.js would contain the same content as above. Now for the next issue: If the hook thread is shared, the module body doesn't execute each time a new thread starts and would need to run the preload. How would that be handled?

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

@jkrems i edited my comment a bit ago to address the multi-isolate case

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

this shouldn't ever be explicitly observable.

If during normal execution import('<random string>') fails but it succeeds when the loader is active, it is observable. As soon as the loader or its preload is in the module map, it is observable.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

@jkrems i don't understand how you're observing that something was imported, esm is explicitly designed to preclude such a thing. (if our impl is leaking that's a side channel attack and we should fix it)

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

Okay, so your code becomes more fully:

export async function init(main) {
  await main.runFileSync(fileURLToPath(new URL(import.meta.url, './preload.js')));
}

This feels like a more complicated version of one of the alternatives I mentioned at the top:

export async function getGlobalPreloadPath() {
  return fileURLToPath(new URL(import.meta.url, './preload.js'));
}

Which in turn is a slightly more verbose version of just inlining the preload code.

i don't understand how you're observing that something was imported

Since it was provided by the loader and would otherwise - generally - not exist, the fact that I can import the URL at all is information.

import('file:///guessed/path/of/secret/agent/code.js').catch(err => {
  // run evil code because the agent won't catch us.
});

Also, when trying to mess with global state, there was an explicit request for having something that can run sloppy mode code. Which makes modules a problematic choice. The design goal was that having the preload has no visible impact on user code until very explicitly opting into it. This rules out leaking into the module map or even having a true require by default that may leak into the require cache.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

Since it was provided by the loader and would otherwise - generally - not exist, the fact that I can import the URL at all is information.

This is irrelevant, because the loader itself exists (i can just fs.stat for node_modules/loader). The thing you actually want to make sure of is that user code can't check whether a module has already been imported.

Sloppy code

import vm in your delete-process module?

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

I want to really explicitly address this "module map" case. The problem you seem to be having is not that the file would be in the module cache (since that can't be observed) but rather that the file can be imported at all. That would seem to motivate inlining the code in your loader as a string, which is what you suggest up at the top, which is what prompts me to say "ok i'll just check if your loader exists then". This is what further motivates me to say the entire case is irrelevant, because then a loader author would just not make "the loader exists" part of their security assumptions (indeed, user code that wants to exploit the globals being mutable doesn't gain anything from knowing the make-everything-immutable loader exists)

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

This is irrelevant, because the loader itself exists (i can just fs.stat for node_modules/loader).

No you can't because the preload code ran and can do all kinds of things, including hiding certain APIs or patching them to prevent access to certain directories (or replace them with virtual/in-memory locations only visible to the user code). We can play through the cat-and-mouse here but the point will always remain: There are cases where the fact that a certain key in the module map is "blocked" leaks and no amount of trickery can completely hide that fact.

I personally don't care about that leakage but it exists and afaik @bmeck and possibly others care.

import vm in your delete-process module?

Sure, that's possible. But it optimizes the API for... something. And I'm not sure what that something is. The current API is optimized for "patches and instrumentation that have minimal visible impact on what userland apps may see". Making it a module doesn't seem to offer any advantage, just making everything trickier to manage and harder to use..?

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

There are cases where the fact that a certain key in the module map is "blocked" leaks and no amount of trickery can completely hide that fact.

As i've said multiple times now, this is false. There is no way to check if something is in the module map.

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

As i've said like multiple times, now this is false. There is no way to check if something is in the module map.

Right, I disagreed and gave a pretty explicit example for doing it. If you want something more complete:

// loader
fs.writeFileSync('/tmp/preload.mjs', PRELOAD);
MainContext.import('file:///tmp/preload.mjs');
fs.unlinkFileSync('/tmp/preload.mjs');

// how do I prevent the user code from checking that my preload code ran?
// how do I ensure that arbitrary code still works, even if it would want to load a different
// generated source text from /tmp/preload.mjs?

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

@jkrems your loader can just route the module trying to import /tmp/preload.mjs to ERR_MODULE_NOT_FOUND. Modules in the cache are keyed by (referrer, specifier), not resolved url.

In any case like i said, even if you delete preload.mjs, you can't delete your loader, and i can just check that the file exists, which leads to the conclusion i posted like an hour ago (which is don't base your security on the consumer not knowing you exist, trivially you can load up the inspector and iterate over every single script and module)

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

your loader can just route the module trying to import /tmp/preload.mjs to ERR_MODULE_NOT_FOUND.

Which would be... observable. That's the whole point. If the app writes its own /tmp/preload.mjs, it would break the app (cause an observable error or difference in behavior) if it doesn't get a new module with fresh evaluation and those new contents.

Keep in mind that this is about the ideal of preventing observable effects on arbitrary code. I agree that in likely or frequent cases, it wouldn't be observed in practice.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

@jkrems its ERR_MODULE_NOT_FOUND because you deleted it, if they created their own it would resolve to that.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

If you somehow hide the fact that your loader does actually exist (which is a really dumb thing to base your security on, please don't), you can do import('data:text/javascript,${MY_SOURCE}') if you really want.

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

if they created their own it would resolve to that.

How? If there already is an entry in the module map for file:///tmp/preload.mjs due to the loader code running first, what would the behavior be? Afaik there's only three options:

  1. Prevent access by rerouting resolution (observable).
  2. Put it into the module map under a generated different URL (likely observable due to import.meta.url and/or relative resolution unless we make other changes).
  3. Grant access to the loader-injected module (observable).

This isn't just about security. It's also about breaking random code which is something an APM-case for example would care about.

@devsnek
Copy link
Member

devsnek commented Mar 3, 2020

How? If there already is an entry in the module map for file:///tmp/preload.mjs due to the loader code running first, what would the behavior be? Afaik there's only three options:

the cache is not keyed by resolved url, it is keyed by (referrer, specifier) pairs. i just said this like two messages ago.

@bmeck
Copy link
Member

bmeck commented Mar 3, 2020

I think the argument about loading vm.* isn't appealing since the intent of polyfills/--require mutations is to modify the main context.

I think the argument about wanting it to be a CJS or ESM module / anything on disk is likely not appealing as I cannot tell where the constraint that it needs to come off disk comes from. You can just use import() to get a hold of what you want/need and access disk if that is desirable.

I need to think on the function completion value and what happens if async work does occur. I did have serious problems getting timing to work for async behaviors with the bootstrap while trying to make a --import flag but would be curious if we can find some way to allow async behavior in this hook eventually.

I don't find the Module Map to be an issue as a loader can completely overwrite the behavior of loading stuff via import() and a simple prefix would fix things so that you know where via/when it was imported then.

Overall, I like this design as it enables a lot of things that are hard to manage while not being cluttered. The sloppy mode behavior is generally to enable the lexical contour of globals (welcome to hell) that some APIs do want to use and/or are specified as how standards define how things work. Some things also use it for various behavior that isn't present in strict mode such as globalThis polyfills.

As experimental, I think this API is fine and don't see any immediate changes needed as this is behind a flag and we can experiment with it in the wild some.

@jkrems
Copy link
Contributor Author

jkrems commented Mar 3, 2020

the cache is not keyed by resolved url, it is keyed by (referrer, specifier) pairs.

This isn’t about the spec. It’s about the module map as it exists in node. It is keyed by URL unless there was a major change I missed recently. So as far as I know the cache is not keyed by these pairs. But happy to be proven wrong with links to nodejs source code showing otherwise.

@bmeck
Copy link
Member

bmeck commented Mar 3, 2020

There are 2 levels of module mappings (I often call them global (node, per context) and local (ecma262, per source text)) please be specific so I don't accidentally get confused T_T.

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, looks good in principle.

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
doc/api/esm.md Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
@jkrems jkrems added the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Mar 9, 2020
@jkrems
Copy link
Contributor Author

jkrems commented Mar 9, 2020

Updated:

  • Uses a wrapper function instead of immediate global code.
  • Test includes running a worker thread.
  • The unused context argument was removed.

@jkrems jkrems removed the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Mar 11, 2020
Copy link
Member

@bmeck bmeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ready to me

@jkrems jkrems added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 16, 2020
@nodejs-github-bot
Copy link
Collaborator

@bmeck
Copy link
Member

bmeck commented Mar 17, 2020

CI failure appears unrelated.

If there are no objections I'd like to land this tomorrow and get back to #31229 which is pending this capability.

doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

bmeck pushed a commit that referenced this pull request Mar 23, 2020
PR-URL: #32068
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
@bmeck
Copy link
Member

bmeck commented Mar 23, 2020

Landed in 07a1fb9

@bmeck bmeck closed this Mar 23, 2020
MylesBorins pushed a commit that referenced this pull request Mar 24, 2020
PR-URL: #32068
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
@MylesBorins MylesBorins mentioned this pull request Mar 24, 2020
MylesBorins pushed a commit that referenced this pull request Mar 24, 2020
PR-URL: #32068
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Apr 3, 2020
PR-URL: nodejs#32068
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
MylesBorins pushed a commit that referenced this pull request Apr 3, 2020
Backport-PR-URL: #32610
PR-URL: #32068
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
@codebytere codebytere mentioned this pull request Apr 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. esm Issues and PRs related to the ECMAScript Modules implementation. process Issues and PRs related to the process subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants