Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: preImport #89

Closed
wants to merge 2 commits into from
Closed

proposal: preImport #89

wants to merge 2 commits into from

Conversation

guybedford
Copy link

This adds a preImport hook proposal.

The main text here corresponds to the readme for the PR, while including a more comprehensive end-to-end example.

Naming is still provisional, specifically the topLevel boolean context name and the preImport hook name are still being ironed out.

@guybedford guybedford added the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label Jun 19, 2022
doc/design/proposal-pre-import.md Outdated Show resolved Hide resolved
await generator.install(specifier);
// the new map will then have the new mappings and the old mappings
importMap = generator.getMap();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function doesn’t return anything, how are its results preserved for use in resolve later? As part of the internal state of generator or something it references?

If that’s the case, it might be good for the example to illustrate this. Maybe a variable could be declared at the top level of the loader that gets referenced by both hooks, and the contents of this variable are how the information determined by preImport gets shared with resolve.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The importMap mutable variable is the shared state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I missed this. I don’t know if it’s just me or if it’s likely to be nonobvious to others; if the latter, maybe add a comment inside resolve to the effect of // Reference same `importMap` variable modified within `preImport` . That would make the expected usage clear, so people know how this hook is expected to work.

@davidje13
Copy link

I don't have the background context on why resolve had to be made synchronous in the first place, but this proposal makes me wonder if an implementation detail is being leaked into userspace. Specifically, it seems that something like this would be perfectly valid:

const results = new Map();

export async function preImport(specifier, context) {
  const key = makeKey(specifier, context); // some deterministic mapping to (say) a string
  const result = /* some async operation here */;
  results.set(key, result);
}

export function resolve(specifier, context) {
  return results.get(makeKey(specifier, context));
}

At which point, the details of results and resolve could be hidden away and preImport just becomes the resolve of old?

I assume I must be missing something; for example is there some data which is only available at the (synchronous) resolve time, and not the earlier preImport? As a user of this API, I'd currently be very confused about what the intended (semantic) distinction is between the two methods.

});
await generator.install(specifier);
// the new map will then have the new mappings and the old mappings
importMap = generator.getMap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a race condition; old importMap is provided to potentially multiple Generators (as called out in the description: all run in parallel), then importMap is overridden with the results of each as they finish, so some results will get clobbered.

I'm not so concerned about this specific example, but more that this demonstrates how easy it is to make this sort of mistake with the API as proposed (parallel execution combined with forcing the output to be via global state is likely to cause many issues like this)


The `preImport` hook is called for each top-level import operation by the module
loader, both for the host-called imports (ie for the main entry) and for dynamic
`import()` imports. These are distinguished by the `dynamic` context.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there was some conversation about the back-and-forth of dynamic vs topLevel. I don't have an opinion either way, but note that this is currently inconsistent with line 19.

@GeoffreyBooth
Copy link
Member

I don’t have the background context on why resolve had to be made synchronous in the first place

Because import.meta.resolve will be sync in browsers: whatwg/html#5572. Node’s version of import.meta.resolve should therefore be sync as well to avoid user confusion and code that works in one environment but breaks in the other; and import.meta.resolve calls the resolve hook (either Node’s internal one, or that internal one that’s been patched by a user loader). This change was made in nodejs/node#43363.

Comment on lines +24 to +25
The `preImport` hook allows for tracking and asynchronous setup work for every
top-level import operation. It has no return value, although it can return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This “top-level” qualifier is worth calling out. What’s the precise definition? Any static or dynamic import in the application entry point file? Only dynamic imports in that entry point?

Is that whatever data gets created by preImport then available for all resolve operations, not just top-level specifiers?

Since this is something of an initialization-of-the-module-graph thing, not run for every import, would it make sense for this hook to just run once and be passed in an array of specifiers? Like preImport(specifiers: string[], context: object) where specifiers would be the list of all top-level import specifiers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition is any import which calls ECMA function 16.2.1.5.2 Evaluate.

It is called for all dynamic imports and the main entry point, and any module worker thread instantiation, and will be called with the exact specifier passed to those.

It's singular and not a list because the timing is unique to each specifier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECMA function 16.2.1.5.2 Evaluate is incomprehensible to me.

Say an application consists of these two files. Which imports get preImport called?

// entry.js
import 'a';
await import('b');
import './c.js';

// c.js
import 'd';
await import('e');

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example, it would get called for entry.js, then 'b' when that import is hit, and then 'e' when that dynamic import is hit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so perhaps the documentation should say something along the lines of this:

The preImport hook is called for the root of every tree of modules imported by a dynamic import: the initial entry point itself (the file passed to node on the command line) and any dynamic import() calls within. The way ES modules work is that an entry point is read and statically analyzed, and all the static import statements are themselves analyzed and followed and their files read and analyzed, and so on until all the leaves of the tree have been parsed. Then all this code is evaluated; and during this evaluation phase the dynamic import() statements are followed, which uncover new trees of modules and sub-modules. The same analysis and evaluation process then begins for each of them. preImport happens after the analysis but before the evaluation of each set of modules.

I would think that it could be called with all the specifiers within the tree of modules for which it’s being called? So in this example, the first call could be for entry.js, a, c.js and the second call would be for b, d and the third call would be for e.

Copy link

@davidje13 davidje13 Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example, it would get called for entry.js, then 'b' when that import is hit, and then 'e' when that dynamic import is hit.

why wouldn't it get called for 'a', './c.js', and 'd'? Standard imports like that can still be async anyway, as I understand it? From @GeoffreyBooth 's comment, I understood that the only place where it would not be called is when a user invokes import.meta.resolve (which I think is a problem, but if this isn't even called when doing a regular import then I'm struggling to see the point)

@davidje13
Copy link

Because import.meta.resolve will be sync in browsers: whatwg/html#5572. Node’s version of import.meta.resolve should therefore be sync as well to avoid user confusion and code that works in one environment but breaks in the other; and import.meta.resolve calls the resolve hook (either Node’s internal one, or that internal one that’s been patched by a user loader). This change was made in nodejs/node#43363.

What's the intent for how import.meta.resolve interacts with preImport? I assume this means that import.meta.resolve will not invoke preImport? In that case, I'd say preImport isn't fit for purpose: we can't rely on it having been called for a corresponding resolve call, so if we need to load (e.g.) an import map as in the example to figure out how it should resolve, then that won't have happened. It also means it wouldn't be possible to use it for operations like resolving actual URLs over a network which I described in my 4th point here: #64 (comment).

This will lead to very confusing behaviour where import.meta.resolve will be inconsistent with the results of an actual import (and will usually be missed by authors since import.meta.resolve is not commonly used).

I wonder:

  • when "Move loaders off-thread" is implemented, won't that require async behaviour of resolve again? The only other options I can think of would be for resolve to never go off-thread (problematic for sharing state between hooks), or for the resolve call to be blocking like the old filesystem calls (and in that world, the resolve code itself can be as async as it likes in the other thread while the main thread blocks on it)

  • some of the use cases for async resolve code might be handled by running an operation once per file before any resolve calls, which is already available to usercode via the load hook, and perhaps just needs some added ability to share state between calls. I could imagine a world like this:

    export function resolve(specifier, context, defaultResolve) {
      const importMap = context.moduleContext?.importMap; // moduleContext is null for the main entry point
      return /* ... */;
    }
    
    export async function load(url, context, defaultLoad) {
      const importMap = await doThing(url);
      return {
        ...defaultLoad(url, context, defaultLoad),
        moduleContext: { importMap }, // this is now available to all import/resolve calls from the current module
      };
    }

    This would solve use-cases like loading config or import maps which depend on the location of the importing file (and do not yet need knowledge of the imported file). But note that it will not solve the use-case of things like importing files over a network (which might need to do operations like trying file extensions)

@cspotcode
Copy link
Contributor

I haven't had a chance to read this entire thread, so I might be repeating stuff, but here are my thoughts ahead of the meeting:


Thought experiment to validate proposals

Useful thought experiment for discussion:

  • imagine loader that allows mounting an async filesystem: could be networked, IPC, or not, but the point is it's async
  • imagine a loader that adds a new file extension to resolver algorithm
    • users really want to omit file extensions; I get asked about this a ton
  • imagine a loader that allows filesystem "overlay" of multiple virtual directories (perhaps for mocks?)
    • won't know which module to resolve to without attempting async stat of multiple candidates in the overlaid directories
    • Real-world use-case: this is what typescript does for outDir <-> rootDir & rootDirs: are essentially overlays
    • similar requirements to omitting file extensions, but I include this bullet point in case file extension omission is dismissed
  • imagine resolutions rely on reading package.json and tsconfig.json to correctly map paths
    • necessity of this is explained above

^-- Loaders for those use-cases need to compose with each other. (compose/chain/whatever)

Concern that preImport is just async resolve

worried preImport will essentially need to run the full resolve. The problem is:

a) need to async read stuff such as package.json or tsconfig.json
b) need to do the equivalent of async fs.stat to
c) the only way to know which of those things to read, is by stepping through the resolution logic (cannot pre-download all relevant files without doing resolution logic)

Given (c), we should be doing the resolution at the same time as the fetching. In which case, we should not / can not be repeating that work in resolve.

If preImport is gonna need to do resolve, then loader authors will want some sort of reliable way to connect a preImport invocation to the subsequent resolve invocation. (and the whole exercise will feel a bit silly and un-ergonomic) Pseudo code to illustrate the concept without prescribing an implementation:

export async function preImport(url, {reliableWayToLinkInvocations}) {
  resolutions.set(reliableWayToLinkInvocations, await doResolution(...))
}
export async function resolve(url, {reliableWayToLinkInvocations}) {
    return resolutions.get(reliableWayToLinkInvocations);
}

@GeoffreyBooth GeoffreyBooth removed the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label Jul 1, 2022
@GeoffreyBooth
Copy link
Member

Should we keep this open? Per our last meeting this is the backup plan if we can’t get an appears-to-be-sync resolve working via threads.

@GeoffreyBooth
Copy link
Member

Closing as we’re going in a different direction per the last meeting, but happy to reopen if there’s some reason to keep working on this in parallel.

@ljharb ljharb deleted the preimport-proposal branch July 10, 2022 04:05
@cspotcode cspotcode mentioned this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants