Add hook for manifest loading #44

arcanis · 2021-10-13T16:32:27Z

During the September meeting I mentioned how we would benefit from having a way to tell Node how to load package.json files, as in our case they don't necessarily exist on the disk. I think it was reasonably well received, so I'm opening this PR to see what the next step would be (cc @bmeck, who raised some points around security).

GeoffreyBooth · 2021-10-20T22:05:08Z

I think in general the issue was that we needed to keep the total number of hooks minimal in order to make chaining work, hence the big PR to collapse what we had before to resolve/load/globalPreload. I think rather than creating more hooks to override things that happen within resolve or load, we can instead create lots of helper functions so that you can write your own resolve and pull in helpers to reuse Node code for all the logic other than the part you want to override.

arcanis · 2021-10-20T22:29:01Z

I think I'd need to see what this helper API would look like - I'm worried if every loader has to reimplement the whole resolve they'll quickly start conflicting (or the other way around, not integrating with each other), but perhaps with an example it'd be clearer.

cspotcode · 2021-10-20T23:17:06Z

It sounds like we're describing sub-hooks of resolve. And yeah, if node doesn't implement the sub-hooks, then they'll need to be implemented in user-space. So we'll need some sort of a community standard for sub-hooks, and a standard sub-hooking library that's responsible for composing multiple sub-hooks into a single resolve hook for node. I imagine that will get messy.

GeoffreyBooth · 2021-10-21T03:19:18Z

I’m not saying the decision has been made; it’s just that that’s my assumption of the direction we’re going based on the last big PR and on #26. I think there are arguments for the “lots of little hooks” approach too, but we would need a way to make it work with chaining. What would it mean to chain a theoretical resolvePackageMetadata hook when there’s also a chained resolve hook? Et cetera.

arcanis · 2021-11-10T09:49:56Z

Updated this PR to be mentioned in the chaining proposals, as per #48 (comment). Should be ready for review.

JakobJingleheimer

I think a bit of preamble is needed describing the somewhat unique challenges this attempts to solve. Without it, these seem like extra hooks.

JakobJingleheimer

Great, thank you!

@GeoffreyBooth should there be an example of each of these and how they might work together in a chain? (resolve and load have them)

arcanis · 2021-11-18T18:24:35Z

should there be an example of each of these and how they might work together in a chain? (resolve and load have them)

Speaking of that, I wonder if there's perhaps some redundancy in the way the chaining doc is written. Given that hooks all share the same "pattern" for a given proposal, whether it's the middleware or the iterative approach, shouldn't the design overview list the general input/output of each hook (without taking the chaining into account), and the chaining docs describe how hooks are composed on a generic level?

In its current state, it feels like there aren't many significant differences between the resolve and load sections of the chaining proposals, except for their input/output (which are roughly the same in both proposals).

JakobJingleheimer · 2021-11-18T20:32:32Z

I think the reason we did that for resolve and load is because their innards are substantially different due to the middleware vs iterative design. If that would be more of the same for the fs hooks, then maybe not. I think one of the original reasons to have examples (back when there was just the 1 proposal from which the current middleware-style one is derived) was to show how one loader in the chain interacts with the next (no pun intended).

For instance, what if package foo was remote and its remote is a zip and subsequently that points elsewhere also remote—I don't know if that's actually possible, but if it is, it might make a good example to demonstrate the use of the hook and the need for it to be a hook rather than just a utility.

GeoffreyBooth · 2021-11-18T22:25:43Z

The chaining doc was written to just have two full examples, of a chain of resolve hooks and a chain of load hooks, so that there was enough detail that everyone could understand it. Arguably we should have yet another large example, of chaining loaders where it’s not just a loader with one type of hook but a loader with multiple hooks (like a TypeScript loader that might define both a resolve hook, to say how to resolve imports with TypeScript-specific things like path mappings, and a load hook that transpiles those files’ sources into JavaScript).

One thing we do have to spec out is how hooks connect to each other. I was recently working on import assertions, which currently happen inside the load hook. This means you can’t define a custom assertion validation behavior without overriding/defining an entire custom load hook. We can’t really chain things that happen inside hooks; like if a load hook calls fs.readFile, it can’t call a chain of readFile hooks because then you have code from all registered loaders executing while you’re still in the load hook of the first loader. However, we can add additional hooks elsewhere in the pipeline; like resolve returns a url that’s the input of load, which could return source and assertions that are the input of a new hook validate, and then this new validate does the assertion validation and returns source and format. Because validate is fully after load in the pipeline, it’s a new hook that we could add without disrupting the ability of load to be chained. Or put another way, resolve runs all registered resolve hooks, passing the output of the last one into load, which runs all registered load hooks and passes its final output into validate, which runs all registered validate hooks. Because they happen in sequence, each of the hooks in this pipeline is chainable.
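The sequencing described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not Node's implementation: `runChain`, `runPipeline`, the loader objects, and the `validate` hook are all hypothetical names, and each stage simply threads one value through every registered hook of that kind before the next stage starts.

```javascript
// Hypothetical sketch: each stage runs all of its registered hooks to
// completion before the next stage begins, so every stage stays chainable.
function runChain(hooks, input) {
  // Thread the value through each hook in registration order.
  return hooks.reduce((value, hook) => hook(value), input);
}

function runPipeline(loaders, specifier) {
  // Collect the hooks of one kind from the loaders that define them.
  const pick = (name) => loaders.map((l) => l[name]).filter(Boolean);
  const resolved = runChain(pick('resolve'), { specifier });
  const loaded = runChain(pick('load'), resolved);
  return runChain(pick('validate'), loaded);
}

// Two illustrative loaders; the names and result shapes are assumptions.
const trace = [];
const loaderA = {
  resolve: (r) => { trace.push('A.resolve'); return { ...r, url: `virtual:${r.specifier}` }; },
  load: (r) => { trace.push('A.load'); return { ...r, source: '{"ok":true}', assertions: { type: 'json' } }; },
};
const loaderB = {
  resolve: (r) => { trace.push('B.resolve'); return r; },
  validate: (r) => {
    trace.push('B.validate');
    if (r.assertions.type !== 'json') throw new Error('unsupported assertion');
    return { source: r.source, format: 'json' };
  },
};

const result = runPipeline([loaderA, loaderB], 'config');
console.log(trace.join(' -> '));
```

In this sketch no hook of one kind ever runs while a hook of another kind is still executing, which is the property the comment above relies on.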

This is one way to add a new hook without breaking the ability for hooks to be chainable. Being completely separate from this pipeline, the way globalPreload is, is another way. That’s the challenge for new hooks: defining how they can be chainable without breaking the ability of the current resolving/loading pipeline to be chainable.

arcanis · 2021-11-18T22:37:00Z

if a load hook calls fs.readFile, it can’t call a chain of readFile hooks because then you have code from all registered loaders executing while you’re still in the load hook of the first loader

Can you detail why that would be a bad thing? In my mind, hooks like readFile would be called by the Node helpers. Given that you've suggested a few times to leave it up to the loaders' implementations to call those helpers, my understanding is that hooks can necessarily call each other (indirectly, through an abstracted interface). What would be the problem with that?

GeoffreyBooth · 2021-11-18T22:49:25Z

Can you detail why that would be a bad thing?

Maybe it wouldn’t be, but it would certainly complicate users’ ability to order their loaders. Say you have loaders A, B, and C, where A is first. When A’s load hook starts running and it calls readFile, the registered readFile hooks for all of A, B, and C all run. Then when B’s load hook starts running and it calls readFile, the registered readFile hooks for all of A, B, and C all run all over again.
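The repeated work described here can be made concrete with a small counter. This is a sketch under assumptions: `readFile` hooks are not part of the current loaders API, and `makeLoader` and `callReadFileChain` are invented purely for illustration.

```javascript
// Hypothetical sketch: if every load hook triggers the full readFile chain,
// N loaders cause N * N readFile hook invocations for a single module.
const calls = [];

function makeLoader(name) {
  return {
    name,
    // Hypothetical readFile hook; records which load hook triggered it.
    readFile(path, caller) {
      calls.push(`${caller}.load -> ${name}.readFile`);
      return path;
    },
    load(url, loaders) {
      // Calling a readFile helper from inside load runs every loader's hook.
      callReadFileChain(loaders, url, name);
    },
  };
}

function callReadFileChain(loaders, path, caller) {
  for (const loader of loaders) loader.readFile(path, caller);
}

const loaders = ['A', 'B', 'C'].map(makeLoader);
for (const loader of loaders) loader.load('file:///app/index.mjs', loaders);

console.log(calls.length); // 9: three load hooks, each running three readFile hooks
```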

arcanis · 2021-12-03T16:54:47Z

I opened an implementation draft on the Node repository: nodejs/node#41076

GeoffreyBooth · 2022-01-04T01:21:26Z

@arcanis I finally reviewed nodejs/node#41076. Sorry for the delay, and I look forward to discussing it tomorrow. A few general notes:

- The index-sha512.mjs example in the PR feels like something that should be achievable with the current hooks. resolve could find the sha512 suffix, and load could generate custom source in response. Is there something lacking in the current loaders API that prevents this example from being achieved?
- Along those lines, I think we need an example/use case that isn’t achievable with the current loaders API. Especially a common/core use case like “instrumentation” or “mocking” that is a general need of the community.
- I think there are some current prominent projects that monkey-patch CommonJS fs; do you mind listing them and what they do, and why the monkey-patching is necessary? Like is Yarn Plug-and-Play one of these, for example? An instrumentation package? Having this written down somewhere, ideally as a file in this repo, would be a great resource for considering use cases that we need to support.

As I wrote in nodejs/node#41076 (comment), I think the next step would be a PR against this repo with some more Markdown files: some background about monkey-patching fs (if you don’t mind), and a design doc for filesystem hooks that can be its own file or part of https://github.com/nodejs/loaders/blob/main/doc/design/overview.md. nodejs/node#41076 proves that an implementation is possible, not that I think anyone would have doubted its achievability; so now we need to work out exactly what the API should be and how it fits together.

arcanis · 2022-01-04T09:00:23Z

The index-sha512.mjs example in the PR feels like something that should be achievable with the current hooks. resolve could find the sha512 suffix, and load could generate custom source in response. Is there something lacking in the current loaders API that prevents this example from being achieved?

Can you write one such loader, so that we have a baseline for comparison? As far as I know, the resolve return value must be an existing file, which isn't the case here. As a result the stat calls will crash, making this solution non-viable.

I think there are some current prominent projects that monkey-patch CommonJS fs; do you mind listing them and what they do, and why the monkey-patching is necessary? Like is Yarn Plug-and-Play one of these, for example? An instrumentation package? Having this written down somewhere, ideally as a file in this repo, would be a great resource for considering use cases that we need to support.

Isn't it documented in this very PR? There aren't that many other examples due to the lack of simple primitives (Electron & PnP are the main ones I have in mind, because our projects are among the few with the bandwidth to maintain their own virtual fs implementations), but those capabilities are foundational in both cases.

aduh95 · 2022-01-04T10:31:06Z

This should work:

export function resolve(specifier, context, next) {
  // calculateHash is assumed to be defined elsewhere in the loader.
  if (specifier.endsWith('?sha512') || specifier.endsWith('-sha512.mjs')) {
    const hash = calculateHash(specifier);
    return { url: `data:text/javascript,export%20default${encodeURI(JSON.stringify(hash))}`, format: 'module' };
  }
  return next(specifier, context);
}

Maybe a loader that supports resolving inside a .tar archive would be a better example?

arcanis · 2022-01-10T22:29:03Z

@aduh95 as far as I can tell this loader isn't sufficient; since the result is a data-url, Node won't treat it the same as regular files. For instance, if you have an exports field pointing to it, Node will crash:

import hash from 'pkg/hash';

{
  "name": "pkg",
  "exports": {
    "./hash": "./index-sha512.mjs"
  }
}

Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/tmp/index-sha512.mjs' imported from /tmp/index.mjs

Basically, the use case is that from Node's perspective, nothing should separate the virtual files from true files: they should have the exact same semantics and go through the exact same code path. Failing that, they'll be guaranteed to have diverging resolution behaviors and edge cases.

aduh95 · 2022-01-10T23:48:47Z

I would like for this to work:

export function resolve(specifier, context, defaultResolve) {
  const nextResult = defaultResolve(specifier, context);
  if (nextResult.url.endsWith('?sha512') || nextResult.url.endsWith('-sha512.mjs')) {
    const hash = calculateHash(nextResult.url);
    return {
      url: `data:text/javascript,export%20default${encodeURI(JSON.stringify(hash))}`,
      format: 'module',
    };
  }

  return nextResult;
}

But unfortunately defaultResolve throws if the file doesn't exist. It feels weird to me that this would fail at resolve rather than load; since there is no extension searching in ESM loader, checking if the file exists at this step seems like unnecessary work 🤔 Anyway, that's really not what this thread is about.

since the result is a data-url, Node won't treat it the same as regular files.

Not sure what you mean; it's still a module with a default export that contains the information you seek (Node.js should treat data-url modules the same as regular files imo).

Basically, the use case is that from Node's perspective, nothing should separate the virtual files from true files: they should have the exact same semantics and go through the exact same code path. Failing that, they'll be guaranteed to have diverging resolution behaviors and edge cases.

I believe that, but I'm still not convinced a hashing loader is the best example for the use case.

GeoffreyBooth · 2022-01-10T23:49:04Z

@aduh95 as far as I can tell this loader isn’t sufficient; since the result is a data-url, Node won’t treat it the same as regular files. For instance, if you have an exports field pointing to it, Node will crash:

The https loader example is one case where a resolved URL isn’t a real file, and the load hook supplies its contents: https://github.com/nodejs/loaders/blob/main/doc/design/proposal-chaining-middleware.md#https-loader. If that works for regular imports but not for "exports", then the issue is just that we aren’t sending the "exports" paths through the full loaders code path that import specifiers get (though maybe we shouldn’t?). But it should already work that resolve doesn’t need to return a valid file URL, as long as an accompanying load hook handles that specifier and supplies some source for it.
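As a rough illustration of a resolve/load pair where the resolved URL is not a file on disk, consider the sketch below. The `virtual:` scheme and the `virtualModules` table are assumptions made for this example, and the `shortCircuit` flag reflects the hook signatures of more recent Node.js releases rather than the API as it stood when this thread was written. The functions are defined without `export` so they can be exercised directly; in a real loader file they would be exported.

```javascript
// Sketch of a loader pair where resolve returns a URL that is not a file on
// disk and load supplies the source. The "virtual:" scheme and the module
// table are assumptions made for this example.
const virtualModules = new Map([
  ['virtual:greeting', 'export default "hello";'],
]);

// In a real loader file, both functions would be exported.
function resolve(specifier, context, nextResolve) {
  if (virtualModules.has(specifier)) {
    // shortCircuit signals that the rest of the chain is skipped on purpose.
    return { url: specifier, shortCircuit: true };
  }
  return nextResolve(specifier, context);
}

function load(url, context, nextLoad) {
  if (virtualModules.has(url)) {
    return { format: 'module', source: virtualModules.get(url), shortCircuit: true };
  }
  return nextLoad(url, context);
}

// Exercising the hooks directly with stub "next" functions.
const resolved = resolve('virtual:greeting', {}, () => { throw new Error('not reached'); });
const loaded = load(resolved.url, {}, () => { throw new Error('not reached'); });
console.log(resolved.url, loaded.format);
```

This only covers plain import specifiers handled by the hooks themselves; it says nothing about paths that Node derives internally, such as "exports" targets, which is the point under discussion.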

arcanis · 2022-01-11T00:32:48Z

it's still a module with a default export that contains the information you seek (Node.js should treat data-url modules same as regular files imo).

But it's not: turning a path into a data url is a lossy process, since you go from location + data to just data. Here's another easy way to break it: imagine we return a virtual file that contains import statements¹. Where should they be resolved from? Where should Node look for a package.json, to check if there's any exports field covering these imports²? If we deal with a virtual file path, it's easy, there's nothing special: we use our virtual file path as importer. But data urls don't have a physical location on disk.
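The loss of location can be observed with the standard `URL` class alone. In this small sketch the file path and the data URL payload are made up for illustration: a relative specifier resolves against a `file:` importer, but not against a `data:` importer.

```javascript
// Relative specifiers are resolved against the importer's URL. A file: URL
// has a hierarchical path to resolve against; a data: URL does not.
const fromFile = new URL('./dep.mjs', 'file:///virtual/pkg/index.mjs');
console.log(fromFile.href); // file:///virtual/pkg/dep.mjs

let dataUrlError = null;
try {
  new URL('./dep.mjs', 'data:text/javascript,export%20default%201');
} catch (error) {
  dataUrlError = error; // the URL parser rejects a relative path on this base
}
console.log(dataUrlError instanceof TypeError);
```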

Perhaps it could be worked around by encoding special types of URL (so that instead of returning a data url with the file content, we instead return a data url with a special payload containing the missing information), but it'd be very fragile and prone to break.

¹ Which is what happens in Yarn with our zip layer: we return the packages' source files, so they contain everything a package contains: package.json files, directories, relative imports, bare identifier imports, etc.
² Remember that in Yarn's case, this package.json would itself be virtual, inside a virtual directory. Hence why my PR implements hooks for the stat and readJson calls: to allow Node to traverse this virtual hierarchy.
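To illustrate why stat and readJson are the relevant calls, here is a minimal sketch of a package scope lookup that only touches the filesystem through two injectable functions. Everything in it is hypothetical (`statHook`, `readJsonHook`, `findPackageScope`, and the in-memory file table); it is not the API from nodejs/node#41076, only an indication of how a virtual tree could answer the same questions a real disk would.

```javascript
// Hypothetical sketch: a package scope lookup that only touches the
// filesystem through two injectable functions, so a virtual tree can answer.
const virtualFiles = new Map([
  ['/zip/pkg/package.json', '{"name":"pkg","exports":{"./hash":"./index-sha512.mjs"}}'],
  ['/zip/pkg/lib/util.mjs', 'export const x = 1;'],
]);

// Stand-ins for the stat and readJson hooks mentioned above.
const statHook = (path) => (virtualFiles.has(path) ? 'file' : 'missing');
const readJsonHook = (path) => JSON.parse(virtualFiles.get(path));

function findPackageScope(filePath, stat, readJson) {
  let dir = filePath.slice(0, filePath.lastIndexOf('/'));
  // For brevity, this sketch stops before checking the root directory itself.
  while (dir !== '') {
    const manifest = `${dir}/package.json`;
    if (stat(manifest) === 'file') return { manifest, data: readJson(manifest) };
    dir = dir.slice(0, dir.lastIndexOf('/')); // walk up one directory
  }
  return null;
}

const scope = findPackageScope('/zip/pkg/lib/util.mjs', statHook, readJsonHook);
console.log(scope.manifest, scope.data.name);
```

Because the importer keeps a path inside the virtual tree, the lookup can walk upward to the virtual package.json, which is exactly what a data url cannot offer.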

JakobJingleheimer · 2022-01-11T18:40:27Z

As far as I know, the resolve return value must be an existing file, which isn't the case here.

@arcanis that is not correct 🙂 See test-esm-loader.mjs → virtual file (which uses this loader).

GeoffreyBooth changed the title ~~Adds a loader for manifest loading~~ Add hook for manifest loading Oct 20, 2021

Adds a loader for manifest loading

bdc8aff

arcanis force-pushed the patch-1 branch from 0269243 to bdc8aff Compare November 10, 2021 09:48

arcanis marked this pull request as ready for review November 10, 2021 09:49

Renames into readFile / statFile

d97a950

arcanis added the loaders-agenda label (Issues and PRs to discuss during the meetings of the Loaders team) Nov 12, 2021

Merge branch 'main' into patch-1

8c1a4ea

DerekNonGeneric reviewed Nov 12, 2021

doc/design/overview.md (outdated, resolved)

JakobJingleheimer reviewed Nov 17, 2021

doc/design/proposal-chaining-iterative.md (outdated, resolved)

JakobJingleheimer reviewed Nov 17, 2021

doc/design/proposal-chaining-iterative.md (outdated, resolved)

doc/design/proposal-chaining-middleware.md (outdated, resolved)

doc/design/proposal-chaining-iterative.md (outdated, resolved)

arcanis and others added 5 commits November 18, 2021 10:46

Update doc/design/overview.md

9ddc249

Co-authored-by: Derek Lewis <DerekNonGeneric@inf.is>

Adds rational to the overview document

cb72fc4

Apply suggestions from code review

fa0c8cf

Co-authored-by: Jacob Smith <3012099+JakobJingleheimer@users.noreply.github.com>

Update proposal-chaining-iterative.md

2c9ffb6

Update proposal-chaining-middleware.md

b616257

aduh95 reviewed Nov 18, 2021

doc/design/overview.md (outdated, resolved)

Update doc/design/overview.md

3eb3369

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

JakobJingleheimer reviewed Nov 18, 2021

mhdawson mentioned this pull request Nov 19, 2021

Node.js Loaders Team Meeting 2021-11-23 #53

Closed

mhdawson mentioned this pull request Dec 3, 2021

Node.js Loaders Team Meeting 2021-12-07 #54

Closed

arcanis mentioned this pull request Dec 3, 2021

esm: implement the getFileSystem hook nodejs/node#41076

Draft

GeoffreyBooth removed the loaders-agenda label (Issues and PRs to discuss during the meetings of the Loaders team) Dec 7, 2021

GeoffreyBooth mentioned this pull request Sep 6, 2022

module: open stat/readPackage to mutations nodejs/node#44537

Merged
