Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow plugins to handle assets #2872

Open
surma opened this issue May 23, 2019 · 22 comments
Open

Allow plugins to handle assets #2872

surma opened this issue May 23, 2019 · 22 comments

Comments

@surma
Copy link

surma commented May 23, 2019

(Somewhat related to #2823).

Feature Use Case

With #2742 landed, I tried to set up Rollup to accept html files as entry points. HTML can contain references to JavaScript, which clearly should be handled using emitChunk.

CSS, however, is in a weird spot: It doesn’t map well to JavaScript so is more of an asset, but I want my css plugin to parse it to find further assets referenced within (e.g. using background-image: url(...)). Plugins are currently limited to handling JavaScript. Additionally, if I shoe-horn CSS to be handled as a chunk (which I successfully did in an experiment), Rollup will force the output file to have a .js file extensions.

Feature Proposal

My idea is (and feel free to discard this!) to give plugins a hook to handle newly emitted assets and analyze them for further dependencies (as proposed in #2823).

It’s worth noting that an SVGs are assets, which could contain references to JavaScript (same goes for straight HTML). So there is a chance that an asset has a dependency on a JavaScript chunk.

@tivac
Copy link
Contributor

tivac commented May 23, 2019

I built a lot of the functionality you're asking for as part of @modular-css/rollup, having rollup's chunking/dependency tracking applied to non-JS assets would be lovely because it would let me delete a bunch of the code I have for doing that myself.

@lukastaegert
Copy link
Member

Just some thoughts around this, I think we will need some iterations to get this right:

One option could be a transformAsset hook that works similar to the transform hook, receives the assets source and has the option to modify it (by returning a new source) or leave it untouched (by returning null)

This would solve the issue of emitting new chunks. The question is, though, how to reference the emitted chunks from CSS as at the time of emission, the final name is not known.

Another issue is how to handle assets that are emitted without a source. Technically, you can emit an asset without source during the "build" phase and then add the source during the "generate" phase. We could run the transformAsset hook as soon as the source is set, possibly tell the plugin at which stage we are at the moment and then leave it to the plugin to do what it must. Note that you cannot emit chunks during the generate phase.

Referencing chunks

Points to be considered:

  • Assets can contain references to chunks
  • The formatting of the reference can depend on the chunk (i.e. paths may need to be resolved relative to a different base or may be embedded in an entirely different format in a binary asset).
  • The hash of the asset should change if the name of the referenced chunk changes
  • The chunk name is only known at a very late stage when it has been hashed

Especially the last point means that we need yet ANOTHER hook to handle assets, e.g. renderAsset, to do the final transformations.

To handle the hashing, there are at least two approaches:

  • We could do it like rollup does it. Hash assets BEFORE rendering and add any dependencies to the hash (this would also mean that we definitely need to set the source before renderAsset). To have dependencies, we could track the use of emitChunk, i.e. if I emit a chunk during transformAsset, this chunk would become a dependency of the asset.
    • Advantage: Very stable, always works
    • Disadvantage: Hash does not reflect other modifications during renderAsset
  • Provided we again track dependencies as outlined above, we could also do a topological sort of the graph before renderAsset so that we KNOW the chunk names at that stage. This has been brought forward in another issue as a suggestion how to improve Rollup's hashing in general.
    • Advantage: "Perfect" hashes reflecting exactly the asset content
    • Disadvantage: Does not work if the asset is part of a dependency cycle (because sorting is impossible).

Possible solution: We can actually combine both approaches with the first serving as a fallback (with a warning) in case the second cannot be used. I.e.

  • Track asset dependencies via emitChunk (or again emitAsset) in the transformAsset hook (and possibly extend this to track chunk dependencies via emitAsset or emitChunk in the transform hook).
  • Offer a way to emit the same asset from several transform(Asset) hooks so that different assets and chunks can reference the same asset. Not sure what the best approach is yet. For emitted chunks, those are deduplicated via the id of the referenced module entry point. This does not easily extend to assets as we directly set the source here. I assume the safest would be to allow reemitting via the assetId that is returned by the first emitAsset call but this means this needs to happen within the same plugin and in a defined order.
  • Before rendering, we sort the chunk/asset dependency graph.
    • If it is sortable, we call renderAsset and renderChunk for each asset and chunk once all dependencies have been rendered and just hash the result to generate the file names.
    • If not, we hash the assets with their unmodified source, adding all known chunk and asset dependencies to the hash and also use the current hashing implementation for chunks. Of course, hashes should only be calculated on demand. IF hashes are part of some file names, a warning should be shown that due to circular dependencies, a fallback hashing algorithm is used that does not reflect changes in renderChunk/renderAsset (otherwise we stay silent and nobody will notice we switched algorithms).

Just some thoughts, suggestions are very much welcome. As this would be a HUGE refactoring, we could focus on implementing the fallback solution first and tackle the improved hashing later. Alternatively, we could first refactor the existing hashing and then add assets to the mix.

@tivac
Copy link
Contributor

tivac commented May 24, 2019

Real quickly jotting down notes on what I'm hoping to get out of a rollup asset-handling system:

  • Proper dependency tracking between chunks and assets. I mostly care about chunk -> assets and inter-asset dependencies, but tracking asset dependencies on chunks seems reasonable
  • Chunking assets based on the dependencies
  • More control over asset hashing. For example, I want to output foo.abc123.css, but don't want foo.abc123.css.map to be hashed because it needs the exact same name as the .css file without another hash added. Right now I add it directly to the bundles object which is very gross.

I also want to make sure that whatever we do here is compatible with systems like modular-css where JS files can reference a CSS file and get back JS from the plugin, but the CSS is still able to be tracked within the asset system and take advantage of all of goodness being proposed here.

I need to read @lukastaegert's proposal like 4 more times and jot down what I think the APIs would look like before I can provide more useful feedback, sorry :(

@lukastaegert
Copy link
Member

Thanks a lot for the initial feedback, as always very much welcome.

More control over asset hashing. For example, I want to output foo.abc123.css, but don't want foo.abc123.css.map to be hashed because it needs the exact same name as the .css file without another hash added. Right now I add it directly to the bundles object which is very gross.

I still think the "proper" solution would be to define an API to add arbitrary files to the bundle, not unlike what you are doing now but via a utility function that is under Rollup's control and that works from all hooks, not just generateBundle.

From my side, I plan on solving this one way or another before the next major release (so that with the next major, we could consider adding a warning to your technique, e.g. via a Proxy). There should also be an API to remove things from the bundle but here, we might also just support delete bundle.property.

@lukastaegert
Copy link
Member

Proper dependency tracking between chunks and assets. I mostly care about chunk -> assets and inter-asset dependencies, but tracking asset dependencies on chunks seems reasonable
Chunking assets based on the dependencies

There are several ways where we might give you the necessary information. One thing that comes to my mind is extending this.getModuleInfo. This already contains a list of imported module dependencies (importedIds). We could extend this with a list of referencedIds which would correspond to chunks emitted from the load or transform hook of this module, and a list of referencedAssets.

To query inter-asset dependencies, we could add a this.getAssetInfo which would also contain referencedIds and referencedAssets (for chunks and assets emitted from.

Then using the information you get in generateBundle, you could chunk your assets in a way that reflects the chunking of your modules.

Still, I keep wondering how we want to reference assets in all these APIs. Should we be using auto-generated ids or maybe allow the plugin developer to set an id, e.g. a file name? The latter would also improve readability of the output of getModule/AssetInfo.

@LongTengDao
Copy link
Contributor

What's the difference between this requirement and becoming webpack or gulp? I think this requirement is real, and not means another general bundle tool, but I can't make it clear while I'm thinking about this.

@shellscape
Copy link
Contributor

@lukastaegert what are your thoughts on continuing to track this discussion?

@lukastaegert
Copy link
Member

Definitely worth keeping, especially with the other issues rolled into this one.

@jacksteamdev
Copy link
Contributor

This would be great. Much of what has been discussed here I've accomplished by mutating the bundle in generateBundle, but I'd love to be able to avoid hacks like that.

@zouyaoji
Copy link

Maybe try this plugin? rollup-plugin-scss-smart-asset

@jakearchibald
Copy link
Contributor

In terms of handling assets with dependencies, should augmentChunkHash become augmentHash, and make it also called for assets?

Then, a CSS plugin could use augmentHash to add data about the various images/fonts it includes.

@lukastaegert
Copy link
Member

lukastaegert commented Feb 27, 2020

Actually, my plan is to make this go away and fix hashing "for good" because this "augmenting" is a difficult and error-prone thing. Here is my analysis and vision, which I hope to write up properly soon:

  • Hashing can only be easy and reliable if it looks at the final output
  • This is not possible if you allow arbitrary transformations in renderChunk
  • However for most cases, the transformations are not arbitrary: The output is still text and referenced filenames still appear in the output
  • Therefore I would rework hashing the following way:
    • before passing chunks to renderChunk, we add the proper filenames to the chunk content BUT wherever a [hash] would appear in the filename, we replace it with a marker such as __ROLLUP_HASH_42__
    • plugins can also query such a filename for an emitted chunk or asset, e.g. via this.getFileNamePlaceholder and insert it into an asset, transforming paths if needed. E.g. a plugin could get the file name placeholders both for an asset and another asset that is referenced, which would return something like assets/my-asset__ROLLUP_HASH_11__.css and assets/nested/my-other-asset__ROLLUP_HASH_12__.png, and the do something like path.relative to insert ./nested/my-other-asset__ROLLUP_HASH_12__.png into the referencing asset
    • after all renderChunk hooks, we iterate through all chunks and assets, calculating content hashes (that do not yet take dependency hashes into account) and collecting all __ROLLUP_HASH__ markers to build a "hash-dependency-graph" that also includes assets and dependencies between them
    • then we use the content hashes and the graph to calculate the actual hashes and adjust the file names and contents
    • some care needs to be taken to avoid name collisions

This would make sure we aways have correct hashes and also allow adding arbitrary additional dependencies e.g. in renderChunk.

@jakearchibald
Copy link
Contributor

Some thoughts around this:

What if a load hook could return an object like { source, name?, fileName? }. If so, the import is treated as an asset. It wouldn't become a chunk. It may be passed to something like transformAsset, as discussed above.

import url from 'foo';

If foo loads as an asset, url is the URL to the asset.

An HTML plugin could see:

<link rel="stylesheet" href="./styles.css">
<script type="module" src="./script.js"></script>

And call:

const styleRef = this.emitFile({
  id: './styles.css',
  importer: htmlId,
});
const scriptRef = this.emitFile({
  id: './script.js',
  importer: htmlId,
});

And replace ./styles.css and ./script.js in the HTML with the final URLs of styleRef and scriptRef (I'm hand-waving how that's done, as it might become more obvious once the hashing stuff is revamped).

The use of importer in emitFile is covered in #3436.

I've deliberately left type: 'chunk' out of the emitFile call, since it wouldn't quite make sense if ids could be either chunks or assets.

This system would also allow something like HTML to be an entry point, since it wouldn't have to produce a JS module at the end.

@lukastaegert
Copy link
Member

We should not recycle the load hook here IMO, this would be a rather breaking change for many plugins. Also it is completely unclear what it means to "import" a generic asset, i.e. what is actually imported in the JS file. The (relative/absolute) URL? The string content? Some loader code? I think this should remain in the domain of plugins that are responsible to take care of the interface between JavaScript and Assets.

And call:

const styleRef = this.emitFile({
  id: './styles.css',
  importer: htmlId,
});

what use is it to provide an importer if you actually know the importer and could just directly apply it?

There may be some use in decoupling asset loading/preprocessing and connecting it to JavaScript, though. At the moment the concept of an asset is rather opaque to rollup i.e. it is just a name and some data. Of course the intention here is that most assets are actually based on files, and files usually have a type in form of an extension.

So if we want to go there, I would suggest to have something like

this.emitFile({type: 'asset-file', id: '/path/to/my-file.css'});

and then have a new chain of (at least) a loadAsset (to support disk-less mode) and a transformAsset hook. Not sure if we need a resolver, we might definitely go without it first. The default loadAsset would in Node just pick up the file from disk.

@jakearchibald
Copy link
Contributor

jakearchibald commented Mar 11, 2020

We should not recycle the load hook here IMO, this would be a rather breaking change for many plugins.

Maybe it shouldn't be recycled, but I don't think it'd be a breaking change. The load hook's behaviour would only change if the hook returned { source, name?, fileName? }, which I assumed would error right now.

Also it is completely unclear what it means to "import" a generic asset, i.e. what is actually imported in the JS file. The (relative/absolute) URL? The string content? Some loader code?

I said url in the comment above, but there may be a better idea. But yes, if the default wasn't good enough, you'd write a plugin.

what use is it to provide an importer if you actually know the importer and could just directly apply it?

Say the HTML was:

<link rel="stylesheet" href="postcss:./styles.css">

The HTML plugin would be able to extract postcss:./styles.css from the tag, but only the postcss plugin would be able to resolve it correctly. And, the postcss plugin would need to know the importer in order to resolve it correctly.

So if we want to go there, I would suggest to have something like

this.emitFile({type: 'asset-file', id: '/path/to/my-file.css'});

and then have a new chain of (at least) a loadAsset

That could work. Although you'd need another mechanism to allow an entry point to be an asset, which I was trying to avoid.

@lukastaegert
Copy link
Member

My feeling is you are trying to put a lot of things into Rollup that plugins could handle well on their own, and as this is not Webpack I'm trying hard to keep Rollup's API as small and simple as possible.

If you want HTML as an entry point, it is just something you want for the user and you can easily have it today:

  • Add an options hook that detects the .html entry and removes it from the options
  • Add a buildStart hook that emits the necessary files and assets

Since we have emitFile now, there is no need to supply any entry point by the user as long as some plugin emits some in the buildStart hook.

Rollup would not know what to do if the load hook returned an asset and as I stated above, I'm unwilling to hardcode one of several solutions here, at least two of which (content or relative URL) seem equally reasonable to me. And for CSS, creating a style tag would be a third reasonable solution. Sure we could make a call here, but this can be easily handled in plugins today so there would be no advantage except some convenience (the usual handling is emitting a virtual module that exports whatever the plugin seems fit). To my knowledge, there is no "canonical" solution yet that is implemented in the majority of actual JS runtimes.

@lukastaegert
Copy link
Member

Maybe it would help more to go on a more abstract level and talk more about the problems and scenarios you want to solve before jumping to solutions.

@jakearchibald
Copy link
Contributor

  • Add an options hook that detects the .html entry and removes it from the options

Interesting! I hadn't thought of that.

this is not Webpack I'm trying hard to keep Rollup's API as small and simple as possible

I really appreciate that, and absolutely trust your directing of the project in terms of this.

Maybe it would help more to go on a more abstract level and talk more about the problems and scenarios you want to solve before jumping to solutions.

Imagine an HTML plugin that processes source like:

<!DOCTYPE html>
<link rel="stylesheet" href="postcss:./styles.css">
<p>Hello!</p>

For each external file reference in the HTML (postcss:./styles.css in this case), the HTML plugin wants to:

  1. Allow other plugins to handle the href and create a file as a result
  2. Replace the href value with a path to the resulting file.

If the HTML plugin wanted to inline the contents instead, that seems like something the HTML plugin would handle once it has access to the source of the resulting file.

@daKmoR
Copy link

daKmoR commented Jul 8, 2020

We have a the plugin @open-wc/rollup-plugin-html which can

  • Generate one or more HTML pages from a rollup build
  • Inject rollup bundle into an HTML page
  • Optionally use HTML as rollup input, bundling any module scripts inside
  • Optionally use multiple html files via a glob or html strings as input
  • Minify HTML and inline JS and CSS
  • Supports Inline JS modules
  • Suitable for single page and multi-page apps

I personally love the possibility to give it an array of html files (or strings) and it creates the best splitting of js and injects it where needed.

It does not (yet) handle advanced processing of css but it might at some point 🤞
open-wc/open-wc#1484

You can see more details here: https://www.npmjs.com/package/@open-wc/rollup-plugin-html

@frank-dspeed
Copy link
Contributor

my experiment #2839 (comment) solves the issues with late hasing based on content after all runs as it runs hashing as last step and the modifes the chunks to use the new hash

@dmail
Copy link
Contributor

dmail commented Oct 20, 2020

Maybe it would help more to go on a more abstract level and talk more about the problems and scenarios you want to solve before jumping to solutions.

I want to pass an html file to rollup and get an output where urls have been hashed.

Html file example

Html input

<!DOCTYPE html>
<html>
  <head>
    <title>Title</title>
    <meta charset="utf-8" />
    <link rel="icon" href="./favicon.ico" />
    <link rel="stylesheet" type="text/css" href="./main.css" />
  </head>

  <body>
    <script type="module" src="./main.js"></script>
  </body>
</html>

Html output

<!DOCTYPE html>
<html>
  <head>
    <title>Title</title>
    <meta charset="utf-8" />
    <link rel="icon" href="assets/favicon-5340s4789a.ico" />
    <link rel="stylesheet" type="text/css" href="assets/main-3b329ff0.css" />
  </head>

  <body>
    <script type="module" src="./main-f7379e10.js"></script>
  </body>
</html>

As mentioned by @surma the css files would be parsed to find stuff like @import "./file.css" or background: url('./image.png').

To make this happen I have followed the example from rollup documentation around emitFile that is suggesting to use buildStart and generateBundle hooks. I won't extend in details but I managed to create a rollup plugin that is parsing html, svg, css, and emit corresponding assets. However I won't publish it standalone because it's doing more and would be quite a lot of work to decouple.

All this to say that it's possible to write such plugin with the current rollup api with one limitation: You have to wait for generateBundle to be sure a js chunk is done and know the corresponding fileName to update it in the html file.

It means two things:
(1) You end up emitting most of your assets in generateBundle which feels strange to me. My mental model of rollup is that there is two phases and emitting asset belongs to the first one.

import { rollup } from 'rollup'

const bundle = await rollup({
  input: 'main.html',
})
// here I would expect bundle to contains assets
await bundle.generate({
  format: 'es',
  dir: 'dist'
})

(2) An asset that is not an entry point cannot reference js. It means you cannot import html containing js from in a js file because rollup would throw with

Error: Unable to set the source for asset "file.html", source already set.
Rollup throwing source already set code example
import { rollup } from "rollup"

let jsReferenceId
let htmlReferenceId

const bundle = await rollup({
  input: "main.js",
  plugins: [
    {
      resolveId: (id) => id,
      load(id) {
        if (id.endsWith(".html")) {
          jsReferenceId = this.emitFile({
            type: "chunk",
            id: "./file.js",
          })
          htmlReferenceId = this.emitFile({
            type: "asset",
            name: "file.html",
            source: `<!DOCTYPE html>
<html>
  <head>
    <script type="module" src="./file.js"></script>
  </head>
</html>`,
          })
          return `export default import.meta.ROLLUP_FILE_URL_${htmlReferenceId}`
        }
        if (id.includes("main.js")) {
          return `import htmlUrl from "file.html"
console.log(htmlUrl)`
        }
        return `export default 42`
      },
      generateBundle() {
        this.setAssetSource(
          htmlReferenceId,
          `<!DOCTYPE html>
<html>
  <head>
    <script type="module" src="${this.getFileName(jsReferenceId)}"></script>
  </head>
</html>`,
        )
      },
    },
  ],
})

bundle.write({
  format: "es",
  dir: "./dist",
})

If I remove the source when emitting file.html then rollup would throw with

Error: Plugin error - Unable to get file name for asset "file.html". Ensure that the source is set and that generate is called first.

@intrnl
Copy link
Contributor

intrnl commented Dec 14, 2020

If we allow plugins to do additional transformations to assets, surely it would make sense for assets to contain a map field now does it? Because right now, plugins emits sourcemaps by knowing the filename before hand and that would cause mismatch if the asset was transformed any further by another plugin.

This is sort of a post process step, but with the standardization of api object in plugins, I think it'd make sense to let plugins share asset references willingly from there, and then we could add a few methods to allow further modifications like bundling of multiple assets

function getAssetSource (ref: string): string | undefined
function deleteAsset (ref: string): void

Dealing with deleted assets is still something to be considered (does it just ignore any sort of reference in the source code, or?), but I'm mentioning them here anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests