Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webpack plugin does not respect splitChunks #1859

Closed
OliverJAsh opened this issue Jan 24, 2019 · 32 comments
Closed

webpack plugin does not respect splitChunks #1859

OliverJAsh opened this issue Jan 24, 2019 · 32 comments

Comments

@OliverJAsh
Copy link

OliverJAsh commented Jan 24, 2019

Library Affected:
workbox-webpack-plugin

Issue or Feature Request Description:
I have enabled webpack's splitChunks option. This splits my chunks so that "vendor" imports are stored in a separate asset. For example, the main chunk will result in two assets: main.js and vendors~main.js.

When I use InjectManifest's chunks option to whitelist the main chunk, I expect it to include all the assets relating to this chunk. In this case, that would be main.js and vendors~main.js.

However, it does not seem to respect the vendor asset created by splitChunks—it only includes main.js.

It is not feasible for me to list these manually, because webpack may create any number of vendor assets, depending on the common module imports between all my chunks.

@jeffposnick
Copy link
Contributor

Apologies that you're running into this. CC: @developit for visibility, as he's looking into a revamp of the workbox-webpack-plugin.

To aid in reproducing this on our end, could you share either a link to your project's config (if it's publicly available) or alternatively share a snippet of your relevant workbox-webpack-plugin config + the webpack config related to your main chunk?

@OliverJAsh
Copy link
Author

@jeffposnick Here is a minimal test case: https://github.com/OliverJAsh/workbox-webpack-split-chunks-example/tree/055c750a757ce1fc3e1a436f88f59e468a67e791

@OliverJAsh
Copy link
Author

Do you have any thoughts on whether this is a bug workbox needs to fix, or something else? We're keen to use this for the Unsplash PWA but a bit stuck on this at the moment

@jeffposnick
Copy link
Contributor

Thanks for the reproduction—I'm investigating now to see if there are any workarounds using the current workbox-webpack-plugin codebase.

@jeffposnick
Copy link
Contributor

jeffposnick commented Jan 28, 2019

The chunks option in workbox-webpack-plugin does an exact match on chunkName values for the purposes of whitelisting.

We read those chunkName values here:

I'm not sure if there's a better way of getting those chunkName values out of webpack's asset pipeline, but that seemed like a straightforward approach. (The work that @developit is doing on the plugin rewrite might factor in here.)

In your reproduction, the main.js asset has a chunkName value of 'main', and the vendors~main.js asset has a chunkName value of 'vendors~main'. Since 'main' isn't an exact match for 'vendors~main', your split chunk isn't included in the manifest.

You have a couple of options right now:

  • Stop using chunks for whitelisting, and just include everything, or use something like include that offers filename-based pattern matching. (I'm assuming you don't want to do this, as your production config is likely to be complicated this would make it harder to deal with.)
  • Keep using chunks but include the relevant prefix(es) along with all of your other explicit chunkName values.

Code to automate the second option could look like:

new InjectManifest({
  // ...other config...
  chunks: ['main'].reduce((acc, c) => acc.concat(c, `vendors~${c}`), []);
  // Or use flatMap() if your node runtime supports it...
})

You might have already considered the second option, based on this:

It is not feasible for me to list these manually, because webpack may create any number of vendor assets, depending on the common module imports between all my chunks.

From your simple reproduction, I only see 'vendor~' prepended to your single chunk name, and that is straightforward enough to just automatically include as a variation when you create your list of chunks to whitelist. In your actual production build, how complicated do the chunk names get?

@OliverJAsh
Copy link
Author

I will read your answer in detail tomorrow, but to answer this question quickly:

In your actual production build, how complicated do the chunk names get?

One example: vendors~collections-route~editorial-route~following-route~keyword-landing-page-route~photos-route~se~d1c1c472.cf514.js. 😒

@jeffposnick
Copy link
Contributor

Thanks for that production context. It sounds like constructing chunk names on the fly is not going to be a viable option for you.

Reading up on SplitChunksPlugin a bit, and playing around with some debugging inside of our plugin, I believe it is possible to obtain info about where the original chunk originated from via the internal information in compilation.chunks[]._groups.runtimeChunk.name. The fact that that involves accessing an internal _groups variable, and the fact that I'm also not sure whether runtimeChunk is the right property to look at definitely gives me pause, and I would want to do a lot more investigation into other sources of getting that chunk group info before assuming that was the right approach.

So, I think that the current status is that it would be difficult to use the chunks configuration option in the current workbox-webpack-plugin plugin to give you the behavior you want.

The most promising approach might end up being using the include option, which gives more flexibility by doing RegExp matching against the generated filenames.

@OliverJAsh
Copy link
Author

If it helps, we have a similar problem with server-side rendering: we need to know all the chunks that were referenced in a server-side render, so that we can include the necessary script tags needed for client-side rendering.

I believe the project webpack-flush-chunks essentially provides us with a function that converts main into ['main', 'vendors~main'], so it might be worth looking into how that works. Although I find that project's documentation extremely difficult to parse—at the time we had no option but to use it…

There are projects of a similar nature which I believe also have to solve this problem, e.g. https://github.com/smooth-code/loadable-components/blob/69dbb8b2b1f32da9bcac7b91515b81f7339f68a4/packages/server/src/ChunkExtractor.js#L138.

@philipwalton
Copy link
Member

@OliverJAsh, if it's helpful, I use splitChunks on my blog to do exactly what you describe (version all npm pacakges separately), and the way I get access to all the output filenames is via webpack-manifest-plugin.

Once you have a manifest of output files, it's relatively straightforward to write a task to run either workbox-build's injectManifest() or your own bundling task. In my case, I use Rollup to bundle my service worker, and I use rollup-plugin-replace (which is just like webpack's DefinePlugin) to get my revisioned asset manifest into my service worker.

If you want to try a custom approach like that, I'd be happy to help until we can get these issues resolved in the next version of the webpack plugin.

@OliverJAsh
Copy link
Author

OliverJAsh commented Jan 28, 2019

How would you use that approach to selectively whitelist chunks whilst also including any split chunks which relate to the whitelisted chunks? For example, if I had two chunks, foo and bar, and I only whitelisted foo, I would also want to include the split chunks for foo, e.g. vendors~foo.

@philipwalton
Copy link
Member

I'd probably just use a regular expression (or a filter function) that iterates through each of the assets and only returns the ones I wanted to precache.

When you use the cacheGroups property in splitChunks, you have control over how your assets are named, so you could name them in such a way that it'd be easy to filter out the assets you don't want to precache later.

Am I understand your question correctly? Here's an example of how I use cacheGroups.

cacheGroups: {
  npm: {
    test: /node_modules/,
    name: (mod) => {
      const pkgName = mod.context.match(/node_modules\/([^/]+)/)[1];
      return `npm.${pkgName}`;
    },
  },
},

It produces the following assets for me:

main-fc7013dae9.mjs
runtime-b3e4216ae2.mjs
npm.autotrack-b3eeae9a0c.mjs
npm.dom-utils-84accc4313.mjs
npm.idlize-17b460b9d8.mjs
npm.workbox-window-c6c7198e0f.mjs

@OliverJAsh
Copy link
Author

Thanks @philipwalton. I'll give that a try.

Right now I'm thinking of something like the following:

const whitelistedChunkNames = ['main', 'vendors'];

const assetsToPrecache = assets.filter(asset => {
  const chunkNames = asset.split('~');
  return chunkNames.some(chunkName => whitelistedChunkNames.includes(chunkName))
});

Is this something the chunks option will support out of the box in the future? That would seem like the intuitive behaviour. If so, I guess the question is how to extract this information from the webpack stats data?

@OliverJAsh
Copy link
Author

OliverJAsh commented Jan 30, 2019

@jeffposnick Could we extract this information from stats.entrypoints or stats.namedChunkGroups?

{
  "entrypoints": {
    "main": {
      "chunks": [
        1,
        0
      ],
      "assets": [
        "vendors~main.js",
        "main.js"
      ],
      "children": {},
      "childAssets": {}
    }
  },
  "namedChunkGroups": {
    "main": {
      "chunks": [
        1,
        0
      ],
      "assets": [
        "vendors~main.js",
        "main.js"
      ],
      "children": {},
      "childAssets": {}
    }
  }
}

I believe this is how other tools do it:

@jeffposnick
Copy link
Contributor

Thanks for all those references. I definitely think we should follow their lead in a future revision of the plugin to support this without manual configuration as much as possible.

In the meantime, have you had any luck using workbox-webpack-plugin's include rather than chunks, and matching against the filenames you want whitelisted via RegExps?

@OliverJAsh
Copy link
Author

@jeffposnick I'm experimenting with a filter over self.__precacheManifest, as that gives me a bit more control then using include + regex.

const whitelistedChunkNames = ['main'];

const filteredPrecacheManifest = self.__precacheManifest.filter(({ revision, url: pathname }) => {
  const pathnameParts = pathname.split('/');
  const assetFileName = pathnameParts[pathnameParts.length - 1];
  const chunkNames = assetFileName
    // Remove `chunkhash` + file extension
    .replace(/\..+\.(js|css)$/, '')
    // Chunk names are delimited by `~` character
    .split('~');

  console.log({ pathname, chunkNames });

  const doesRelateToWhitelistedChunk = chunkNames.some(chunkName =>
    whitelistedChunkNames.includes(chunkName),
  );

  return doesRelateToWhitelistedChunk;
});

console.log({ filteredPrecacheManifest });

One caveat I'm just thought of however is that I'm not sure if webpack truncates the file names. In which case, there would be no guarantee the chunk name would appear in the asset file name.

@OliverJAsh
Copy link
Author

One caveat I'm just thought of however is that I'm not sure if webpack truncates the file names. In which case, there would be no guarantee the chunk name would appear in the asset file name.

IIUC, the same caveat would apply to the includes approach you suggested?

@philipwalton
Copy link
Member

I don't believe it does, but you could run it in production mode to confirm. If they change in the future that'd be a breaking change.

@OliverJAsh
Copy link
Author

The example I posted earlier appears like it might be truncated. Notice the se near the end: vendors~collections-route~editorial-route~following-route~keyword-landing-page-route~photos-route~se~d1c1c472.cf514.js.

@philipwalton
Copy link
Member

Wow, that's quite the filename! :)

@OliverJAsh
Copy link
Author

Unfortunately this means we're now blocked by this issue. If there's any more information I can provide, I would be happy to help.

@jeffposnick
Copy link
Contributor

So, it's unlikely that we're going to come up with a formal solution to this prior to some of the work that @developit has planned for refactoring the workbox-webpack-plugin. There's unfortunately a number of similar issues that need to be addressed—see #1591

One path forward right now could be to forgo using the workbox-webpack-plugin entirely, and instead, run workbox-cli (or workbox-build via a custom node script, if you'd prefer) after your webpack build completes. You can read in compilation-stats.json and do whatever you want with that data if you take that approach.

I just tried modifying your earlier reproduction project by putting in a two-step build:

  "scripts": {
    "build": "webpack --json > /tmp/compilation-stats.json && workbox injectManifest"
  }

and then created a workbox-config.js file that contains:

const compilationStats = require('/tmp/compilation-stats.json');

const chunksToPrecache = ['main'];
const globPatterns = [];
for (const chunkName of chunksToPrecache) {
  globPatterns.push(...compilationStats.namedChunkGroups[chunkName].assets);
}

module.exports = {
  swSrc: './src/service-worker.js',
  swDest: './webpack-target/service-worker.js',
  globDirectory: 'webpack-target',
  globPatterns,
};

The difference here is that you're just relying on matching against files located on disk following the successful webpack build, instead of tapping into the webpack asset pipeline.

You'd want to your ./src/service-worker.js file to look something like:

importScripts('https://storage.googleapis.com/workbox-cdn/releases/4.0.0-beta.2/workbox-sw.js');

// The [] will be replaced by your precache manifest.
workbox.precaching.precacheAndRoute([]);

// Add in whatever runtime caching config, etc. you want.

@OliverJAsh
Copy link
Author

@jeffposnick Thanks for your continued support on this!

The difference here is that you're just relying on matching against files located on disk following the successful webpack build, instead of tapping into the webpack asset pipeline.

Won't this suffer from the same problem with regards to truncated file names? The globs might not find a match if the chunk name has been truncated from the generated asset's file name.

@jeffposnick
Copy link
Contributor

I was under the impression that the values in the compilationStats.namedChunkGroups[chunkName].assets array would be the actual filenames, on disk, that were output for a given chunkName. (As per your earlier comment, #1859 (comment).)

The general idea is that you can read in the full compilation stats after-the-fact and look at whatever properties make the most sense for your use case. If it's not that particular property, then presumably the filenames you're after are stored somewhere else in compilation stats.

Despite the globPatterns name for that configuration option, you'd be passing in full filenames without any wildcards in this approach.

@OliverJAsh
Copy link
Author

@jeffposnick That does make sense, thanks! I'll give it a shot.

@jeffposnick
Copy link
Contributor

Great—sorry that this is currently a hassle.

It's a good piece of feedback for how we can improve the default workbox-webpack-plugin behavior. (I keep mentioning @developit here in case he wants to chime in...)

@OliverJAsh
Copy link
Author

OliverJAsh commented Feb 4, 2019

Got this working! 🎉

Two other things I had to do:

  • prepend the webpack publicPath
  • specify globDirectory as the webpack outputPath
const flatten = require('lodash/flatten');
const path = require('path');
const { injectManifest } = require('workbox-build');
const compilationStats = require('./stats.json');

const chunksToPrecache = ['main'];
const globPatterns = flatten(
  chunksToPrecache.map(chunkName => compilationStats.namedChunkGroups[chunkName].assets),
);

const { outputPath } = compilationStats;
const globDirectory = outputPath;
const swSrc = 'CHANGEME';
const swDest = 'CHANGEME';

const { publicPath } = compilationStats;
const prependPublicPathToManifestEntries = originalManifest => {
  const manifest = originalManifest.map(entry => ({
    ...entry,
    url: path.join(publicPath, entry.url),
  }));
  return { manifest };
};
const manifestTransforms = [prependPublicPathToManifestEntries];

injectManifest({
  swSrc,
  swDest,
  globPatterns,
  manifestTransforms,
}).then(({ count, size }) => {
  console.log(`Generated ${swDest}, which will precache ${count} files, totaling ${size} bytes.`);
});

@jeffposnick
Copy link
Contributor

This should be addressed by the current Workbox v5.0.0 alpha.

@jeffposnick
Copy link
Contributor

@OliverJAsh—Just wanted to confirm that you've had a chance to test workbox-webpack-plugin@next and that it resolves the issue for you. We're planning on moving to release candidates soon, and if it's still broken, I'd like to get it addressed before the final release.

@OliverJAsh
Copy link
Author

Hey! Yup it fixes it. Test case here. Thank you. 😃

@khmyznikov
Copy link

With minChunks: 3, webpack 4 generated default~ and vendors~ chunks. InjectManifest doesn't see default chunk, even with all js included.

@khmyznikov
Copy link

OMG this OPTIONAL parameter maximumFileSizeToCacheInBytes just ruined few hours of my life because I have 2.2mb raw common chunk and it's doesn't included until I have to set this OPTIONAL .

@jeffposnick
Copy link
Contributor

(I'm definitely sorry about this frustration—the warnings return value is populated with details when maximumFileSizeToCacheInBytes ends up excluded a file, but that's obviously possible to overlook.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants