Chunk hashes should be based on the output #2839

Closed
philipwalton opened this issue May 5, 2019 · 25 comments · Fixed by #4543 or #4549

Comments

@philipwalton

philipwalton commented May 5, 2019

Expected Behavior / Situation

Adding a minification plugin (which will change a chunk's output contents) should also change the chunk's file hash.

Actual Behavior / Situation

The chunk's hash is unchanged by minification plugins, which can result in lots of subtle bugs when deploying files to production.

For example, consider the following scenario:

  • A user deploys a web app to production with buggy minification settings (e.g. property mangling that's inconsistent across chunks).
  • The user discovers the mistake and redeploys, thinking they've fixed the problem.
  • The bug will persist for any returning users since the chunk filenames won't have changed and the browser will load the old (cached) versions of those files.

Modification Proposal

I see that hash determination was already discussed when code splitting was added to Rollup, but I think there are some problems with that logic -- specifically this part:

Calculate a content hash for each chunk. This initial hash should only depend on the content of this chunk itself, not on its imports

If the imports are not included in the content hash calculation, then partial redeploys will break.

For example, consider a case where you have chunk A, which depends on chunk B, which depends on chunk C. If the contents of C change, then the hashes of all three chunks need to change. The reason is that if the contents of C change, C needs a new filename in order to cache-bust. But if C has a new filename, then the contents of B must also change (e.g. the import path) in order to load the correct version of C. And if the contents of B change, then B also needs a new filename, and so on all the way down to A.

As a result, I don't see any feasible way to correctly determine file hashes without considering the entire file, including its imports and their file paths.
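
To make the cascade concrete, here is a sketch with made-up file names and hashes:

// a-x1f2.js (entry chunk)
import { b } from './b-9k3d.js';

// b-9k3d.js
import { c } from './c-77ab.js';
export const b = () => c + 1;

// c-77ab.js
export const c = 1;

// If c's contents change, c needs a new hashed name, say c-51fe.js. b must then
// import './c-51fe.js', so b's contents (and hash) change as well, which in turn
// changes the import path inside a, and therefore a's hash.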

My recommendation is that Rollup should handle this itself after all plugins that can modify source code have run. If it processes all chunks in reverse topological order, it should be able to properly calculate the hashes of all chunks based on their full, final contents (including import statements).

The main problem I see with this approach is how to handle circular dependencies. However, Rollup already warns on circular dependencies, so the simplest solution seems to be to keep warning but add some additional text explaining that content hashes are non-deterministic in bundles with circular dependencies.
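
To illustrate the proposed ordering, here is a minimal sketch (not an actual implementation; the chunk shape with code, dependencies and placeholder fields is hypothetical):

import { createHash } from 'crypto';

// Sketch: hash chunks in reverse topological order, so that each chunk is hashed
// only after the final file names of its dependencies are known.
function hashChunks(chunks) {
    const finalNames = new Map();

    function visit(chunk, visiting = new Set()) {
        if (finalNames.has(chunk.name)) return;
        if (visiting.has(chunk.name)) return; // circular dependency: warn and fall back
        visiting.add(chunk.name);
        for (const dep of chunk.dependencies) visit(dep, visiting);

        // Substitute the already-final names of the dependencies into the rendered code...
        let code = chunk.code;
        for (const dep of chunk.dependencies) {
            // In a cycle, the dependency may not have a final name yet; a real
            // implementation would warn here and fall back to the old algorithm.
            const finalName = finalNames.get(dep.name) ?? dep.placeholder;
            code = code.split(dep.placeholder).join(finalName);
        }

        // ...then hash the complete, final contents, import paths included.
        const hash = createHash('sha256').update(code).digest('hex').slice(0, 8);
        finalNames.set(chunk.name, `${chunk.name}-${hash}.js`);
    }

    for (const chunk of chunks) visit(chunk);
    return finalNames;
}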


Aside: hopefully this issue won't be as tricky to solve in the future once all browsers support import maps, because once they do all file hashes can just be added to the import map declaration rather than having to appear in the file itself.

@lukastaegert
Member

If the imports are not included in the content hash calculation, then partial redeploys will break

I am not sure if I followed your point correctly here as imports ARE included in the final hash, just not in the initial content hash. This is why hashing works the way it does and will not take minification into account—otherwise, we would need to be able to change imported file names in minified files, which is really problematic.

Import maps will indeed allow people to solve this issue because then, you do not need Rollup's algorithm at all. Instead, you can just take REAL content hashes of files without their dependencies and add the hashes as query parameters, which would be the perfect solution and enable partial redeploys of individual files.
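
For illustration, such a mapping could look roughly like this (a sketch; the file names and hashes are made up):

// Sketch: map stable module specifiers to content-hashed URLs via query parameters.
// This object would be serialized into a <script type="importmap"> tag at deploy time.
const importMap = {
    imports: {
        './chunks/app.js': './chunks/app.js?v=3f9a1c2b',
        './chunks/react.js': './chunks/react.js?v=8d74e6f0'
    }
};

Importers keep referring to the unhashed specifiers, so only the import map and the files whose own contents changed need to be redeployed.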

As for your proposal, that would be a larger refactoring but possible. To handle cycles, one could use the old algorithm to handle these cases and display a warning that hashes may not reflect changes in the renderChunk hook due to this. But this would be a huge refactoring so it would take some time.

@philipwalton
Author

I am not sure if I followed your point correctly here as imports ARE included in the final hash, just not in the initial content hash.

I see. I didn't realize there was an initial hash and then a final hash. I can look into how that affects things, but I still believe the final hash should include the entire file contents (including import paths); otherwise it can lead to the bugs I described above.

This is why hashing works the way it does and will not take minification into account—otherwise, we would need to be able to change imported file names in minified files, which is really problematic.

Can you explain why this is problematic? I think it's problematic for hashing not to take minification into account, and I've experienced actual bugs as a result of this (hence opening this issue).

As for your proposal, that would be a larger refactoring but possible. To handle cycles, one could use the old algorithm to handle these cases and display a warning that hashes may not reflect changes in the renderChunk hook due to this. But this would be a huge refactoring so it would take some time.

Understandable. I definitely realize these things take time, I just wanted to point out that minification bugs can definitely lead to issues when redeploying if minification itself doesn't affect the chunk hashes.

@lukastaegert
Member

Can you explain why this is problematic?

Minification is just one of many things that can take place in the renderChunk hook, which basically enables plugins to freely modify a chunk. Plugins can use it to add or remove imports or entirely change a chunk's contents, e.g. transform it to a binary format. There is no way of safely changing import file names after renderChunk.
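
For instance, a contrived sketch of such a plugin (the comment-stripping regex is naive and purely for illustration):

// Sketch: renderChunk receives the fully rendered chunk code and may return
// arbitrarily transformed code, e.g. a crude comment-stripping "minifier".
export default function stripComments() {
    return {
        name: 'strip-comments',
        renderChunk(code) {
            // Naively remove block comments. A real minifier may rewrite far more of
            // the output, which is why import file names cannot be safely renamed
            // in the code after this hook has run.
            return { code: code.replace(/\/\*[\s\S]*?\*\//g, ''), map: null };
        }
    };
}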

As for the current hashing, the generated hashes take as much into account as possible except changes in the renderChunk hook. This includes the hashes of all dependencies, the rendered exports (including their names and order), and anything added by the intro/outro/banner/footer hooks. And it works well with circular dependencies.

@lukastaegert
Member

lukastaegert commented May 6, 2019

Of course, this makes it even clearer how valuable it would be to do the hashing after renderChunk instead of before. But I believe combining both approaches, where the old approach serves as a fallback to handle cycles and displays a warning, might be a good way forward.


@shellscape
Contributor

@lukastaegert @isidrok does #2921 address this?

@isidrok
Contributor

isidrok commented Aug 15, 2019

It gives plugin authors the ability to reflect changes made during renderChunk in chunk hashes, so mostly yes.
For example, a minification plugin could augment chunk hashes using the minify options and the minifier version. Rollup won't hash the final output, but the minification settings would still be reflected in the final hash, thus invalidating it if the minification options change.
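
For example, a sketch of a plugin using Rollup's augmentChunkHash hook for this (the plugin name and parameters are placeholders):

// Sketch: mix the minifier version and options into every chunk's hash.
export default function minifierHashPlugin({ minifierVersion, minifyOptions }) {
    return {
        name: 'minifier-hash',
        augmentChunkHash() {
            // Whatever string is returned here is added to each chunk's hash input,
            // so changing the minifier version or its options changes the hashes.
            return JSON.stringify({ minifierVersion, minifyOptions });
        }
    };
}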

@lazka

lazka commented Aug 19, 2019

maybe related: #3060

@jakearchibald
Contributor

A same-same-but-different issue:

If source files contain /* Super amazing copyright 2019 */, and that's changed to 2020, this will cause Rollup to change the hash of all files, even if all comments are removed from the source via minification.

@seivan

seivan commented Jun 3, 2020

Edit: Sorry, I should have read everything before:
I guess it's tackled here https://github.com/WICG/import-maps#the-basic-idea

@jakearchibald I think the problem with caching is deeper than that and mostly related to how ES modules work today.

Assume the listed names here are using content hashes as suffixes.
Say you've got app.js & react.js.
app.js imports react.js.

The code would look like

// app.js
import * as React from "./react_1.js";
const CustomComponent = () => <h1>hey</h1>;
// render, etc.

Say you now update React, which will change its name and thus bust the cache for react.js, since it will now have a new hash.
However, it also busts app.js even though that didn't technically change.

// app.js
import * as React from "./react_2.js";
const CustomComponent = () => <h1>hey</h1>;
// render, etc.

Essentially it will have a cascading effect when using ES Modules, which makes me question whether it's worth using separate modules at all instead of one large one.
Obviously you can try to work out ETags, but that's extra configuration on whatever server/CDN you're using.

@isidrok
Contributor

isidrok commented Jun 5, 2020

@seivan what you are talking about is cascading cache invalidation, which is indeed a pity but will always be better than having a single bundle, since then you would have to invalidate its cache every time there is a change. Here is a really nice article on how to avoid it (I'm personally using import maps with SystemJS).

This issue is about calculating output hashes based on the final output, preferably only taking into account executable code.

@frank-dspeed
Contributor

frank-dspeed commented Jul 10, 2020

I am working on a plugin to solve that: https://github.com/direktspeed/plugins/tree/plugin-content-hash

@rollup/plugin-content-hash

Example

rollup.config.js

import { createHash } from 'crypto';

const contentHash = (getHash) => {
    return {
        name: 'content-hash',
        generateBundle(options = {}, bundle = {}, isWrite = false) {
            if (!isWrite) {
                return;
            }
            const updateBundle = (key, value) => {
                if (!value.code) {
                    // Maybe add asset support later
                    return;
                }

                const currentHash = getHash(key);
                if (currentHash === value.fileName) {
                    return;
                }

                const newKey = key.replace(
                    currentHash,
                    createHash('sha256').update(value.code).digest('hex').substring(0, 10)
                );
                console.log(currentHash, key, newKey);

                value.fileName = newKey;
                // TODO: if the file already exists, we could drop it from the bundle; no need to rewrite that file
                bundle[newKey] = value;
                delete bundle[key];
                // Update every other chunk that references the renamed file
                Object.values(bundle).forEach((currentValue) => {
                    if (currentValue.imports.includes(key)) {
                        currentValue.imports = [...currentValue.imports.filter((x) => x !== key), newKey];
                    }
                    if (currentValue.dynamicImports.includes(key)) {
                        currentValue.dynamicImports = [...currentValue.dynamicImports.filter((x) => x !== key), newKey];
                    }
                    if (currentValue.implicitlyLoadedBefore.includes(key)) {
                        currentValue.implicitlyLoadedBefore = [...currentValue.implicitlyLoadedBefore.filter((x) => x !== key), newKey];
                    }
                    if (currentValue.code.indexOf(key) > -1) {
                        currentValue.code = currentValue.code.split(key).join(newKey);
                        updateBundle(currentValue.fileName, currentValue);
                    }
                });
            };

            for (const [key, value] of Object.entries(bundle)) {
                // Only works for chunks, not assets (assets have `source` instead of `code`)
                updateBundle(key, value);
            }

            console.log(bundle);
        }
    };
};

export default {
    input: 'main.js',
    output: {
        dir: 'dist',
        chunkFileNames: '[name]-[hash].js',
        format: 'systemjs'
    },
    plugins: [
        contentHash((fileName) => {
            // Extract the existing hash part from a '[name]-[hash].js' file name
            const split = fileName.split('-');
            const hasHash = split.length > 1;

            return hasHash ? split.pop().split('.')[0] : fileName;
        })
    ]
};

It works at present; I think this will solve it once it is designed a bit further.
@jakearchibald

@jakearchibald
Contributor

Unless I missed something on a skim-read, that's still relying on Rollup to do the right thing with regard to hash cascading, right? If Rollup gets it wrong, your result will also be wrong.

@frank-dspeed
Contributor

@jakearchibald this simply does consistent hashing based on the final file content after minifying and everything else.

@jakearchibald
Contributor

Hm, maybe we need to clarify the test a bit. A solution to this problem shouldn't break the hash cascading: https://bundlers.tooling.report/hashing/js-entry-cascade/.

@frank-dspeed
Contributor

@jakearchibald I see; it does not break that. All affected hashes will change, so it will work right.

tombye added a commit to alphagov/emergency-alerts-govuk that referenced this issue Jun 30, 2021
The JavaScript files produced by rollup have SHAs
in their names that are not just generated from
their final contents*. Because of this, it is
tricky to determine their filename.

This proposes a different approach to the
file_fingerprint filter, changing it so it gets
the SHA'ed filename by regexp rather than using
the same hashing as the gulp task.

*This issue contains some useful information on
how rollup generates its hashes:

rollup/rollup#2839
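
A rough sketch of that kind of lookup, in JavaScript for illustration (the output directory and naming pattern are assumptions):

import { readdirSync } from 'fs';

// Sketch: resolve the hashed file name by matching a pattern against the output
// directory instead of recomputing the hash from the file contents.
function fileFingerprint(baseName, distDir = 'dist') {
    const pattern = new RegExp(`^${baseName}-[0-9a-f]+\\.js$`);
    const match = readdirSync(distDir).find((name) => pattern.test(name));
    if (!match) throw new Error(`No built file found for ${baseName}`);
    return match;
}
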
@BerndWessels

Wow, I am amazed that this is still an issue - it gave me an hour of headache until I found this issue here. This is a big showstopper, guys; how is this not officially resolved yet?
Will try @frank-dspeed's solution now, but looking forward to an official fix ;)

@mileslane

Ping

@frank-dspeed
Contributor

frank-dspeed commented Apr 29, 2022

@mileslane @BerndWessels it is on the roadmap for Rollup v3+.

@lukastaegert
Member

Exactly, I am currently working on this actively on a branch

@lukastaegert
Member

I know it has been a long time, fix at #4543

@rollup-bot
Collaborator

This issue has been resolved via #4543 as part of rollup@3.0.0-7. Note that this is a pre-release, so to test it, you need to install Rollup via npm install rollup@3.0.0-7 or npm install rollup@beta. It will likely become part of a regular release later.

@rollup-bot
Collaborator

This issue has been resolved via #4543 as part of rollup@3.0.0-8. Note that this is a pre-release, so to test it, you need to install Rollup via npm install rollup@3.0.0-8 or npm install rollup@beta. It will likely become part of a regular release later.

@rollup-bot
Collaborator

This issue has been resolved via #4543 as part of rollup@3.0.0. You can test it via npm install rollup.
