Chunk hashes should be based on the output #2839

Closed
philipwalton opened this issue May 5, 2019 · 25 comments · Fixed by #4543 or #4549

Comments

@philipwalton

philipwalton commented May 5, 2019

Expected Behavior / Situation

Adding a minification plugin (which will change a chunk's output contents) should also change the chunk's file hash.

Actual Behavior / Situation

The chunk's hash is unchanged by minification plugins, which can result in lots of subtle bugs when deploying files to production.

For example, consider the following scenario:

  • A user deploys a web app to production with buggy minification settings (e.g. property mangling that's inconsistent across chunks).
  • The user discovers the mistake and redeploys, thinking they've fixed the problem.
  • The bug will persist for any returning users since the chunk filenames won't have changed and the browser will load the old (cached) versions of those files.

Modification Proposal

I see that hash determination was already discussed when code splitting was added to Rollup, but I think there are some problems with that logic -- specifically this part:

Calculate a content hash for each chunk. This initial hash should only depend on the content of this chunk itself, not on its imports

If the imports are not included in the content hash calculation, then partial redeploys will break.

For example, consider a case where you have chunk A, which depends on chunk B, which depends on chunk C. If the contents of C change, then the hashes of all three chunks need to change. The reason is that if the contents of C change, C needs a new filename in order to cache-bust. But if C has a new filename, then the contents of B must also change (e.g. the import path) in order to load the correct version of C. And if the contents of B change, then B also needs a new filename, and so on all the way down to A.

As a result, I don't see any feasible way to correctly determine file hashes without considering the entire file, including its imports and their file paths.
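
To make the cascade concrete, here is a sketch with made-up file names and hashes:

// a-x1f2.js (entry chunk)
import { b } from './b-9k3d.js';

// b-9k3d.js
import { c } from './c-77ab.js';
export const b = () => c + 1;

// c-77ab.js
export const c = 1;

// If c's contents change, c needs a new hashed name, say c-51fe.js. b must then
// import './c-51fe.js', so b's contents (and hash) change as well, which in turn
// changes the import path inside a, and therefore a's hash.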

My recommendation is that Rollup should handle this itself after all plugins that can modify source code have run. If it processes all chunks in reverse topological order, it should be able to properly calculate the hashes of all chunks based on their full, final contents (including import statements).

The main problem I see with this approach is how to handle circular dependencies. However, Rollup already warns on circular dependencies, so the simplest solution seems to be to keep warning but add some additional text explaining that content hashes are non-deterministic in bundles with circular dependencies.
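
To illustrate the proposed ordering, here is a minimal sketch (not an actual implementation; the chunk shape with code, dependencies and placeholder fields is hypothetical):

import { createHash } from 'crypto';

// Sketch: hash chunks in reverse topological order, so that each chunk is hashed
// only after the final file names of its dependencies are known.
function hashChunks(chunks) {
    const finalNames = new Map();

    function visit(chunk, visiting = new Set()) {
        if (finalNames.has(chunk.name)) return;
        if (visiting.has(chunk.name)) return; // circular dependency: warn and fall back
        visiting.add(chunk.name);
        for (const dep of chunk.dependencies) visit(dep, visiting);

        // Substitute the already-final names of the dependencies into the rendered code...
        let code = chunk.code;
        for (const dep of chunk.dependencies) {
            // In a cycle, the dependency may not have a final name yet; a real
            // implementation would warn here and fall back to the old algorithm.
            const finalName = finalNames.get(dep.name) ?? dep.placeholder;
            code = code.split(dep.placeholder).join(finalName);
        }

        // ...then hash the complete, final contents, import paths included.
        const hash = createHash('sha256').update(code).digest('hex').slice(0, 8);
        finalNames.set(chunk.name, `${chunk.name}-${hash}.js`);
    }

    for (const chunk of chunks) visit(chunk);
    return finalNames;
}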


Aside: hopefully this issue won't be as tricky to solve in the future once all browsers support import maps, because once they do all file hashes can just be added to the import map declaration rather than having to appear in the file itself.

@lukastaegert
Member

If the imports are not included in the content hash calculation, then partial redeploys will break

I am not sure if I followed your point correctly here as imports ARE included in the final hash, just not in the initial content hash. This is why hashing works the way it does and will not take minification into account—otherwise, we would need to be able to change imported file names in minified files, which is really problematic.

Import maps will indeed allow people to solve this issue because then, you do not need Rollup's algorithm at all. Instead, you can just take REAL content hashes of files without their dependencies and add the hashes as query parameters, which would be the perfect solution and enable partial redeploys of individual files.
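
For illustration, such a mapping could look roughly like this (a sketch; the file names and hashes are made up):

// Sketch: map stable module specifiers to content-hashed URLs via query parameters.
// This object would be serialized into a <script type="importmap"> tag at deploy time.
const importMap = {
    imports: {
        './chunks/app.js': './chunks/app.js?v=3f9a1c2b',
        './chunks/react.js': './chunks/react.js?v=8d74e6f0'
    }
};

Importers keep referring to the unhashed specifiers, so only the import map and the files whose own contents changed need to be redeployed.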

As for your proposal, that would be a larger refactoring but possible. To handle cycles, one could use the old algorithm to handle these cases and display a warning that hashes may not reflect changes in the renderChunk hook due to this. But this would be a huge refactoring so it would take some time.

@philipwalton
Author

I am not sure if I followed your point correctly here as imports ARE included in the final hash, just not in the initial content hash.

I see. I didn't realize there was an initial hash and then a final hash. I can look into how that affects things, but I still believe the final hash should include the entire file contents (including import paths); otherwise it can lead to the bugs I described above.

This is why hashing works the way it does and will not take minification into account—otherwise, we would need to be able to change imported file names in minified files, which is really problematic.

Can you explain why this is problematic? I think it's problematic for hashing not to take minification into account, and I've experienced actual bugs as a result of this (hence opening this issue).

As for your proposal, that would be a larger refactoring but possible. To handle cycles, one could use the old algorithm to handle these cases and display a warning that hashes may not reflect changes in the renderChunk hook due to this. But this would be a huge refactoring so it would take some time.

Understandable. I definitely realize these things take time, I just wanted to point out that minification bugs can definitely lead to issues when redeploying if minification itself doesn't affect the chunk hashes.

@lukastaegert
Member

Can you explain why this is problematic?

Minification is just one of many things that can take place in the renderChunk hook, which basically enables plugins to freely modify a chunk. Plugins can use it to add or remove imports or entirely change a chunk's contents, e.g. transform it to a binary format. There is no way of safely changing import file names after renderChunk.
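
For instance, a contrived sketch of such a plugin (the comment-stripping regex is naive and purely for illustration):

// Sketch: renderChunk receives the fully rendered chunk code and may return
// arbitrarily transformed code, e.g. a crude comment-stripping "minifier".
export default function stripComments() {
    return {
        name: 'strip-comments',
        renderChunk(code) {
            // Naively remove block comments. A real minifier may rewrite far more of
            // the output, which is why import file names cannot be safely renamed
            // in the code after this hook has run.
            return { code: code.replace(/\/\*[\s\S]*?\*\//g, ''), map: null };
        }
    };
}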

As for the current hashing, the generated hashes take as much into account as possible except changes in the renderChunk hook. This includes the hashes of all dependencies, the rendered exports (including their names and order), and anything added by the intro/outro/banner/footer hooks. And it works well with circular dependencies.

@lukastaegert
Member

lukastaegert commented May 6, 2019

Of course, this makes it even clearer how valuable it would be to do the hashing after renderChunk instead of before. But I believe combining both approaches, where the old approach serves as a fallback to handle cycles and displays a warning, might be a good way forward.


@shellscape
Contributor

@lukastaegert @isidrok does #2921 address this?

@isidrok
Contributor

isidrok commented Aug 15, 2019

It gives plugin authors the ability to reflect changes made during renderChunk in chunk hashes, so mostly yes.
For example, a minification plugin could augment chunk hashes using the minify options and the minifier version. Rollup won't hash the final output, but the minification settings would still be reflected in the final hash, thus invalidating it if the minification options change.
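
For example, a sketch of a plugin using Rollup's augmentChunkHash hook for this (the plugin name and parameters are placeholders):

// Sketch: mix the minifier version and options into every chunk's hash.
export default function minifierHashPlugin({ minifierVersion, minifyOptions }) {
    return {
        name: 'minifier-hash',
        augmentChunkHash() {
            // Whatever string is returned here is added to each chunk's hash input,
            // so changing the minifier version or its options changes the hashes.
            return JSON.stringify({ minifierVersion, minifyOptions });
        }
    };
}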

@lazka

lazka commented Aug 19, 2019

maybe related: #3060

@jakearchibald
Contributor

A same-same-but-different issue:

If source files contain /* Super amazing copyright 2019 */, and that's changed to 2020, this will cause Rollup to change the hash of all files, even if all comments are removed from the source via minification.

@seivan

seivan commented Jun 3, 2020

Edit: Sorry, I should have read everything before:
I guess it's tackled here https://github.com/WICG/import-maps#the-basic-idea

@jakearchibald I think the problem with caching is deeper than that and mostly related to how ES modules work today.

Assume the listed names here are using content hashes as suffixes.
Say you've got app.js & react.js.
app.js imports react.js.

The code would look like

// app.js
import * as React from "./react_1.js";
const CustomComponent = () => <h1>hey</h1>;
// render, etc.

Say you now update React, which will change its name and thus bust the cache for react.js, since it will now have a new hash.
However, it also busts app.js even though that didn't technically change.

// app.js
import * as React from "./react_2.js";
const CustomComponent = () => <h1>hey</h1>;
// render, etc.

Essentially it will have a cascading effect when using ES Modules, which makes me question whether it's worth using separate modules at all instead of one large one.
Obviously you can try to work out ETags, but that's extra configuration on whatever server/CDN you're using.

@isidrok
Contributor

isidrok commented Jun 5, 2020

@seivan what you are talking about is cascading cache invalidation, which is indeed a pity but will always be better than having a single bundle, since then you would have to invalidate its cache every time there is a change. Here is a really nice article on how to avoid it (I'm personally using import maps with SystemJS).

This issue is about calculating output hashes based on the final output, preferably only taking into account executable code.

@frank-dspeed
Contributor

frank-dspeed commented Jul 10, 2020

I am working on a plugin to solve that: https://github.com/direktspeed/plugins/tree/plugin-content-hash

@rollup/plugin-content-hash

Example

rollup.config.js

import { createHash } from 'crypto';

const contentHash = (getHash) => {
    return {
        name: 'content-hash',
        generateBundle(options = {}, bundle = {}, isWrite = false) {
            if (!isWrite) {
                return;
            }
            const updateBundle = (key, value) => {
                if (!value.code) {
                    // Maybe add asset support later
                    return;
                }

                const currentHash = getHash(key);
                if (currentHash === value.fileName) {
                    return;
                }

                const newKey = key.replace(
                    currentHash,
                    createHash('sha256').update(value.code).digest('hex').substring(0, 10)
                );
                console.log(currentHash, key, newKey);

                value.fileName = newKey;
                // TODO: if the file already exists, we could drop it from the bundle; no need to rewrite that file
                bundle[newKey] = value;
                delete bundle[key];
                // Update every other chunk that references the renamed file
                Object.values(bundle).forEach((currentValue) => {
                    if (currentValue.imports.includes(key)) {
                        currentValue.imports = [...currentValue.imports.filter((x) => x !== key), newKey];
                    }
                    if (currentValue.dynamicImports.includes(key)) {
                        currentValue.dynamicImports = [...currentValue.dynamicImports.filter((x) => x !== key), newKey];
                    }
                    if (currentValue.implicitlyLoadedBefore.includes(key)) {
                        currentValue.implicitlyLoadedBefore = [...currentValue.implicitlyLoadedBefore.filter((x) => x !== key), newKey];
                    }
                    if (currentValue.code.indexOf(key) > -1) {
                        currentValue.code = currentValue.code.split(key).join(newKey);
                        updateBundle(currentValue.fileName, currentValue);
                    }
                });
            };

            for (const [key, value] of Object.entries(bundle)) {
                // Only works for chunks, not assets (assets have `source` instead of `code`)
                updateBundle(key, value);
            }

            console.log(bundle);
        }
    };
};

export default {
    input: 'main.js',
    output: {
        dir: 'dist',
        chunkFileNames: '[name]-[hash].js',
        format: 'systemjs'
    },
    plugins: [
        contentHash((fileName) => {
            // Extract the existing hash part from a '[name]-[hash].js' file name
            const split = fileName.split('-');
            const hasHash = split.length > 1;

            return hasHash ? split.pop().split('.')[0] : fileName;
        })
    ]
};

It works at present; I think this will solve it once it is designed a bit further.
@jakearchibald

@jakearchibald
Contributor

Unless I missed something on a skim-read, that's still relying on Rollup to do the right thing with regard to hash cascading, right? If Rollup gets it wrong, your result will also be wrong.

@frank-dspeed
Contributor

@jakearchibald this simply does consistent hashing based on the final file content after minifying and everything else.

@jakearchibald
Contributor

Hm, maybe we need to clarify the test a bit. A solution to this problem shouldn't break the hash cascading: https://bundlers.tooling.report/hashing/js-entry-cascade/.

@frank-dspeed
Contributor

@jakearchibald I see; it does not break that. All affected hashes will change, so it will work right.

tombye added a commit to alphagov/emergency-alerts-govuk that referenced this issue Jun 30, 2021
The JavaScript files produced by rollup have SHAs
in their names that are not just generated from
their final contents*. Because of this, it is
tricky to determine their filename.

This proposes a different approach to the
file_fingerprint filter, changing it so it gets
the SHA'ed filename by regexp rather than using
the same hashing as the gulp task.

*This issue contains some useful information on
how rollup generates its hashes:

rollup/rollup#2839
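
A rough sketch of that kind of lookup, in JavaScript for illustration (the output directory and naming pattern are assumptions):

import { readdirSync } from 'fs';

// Sketch: resolve the hashed file name by matching a pattern against the output
// directory instead of recomputing the hash from the file contents.
function fileFingerprint(baseName, distDir = 'dist') {
    const pattern = new RegExp(`^${baseName}-[0-9a-f]+\\.js$`);
    const match = readdirSync(distDir).find((name) => pattern.test(name));
    if (!match) throw new Error(`No built file found for ${baseName}`);
    return match;
}
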
@BerndWessels

Wow, I am amazed that this is still an issue - it gave me an hour of headache until I found this issue here. This is a big showstopper, guys; how is this not officially resolved yet?
Will try @frank-dspeed's solution now, but looking forward to an official fix ;)

@mileslane

Ping

@frank-dspeed
Contributor

frank-dspeed commented Apr 29, 2022

@mileslane @BerndWessels it is on the roadmap for Rollup v3+.

@lukastaegert
Member

Exactly, I am currently working on this actively on a branch

@lukastaegert
Member

I know it has been a long time, fix at #4543

@rollup-bot
Collaborator

This issue has been resolved via #4543 as part of rollup@3.0.0-7. Note that this is a pre-release, so to test it, you need to install Rollup via npm install rollup@3.0.0-7 or npm install rollup@beta. It will likely become part of a regular release later.

@rollup-bot
Collaborator

This issue has been resolved via #4543 as part of rollup@3.0.0-8. Note that this is a pre-release, so to test it, you need to install Rollup via npm install rollup@3.0.0-8 or npm install rollup@beta. It will likely become part of a regular release later.

@rollup-bot
Collaborator

This issue has been resolved via #4543 as part of rollup@3.0.0. You can test it via npm install rollup.
