Webpack build's [contenthash] diverges from build to build #17757

Open
abirmingham opened this issue Oct 19, 2023 · 12 comments

@abirmingham commented Oct 19, 2023

Bug report

What is the current behavior?

I am noticing that the contenthash on my production builds is not consistent from build to build. A few observations:

  1. Sometimes contenthash is consistent for 2 builds, and other times it is consistent for 5, and so on and so forth. Typically the first two builds are inconsistent with one another, and subsequent builds are consistent, but this is not always the case.
  2. The offending files typically appear in a chunk tree that is imported by workers, i.e. new Worker(new URL('...', import.meta.url)); (see the sketch after this list).
  3. I attempted to reproduce the issue without thread-loader, babel-loader, and mode="production", but I was unable to do so. I was also unable to reproduce the issue with a significantly less complex source tree. That is not to say that the issue cannot be reproduced under these conditions, only that 30 rebuilds did not reproduce it for me.
  4. The offending files contain (or import a tree which contains) minified variable names which differ. This appears to be the only difference.
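
A minimal sketch of the worker-entry pattern in play, assuming a hypothetical ./worker.js path (this is the standard webpack 5 worker syntax, not code taken from the repro repo):

    // main.js - webpack detects this pattern and splits worker.js (plus its
    // import tree) into a separate chunk, which gets its own [contenthash].
    const worker = new Worker(new URL('./worker.js', import.meta.url));
    worker.postMessage({ task: 'start' });
    worker.onmessage = (event) => console.log('worker replied:', event.data);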

Since the build is not deterministic, my production builds are not reproducible, and the contenthash breaks caching.

Additionally, in this case the output appears to be functionally identical, but in another case my webpack configuration had a bug (introduced by me) that created functional differences in the build output. Because an inconsistent contenthash is often produced, the bug I introduced was harder to detect, as I could not simply diff filenames as a sanity check.

I have created a repository with a minimal reproduction - https://github.com/abirmingham/repro-webpack-issue-17757. In my experience the issue is typically reproduced within 5 builds, but sometimes more are required.
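
For anyone wanting to check determinism locally, a loop along these lines is enough to catch a divergence. This is an illustrative Node script, not the repro.sh from the linked repository, and it assumes a flat dist/ output directory:

    // check-determinism.js - rebuild repeatedly and stop at the first run
    // whose dist/ contents differ from the first build's.
    const { execSync } = require('child_process');
    const { createHash } = require('crypto');
    const { readdirSync, readFileSync } = require('fs');

    const digest = () =>
      readdirSync('dist')
        .sort()
        .map((f) => f + ':' + createHash('sha256').update(readFileSync('dist/' + f)).digest('hex'))
        .join('\n');

    let baseline;
    for (let i = 1; i <= 30; i++) {
      execSync('npx webpack --mode production', { stdio: 'inherit' });
      const d = digest();
      if (baseline === undefined) baseline = d;
      else if (d !== baseline) {
        console.error('Build ' + i + ' diverged from build 1');
        process.exit(1);
      }
    }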

Here is an example diff of an offending file:
[Screenshot: diff of an offending file; only minified variable names differ between builds]

What is the expected behavior?

If no build inputs have changed, all filenames and file content should be identical from build to build.

Other relevant information:
webpack version: 5.89.0
Node.js version: 18.18.2
Operating System: Ubuntu 20.04.6 LTS running docker "node:18" image
Additional tools: None

@jayaddison

I've been able to replicate this issue (using buildah + podman in place of docker) after nine builds of the repro application code.

Do you know whether this is a regression @abirmingham? (I'm planning to test v5.88.2 - the version I'm using for a work project at the moment - next, to see whether the same behaviour occurs.)

@jayaddison

Yep, this was also replicable using v5.88.2 after three iterations - I haven't tested any earlier versions of webpack.

@abirmingham (Author)

Hey James, thanks for taking a peek at this. I haven't tested any other versions with the minimal repro, but in my main project I am using webpack@5.70.0 and am seeing issues. I hesitate to confirm that it is the same issue, however, as my main project is quite a bit more complex and the nondeterminism may be coming from other places. All of that is to say that I don't know if it is a regression or not :)

@jayaddison

You're welcome. Issue #17009 sounds potentially related.

I notice that the list of entrypoints generated in the webpack config seems deterministic (generated from numbers 0..6) - is there a reason to use that many input files? The contents of each code file seem to be the same.
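
For context, the generated entry map being referred to looks roughly like this (entry names and paths are illustrative, not copied from the repro repo):

    // webpack.config.js (sketch) - seven numbered entrypoints, 0..6
    module.exports = {
      entry: Object.fromEntries(
        Array.from({ length: 7 }, (_, i) => ['entry' + i, './src/entry' + i + '.ts'])
      ),
      // ...rest of the config
    };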

@abirmingham (Author)

@jayaddison good question! While attempting to reproduce the issue, I noticed that projects under a certain size took a lot longer to reproduce the problem, or were unable to reproduce the problem at all. I wanted a short feedback loop, so I continued adding entrypoints until the problem was reliably reproduced in under 10 iterations. The same is true of the TS files in the autogenerated folder.

@jayaddison

Could you try removing more of the webpack config, JavaScript code in the modules, etc. to continue to narrow this down without affecting its repeatability? (I will try to do the same soon too, but I'm not sure when I'll get around to it.)

It'll probably be a slightly annoying process, but I think we should be able to continue to remove items until -- ideally -- there's a removal where we cross some threshold and the problem stops occurring.

Then there'll be the equally tricky part of debugging from there to determine why whatever small difference remains causes the problem in the first place.

@jayaddison

Note: I briefly wondered whether realContentHash could be relevant here; the documentation does mention that when it is set to false, content hashes may not be deterministic. However, the default for that setting is to match the production flag -- and that's set to true in the repro webpack.config.js.
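
For reference, here is the setting spelled out explicitly; a sketch only, since the repro config relies on the production-mode default rather than setting it by hand:

    // webpack.config.js (sketch)
    module.exports = {
      mode: 'production',
      optimization: {
        // Defaults to true when mode is 'production'; the docs warn that
        // setting this to false can make [contenthash] non-deterministic.
        realContentHash: true,
      },
    };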

@jayaddison

A few more observations:

  • Removing thread-loader from the repro configuration seems to produce different build output, which is probably not ideal in itself.
  • Inspecting one of the differing files with cmp shows that the difference tends to be in an object near the very end of the output .js files, whose properties are integers and whose values look like hashes, just after a special-case ==='monaco' check (maybe not relevant to the investigation, but a useful tracer: this may be due to some webpackChunkName directives). Usually only one of the property values differs; a sketch of that object's shape follows this list.
  • It's possible to move most of the repro.sh contents into the Dockerfile, as long as the webpack.config.js and src directories are also COPY'd into the layers during build. This reduces the size (albeit not really the time, yet) of the iteration loop.
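
For illustration, the object being described matches the shape of the chunk-filename table that webpack 5's runtime emits; the chunk ids and hash values below are invented:

    // Roughly how the runtime maps chunk ids to the content hashes embedded
    // in chunk script URLs; between divergent builds, only one of these
    // hash values differs.
    __webpack_require__.u = (chunkId) =>
      chunkId + '.' + { 123: 'a1b2c3d4e5f60708', 456: '90ab12cd34ef56a7' }[chunkId] + '.js';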

@jayaddison

I've confirmed that this variance is replicable as far back as webpack=5.56.1. It may be replicable in earlier versions too, but that's the least-recent version I've tested so far (I was squinting at 3fa83c6 - but reproducing the issue on v5.56.1 predates that change, so I think it is unrelated).

So far it does seem that the thread-loader component makes this issue more repeatable. Whether it is definitively the cause, however, I cannot yet confirm or reject.
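
For context, the loader chain under discussion looks roughly like this; the test pattern and loader order are assumptions rather than a copy of the repro config, and the options mirror values mentioned later in the thread:

    // webpack.config.js (sketch) - thread-loader feeding babel-loader
    module.exports = {
      module: {
        rules: [
          {
            test: /\.tsx?$/,
            use: [
              { loader: 'thread-loader', options: { workers: 2, workerParallelJobs: 2 } },
              { loader: 'babel-loader' },
            ],
          },
        ],
      },
    };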

@abirmingham (Author)

Hey @jayaddison - I was able to spend more time on this, and attempted the following changes:

1

  • Changes:
    -- Dropped thread-loader to 2 workers, 2 workerParallelJobs
    -- Dropped entrypoints from Array(7) to Array(3)
    -- Load CPU: for i in 1 2 3 4; do while : ; do : ; done & done
  • Results:
    -- Reproduced on second build

2

  • Changes:
    -- Dropped entrypoints from Array(3) to Array(1)
  • Results:
    -- Reproduced on 8th build

3

  • Changes:
    -- Removed all extra entrypoints
  • Results:
    -- 13 builds failed to reproduce

4

  • Changes:
    -- Reverted change #3
    -- Removed all references to autogenerated directory
  • Results:
    -- 50 builds failed to reproduce

5

Here is the commit containing my stopping point, which can be seen in the repro repository on branch 10_23_2023_minimal_build:

commit 21c561c1483963a1069943e5b351190faa990d89 (HEAD -> main, abirmingham-github/10_23_2023_minimal_build)
Author: Alex Birmingham <abirmingham@extrahop.com>
Date:   Mon Oct 23 15:39:51 2023 -0700

    Attempt More Minimal Reproduction
    
    - Delete half of autogenerated files
    - Reduce entrypoints from 8 to 2
    - Reduce thread-loader workers from 4 to 2
    - Reduce thread-loader workerParallelJobs from 3 to 

Note that in the final iteration the bug was only reproduced after the 21st build, and in a prior iteration I stopped running builds at 13 builds. So it is certainly possible that the bug would have been reproduced if builds had continued.

This is a tricky one. It's unclear to me whether the extra complexity is necessary or is simply helpful in reproducing the issue in a less prohibitive amount of time.

Thank you for your time!

@jayaddison

Thanks - yep, more clues. I do have a feeling that the autogenerated content -- and potentially the stack/module depth of it, the way that the modules are connected -- could be relevant. And the entrypoints seem relevant, as you've confirmed. At the moment I do think threading could be a relevant interaction too. Perhaps it requires all three of those in combination, but we'll need more info to prove or disprove that.

@webpack-bot (Contributor)

This issue had no activity for at least three months.

It's subject to automatic issue closing if there is no activity in the next 15 days.

@webpack webpack deleted a comment Mar 29, 2024
Project status: Priority - High
4 participants