New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible builds on Linux #3043
Comments
@joyeecheung do you have any thoughts/suggestions on the snapshot front? @sxa I assume that it should be possible to update |
That would be my assumption yes. The previous issues and PRs suggest this was resolved, so we'd just need to understand what happened and potentially reimplement a fix for it. |
Pinging @warpfork for this thread. He's currently heads-down on building Warpforge and has a passion for reproducible builds and I was telling him that there's interest in making progress on this front for Node.js so there might be some potential for collaboration and the production of some tooling to both generate reproducible builds and also document & provide mechanisms for people to do it for themselves without all the pain that usually comes from doing it manually. So @sxa, meet the most excellent and clever @warpfork; maybe you two should chat. |
Thanks @rvagg - I've worked on reproducible build environments for another project, and the Node.js build is close other than those two issues above, so it should just require a little bit of knowledge about the Node processes involved in producing those two things to be able to resolve it, then we can possibly put some more robust things in place for making our builds reproducible by default, which I think should be the aim here. @warpfork DO you have much experience on non-Linux reproducible build work (I've so far only looked at Linux for Node.js) |
One source of indeterminism is here: Instead of a set a dict can be used here (with |
Is there a todo on this issue or can it be closed? |
Definitely still work to do and I still plan to look at it. |
Just tried four consecutive builds with the latest code and the This was tested with With snapshots enabled the builds are still non-reproducible. due to |
Looks like the break was between these two commits, so nodejs/node@1faf6f459f likely made it non-reproducible again.
FYI @joyeecheung (PR) @bnoordhuis (PR) I don't think I have enough knowledge of this code to be able to propose a safe solution here so looking for advice. |
@joyeecheung Would you be able to assist with the reproducibility of the snapshots now? I'm not sure who else might have good knowledge of the snapshot support in Node so anyone else might have to start from scratch. |
Thanks for the ping, not sure how I missed the earlier one. I diff'ed the generated snapshot locally and found ~20 lines of differences in the snapshot.cc generated (with |
With this patch the snapshot.cc difference is down to 13 lines. Still need to figure out the differences in the blob though. see diffdiff --git a/src/cleanup_queue-inl.h b/src/cleanup_queue-inl.h
index 5d9a56e6b0..d1fbd8241d 100644
--- a/src/cleanup_queue-inl.h
+++ b/src/cleanup_queue-inl.h
@@ -39,7 +39,9 @@ void CleanupQueue::Remove(Callback cb, void* arg) {
template <typename T>
void CleanupQueue::ForEachBaseObject(T&& iterator) const {
- for (const auto& hook : cleanup_hooks_) {
+ std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
+ for (const auto& hook : callbacks) {
BaseObject* obj = GetBaseObject(hook);
if (obj != nullptr) iterator(obj);
}
diff --git a/src/cleanup_queue.cc b/src/cleanup_queue.cc
index 6290b6796c..c0fcda2fac 100644
--- a/src/cleanup_queue.cc
+++ b/src/cleanup_queue.cc
@@ -5,7 +5,7 @@
namespace node {
-void CleanupQueue::Drain() {
+std::vector<CleanupQueue::CleanupHookCallback> CleanupQueue::GetOrdered() const {
// Copy into a vector, since we can't sort an unordered_set in-place.
std::vector<CleanupHookCallback> callbacks(cleanup_hooks_.begin(),
cleanup_hooks_.end());
@@ -20,6 +20,12 @@ void CleanupQueue::Drain() {
return a.insertion_order_counter_ > b.insertion_order_counter_;
});
+ return callbacks;
+}
+
+void CleanupQueue::Drain() {
+ std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
for (const CleanupHookCallback& cb : callbacks) {
if (cleanup_hooks_.count(cb) == 0) {
// This hook was removed from the `cleanup_hooks_` set during another
diff --git a/src/cleanup_queue.h b/src/cleanup_queue.h
index 64e04e1856..2ca333aca8 100644
--- a/src/cleanup_queue.h
+++ b/src/cleanup_queue.h
@@ -6,6 +6,7 @@
#include <cstddef>
#include <cstdint>
#include <unordered_set>
+#include <vector>
#include "memory_tracker.h"
@@ -66,6 +67,7 @@ class CleanupQueue : public MemoryRetainer {
uint64_t insertion_order_counter_;
};
+ std::vector<CleanupHookCallback> GetOrdered() const;
inline BaseObject* GetBaseObject(const CleanupHookCallback& callback) const;
// Use an unordered_set, so that we have efficient insertion and removal.
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index ecc295acdb..2ba6878a28 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -52,6 +52,7 @@ int main(int argc, char* argv[]) {
#endif // _WIN32
v8::V8::SetFlagsFromString("--random_seed=42");
+ v8::V8::SetFlagsFromString("--predictable");
v8::V8::SetFlagsFromString("--harmony-import-assertions");
return BuildSnapshot(argc, argv);
} |
Sounds like good progress - thanks @joyeecheung! |
Checking the snapshots again, another source of indeterminism comes from performance data (milestones, time origin etc.), the easiest way to fix it is probably discarding it before snapshot generation. But I'll need to check if/how they should be synchronized. |
With nodejs/node#48702 and nodejs/node#48708 and --predictable the differences are down to 8 places:
EDIT: we can just return some non-empty data for both BaseObject slots to fix 1. 2 probably need a V8 patch for us to customize how the slots should be serialized. Trying to work out a prototype. |
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: #48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
I've put https://ci.nodejs.org/job/reproducibility-test/ in place to test how reproducible the builds are on Linux/aarch64 - it will run weekly:
|
Thanks, I think when nodejs/node#48851 lands, we could also add another flag to display the |
Yeah that's probably more useful for the future :-) |
I am adding a flag to make the diff output more readable (it would easier to calculate offsets of the differences in the data): nodejs/node#49312 |
With the flag the diff output is currently 9c9
< static const char v8_snapshot_blob_data[] = {4,0,0,0,1,0,0,0,52,90,-119,-9,49,49,46,51,46,50,52,52,46,56,45,110,111,100,101,46,49,52,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0
---
> static const char v8_snapshot_blob_data[] = {4,0,0,0,1,0,0,0,-125,90,-8,1,49,49,46,51,46,50,52,52,46,56,45,110,111,100,101,46,49,52,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0
13318c13318
< 16,75,98,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,44,3,-39,3,-128,93,68,102,0,0,0,0,0,0,0,0,49,72,-16,-115,-31,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 13309
---
> 16,75,98,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,44,3,-39,3,-128,93,68,102,0,0,0,0,0,0,0,0,-55,17,-16,-115,-31,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 13309
13321c13321
< 0,0,0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,44,-108,-128,93,68,102,0,0,0,0,0,0,0,0,-31,100, // 13312
---
> 0,0,0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,44,-108,-128,93,68,102,0,0,0,0,0,0,0,0,73,98, // 13312
13963c13963
< -63,49,-89,112,47,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 13954
---
> -71,6,-16,-115,-31,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 13954
15751c15751
< 0,0,0,12,56,68,96,0,-66,2,56,1,0,0,0,68,71,68,71,98,0,0,0,0,0,0,0,0,-128,125,112,55,1,0,0,0,-28,80,-13,5,1,0,0,0,68,1,28,3,76,-128,93,5,-7,8,-107,6,-51,11,8,-15,9,1,28,-105, // 15742
---
> 0,0,0,12,56,68,96,0,-66,2,67,1,0,0,0,68,71,68,71,98,0,0,0,0,0,0,0,0,-128,125,112,66,1,0,0,0,36,-72,-59,3,1,0,0,0,68,1,28,3,76,-128,93,5,-7,8,-107,6,-51,11,8,-15,9,1,28,-105, // 15742 With nodejs/node#48749 it's down to 2 differences (technically just one, the first one is the checksum being different), which comes from the Environment pointer in the context embedder data. I am still working on a V8 API to address this (we could also do a dumb workaround by setting it to nullptr temporarily before serialization, though) 9c9
< static const char v8_snapshot_blob_data[] = {4,0,0,0,1,0,0,0,-16,89,88,-80,49,49,46,51,46,50,52,52,46,56,45,110,111,100,101,46,49,52,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0
---
> static const char v8_snapshot_blob_data[] = {4,0,0,0,1,0,0,0,45,91,5,18,49,49,46,51,46,50,52,52,46,56,45,110,111,100,101,46,49,52,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0
15751c15751
< 0,0,0,12,56,68,96,0,68,-127,81,1,0,0,0,68,71,68,71,98,0,0,0,0,0,0,0,0,0,105,96,81,1,0,0,0,100,-112,39,2,1,0,0,0,68,1,28,3,76,-128,93,5,-7,8,-107,6,-51,11,8,-15,9,1,28,-105, // 15742
---
> 0,0,0,12,56,68,96,0,-118,3,29,1,0,0,0,68,71,68,71,98,0,0,0,0,0,0,0,0,-16,-98,-32,28,1,0,0,0,100,80,-100,6,1,0,0,0,68,1,28,3,76,-128,93,5,-7,8,-107,6,-51,11,8,-15,9,1,28,-105, // 15742 |
This makes it easier to locate indeterminism in the snapshot, with the following command: $ ./configure --write-snapshot-as-array-literals $ make V= $ mv out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc $ make V= $ diff out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc PR-URL: #49312 Refs: nodejs/build#3043 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
FYI, just landed the --write-snapshot-as-array-literals flag to the configure script |
We previously only return startup data for the first slot for BaseObjects because we can already serialize all the necessary information in one go, but slots that do not get special startup data would be serialized verbatim which means that the pointer addresses are going to be part of the snapshot blob, resulting in indeterminism. This patch updates the serialization routines and capture information for both of the two slots - the first slot with type information and memory management type (which we can use in the future for cppgc-managed objects) and the second slot with data about the object itself. This way the embeedder slots can be serialized in a reproducible manner in the snapshot. PR-URL: #48996 Refs: nodejs/build#3043 Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
This makes it easier to locate indeterminism in the snapshot, with the following command: $ ./configure --write-snapshot-as-array-literals $ make V= $ mv out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc $ make V= $ diff out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc PR-URL: #49312 Refs: nodejs/build#3043 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
To improve determinism of snapshot generation, add --predictable to the V8 flags used to initialize a process launched to generate snapshot. Also add a kGeneratePredictableSnapshot flag to ProcessInitializationFlags for this and moves the configuration of these flags into node::InitializeOncePerProcess() so that it can be shared by embedders. PR-URL: #48749 Refs: nodejs/build#3043 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
This makes it easier to locate indeterminism in the snapshot, with the following command: $ ./configure --write-snapshot-as-array-literals $ make V= $ mv out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc $ make V= $ diff out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc PR-URL: nodejs#49312 Refs: nodejs/build#3043 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
To improve determinism of snapshot generation, add --predictable to the V8 flags used to initialize a process launched to generate snapshot. Also add a kGeneratePredictableSnapshot flag to ProcessInitializationFlags for this and moves the configuration of these flags into node::InitializeOncePerProcess() so that it can be shared by embedders. PR-URL: nodejs#48749 Refs: nodejs/build#3043 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
To improve determinism of snapshot generation, add --predictable to the V8 flags used to initialize a process launched to generate snapshot. Also add a kGeneratePredictableSnapshot flag to ProcessInitializationFlags for this and moves the configuration of these flags into node::InitializeOncePerProcess() so that it can be shared by embedders. PR-URL: #48749 Refs: nodejs/build#3043 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
With nodejs/node#50983 locally I can generate reproducible snapshots, though the binary still differ |
I have been trying and failing to get any version of node to build bit-for-bit reproducible. What is the most recent version of node anyone has managed to reproduce, with what exact build steps? Most recently I attempted to build 20.11.1 as follows:
Here is just a portion of the massive diff that I get:
|
I only succeeded with |
I had no luck with that. What version? |
that was 18.18.2 |
Without nodejs/node#50983 The snapshot would not be reproducible. The V8 patch just landed last week in the upstream, I plan to backport it soon once I fix a small regression the patch caused. |
Just tried 18.18.2 with:
Here is the full Containerfile if anyone wants to test.
Still can't get a match, though I would note I am building against musl rather than glibc as with nix.
|
Saw some diffs were ICU related so I also just tried with |
I would suggest that with all of the configure options you're using (plus using a musl instead of glibc) it may take a bit of work to identify the source of your problems from this. My tests were on Fedora 39 with gcc 13.2.1 but I've used earlier compilers when doing this in the past with the same results, and I have a CI job that does a regular check of the tip with gcc 9.3.0 on Ubuntu 20.04 which still seems to be working ok. 18.18.2 didn't produce identical builds for me, although
That's great news @joyeecheung! Sounds like we're really close now. |
Okay so when I build 20.11.1 twice with only
Will try again with latest tip and see if that one difference is corrected and share if not. I was able to conclude the big diffs I provided above, were only from cases where I am doing parallel builds with identical containerized build environments except for the CPU. In one case I am building with 20 cores of a Threadripper 2990WX 32 core, and in another I was building with ~all 32 cores of a AMD EPYC 7502P 32 core. In stagex (the deterministic Linux distro we are trying to add nodejs to) we have seen this behavior a few times in other packages, and was always because the compilation system was either making optimizations based on the CPU sub-architecture, or embedding information about the CPU or build environment into early headers. In either case we end up with a huge shift in headers as per my diffs above. Anyone more familiar with nodejs building have ideas on how I can maybe hardcode away any CPU-specific behavior or metadata beyond "generic x86_64"? |
Yeah I get different output on different CPU types too (Most notably in the |
Interesting. My biggest differences above also all seemed ICU related. Did you try comparing with using system icu? |
No I haven't tried that |
I landed the regression fix for my V8 patch, looking into backporting the V8 patches and rebasing nodejs/node#50983 - with the patches the snapshot part should be reproducible, though it seems there are some bits beyond snapshots that are not reproducible. (It seems #3043 (comment) matches my findings in #3043 (comment), I was using Ubuntu (don't remember the version now)) |
To improve determinism of snapshot generation, add --predictable to the V8 flags used to initialize a process launched to generate snapshot. Also add a kGeneratePredictableSnapshot flag to ProcessInitializationFlags for this and moves the configuration of these flags into node::InitializeOncePerProcess() so that it can be shared by embedders. PR-URL: nodejs#48749 Refs: nodejs/build#3043 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
In the dim and distant past before global lockdowns there was some previous work to make the build process binary reproducible.
I had a play recently with the process (on Linux/aarch64 because they are the quickest machines I have access to) and I hit two issues:
debug-support.cc
is nondeterministic - the definitions are not always in the same order within the file. It is generated by https://github.com/nodejs/node/blob/main/deps/v8/tools/gen-postmortem-metadata.py - If I copy in a consistent one the builds can be reproducible.configure --without-node-snapshot
), which suggests that the changes made in node_code_cache.cc and node_snapshot.cc generation is unreproducible node#29108 are no longer valid for the current release.Tagging @ChALkeR who put the original PR in (albeit three years ago!)
The text was updated successfully, but these errors were encountered: