feat: docker input for Bazel Builder and Rebuilder #2602

enteraga6 · 2023-08-08T19:28:13Z

closes #2377
closes #2630

Adds a feature to input a published docker image to build on top of for the Bazel Builder. Building on top of a docker image allows for reproducible build capabilities. There are now two paths for the Bazel Builder to build: using the docker image, and without, which it was doing before. Both utilize the same build script.

To check, the rebuilder will parse the arguments from the provided provenance and use the attest build process to rebuild the artifact. After the rebuild, it will compare the checksums of the provided artifact at command line to verify if the build was successfully reproducible. The rebuilder can handle every type of build that the original Github Actions Bazel Builder can, with different logic for the three main types of artifacts: java targets, targets with runfiles, and targets without runfiles.

The rebuilder does not have to build on a Docker Image. If the docker_image flag is not populated, it will build locally on the machine. If the docker_image flag is populated, but the artifact was not built on a docker image (concurred from the provenance parsing), it will also build locally and a warning will show.

There is also a verbose flag for a user to see their inputs and the arguments that were parsed out of the provenance.

The rebuilder has flags for verifying the artifact and provenance before rebuilding.

Two main repos are cloned: the source repo specifed as an input to the rebuilder via --source_uri, and slsa-verifier if --verify flag is present. There are two main usages, rebuilder only or slsa-verifier + rebuilder, which requires the extra input of the builder_id flag for the verifier.

After completion, the rebuilt artifact will be in a rebuilt artifact directory which has a long random hash at the end to avoid collisions. Something to note is that the entire build process gets repeated since the build arguments are parsed from the provenance. So, if the user builds every target on the GHA, every target will be rebuilt as well, but only the specified target will be copied to the rebuilt artifact directory. If the user only cares about checking the checksums, they can specify the --cleanup flag which will remove the rebuilt artifact dir, the source repo dir, and the slsa-verifier dir, after the rebuilding process is complete.

Documentation on the rebuilder will come in a subsequent PR.

Additionally in this PR, the Bazel Builder workflow has been equipped with more outputs. Added is the sha256 of provenance, and the previously missing name of the binaries directory which will allow users to use this workflow for releases.

Note: I use colorful printf statements that trigger shellcheck. Instead of writing # shellcheck disable=SC2059 over each one I let the warnings persist. I made this decision because they are a lot of printf statements. Also, the warnings are about the environment variables i hardcoded for the colors, not any other environment variable. All other environment variables have been dealt with as SC2059 suggests. SC2059 suggests ignoring the warning with a directive, but that would have cluttered the code.

Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>

laurentsimon · 2023-08-08T21:11:21Z

.github/workflows/builder_bazel_slsa3.yml

@@ -35,6 +35,11 @@ on:
        required: false
        type: string
        default: ""
+      docker-image:


let's not call it docker, because we can build OCI images without docker. Also, we need a digest:
https://github.com/slsa-framework/slsa-github-generator/blob/main/.github/workflows/builder_container-based_slsa3.yml#L62-L74

Changed. Also added input for digest with a todo for verification of it later. TODO #2630

I might have misunderstood you. Did you mean to use the digest to verify the image after pulling?

Like: Pull Image, Get Digest of that Image, Compare with Inputted Digest

or

Like: Pull image in form "${BUILDER_IMAGE}@${BUILDER_DIGEST}", that was taken from later on in the link. IIUC, They pull the image based of the digest.

Which did you mean so I can update the issue that tracks this progress accordingly?

docker pull "${image}@${digest}" should verify the digest

.github/workflows/builder_bazel_slsa3.yml

internal/builders/bazel/action.yml

laurentsimon · 2023-08-08T21:21:53Z

internal/builders/bazel/rebuilder.sh

+#                                              #
+################################################
+
+RESET="\033[0m"


nit: lower case (linter may complain)

The linter won't complain but it's best to use lowercase if it's script-local.

laurentsimon · 2023-08-08T21:22:16Z

internal/builders/bazel/rebuilder.sh

+
+# This directory is where the rebuilt artifacts will be stored. It is made upon
+# running the rebuilder. The long name is to avoid potential collisions.
+rebuilt_artifacts_dir="rebuilt_artifacts_0ffe97cd2693d6608f5a787151950ed8"


why not creating a temp directory instead?

This is created within the user's slsa-github-generator repo. What would be the benefit of the temp directory? I wanted the users to be able to access the rebuilt artifacts and run them, thus why I made this directory within the same directory that they would run the rebuild.sh in.

laurentsimon · 2023-08-08T21:23:42Z

internal/builders/bazel/rebuilder.sh

@@ -0,0 +1,490 @@
+#!/bin/bash
+#
+# Copyright 2023 SLSA Authors


can you separate code that builds provenance, vs code that is utilities (color, etc)? I think we need 2 files.

laurentsimon · 2023-08-08T21:25:12Z

internal/builders/bazel/rebuilder.sh

+  if [[ ! ($returnValue) ]]
+  then
+    my_arg="$ARG"
+    printf "${RED}[ERROR] ${LIGHT_RED}%s is unrecognized${RESET}\n" "$my_arg"


if we want to use colors, we need to create a print_err function which abstracts away that functionality. Otherwise it's really hard to read

+1. Maybe a print_err and print_msg which handle the colors would be nice.

laurentsimon · 2023-08-08T21:26:26Z

internal/builders/bazel/rebuilder.sh

+  cd ../..
+
+  # Now cleanup of verifier and cloned $repo_name.
+  cleanup


@ianlewis how do you feel about this huge script? I'm worried it will be hard to maintain. Shall we use another language?

I'm not against it. 500 lines isn't that long. But I do think it could be simplified depending on how it's used. Do we need all the command line parsing stuff for example?

@enteraga6 I didn't see where this script was executed? Is it done via the bazel workflow somehow?

Don't know why I didn't reply to this thread but replied to review instead..

My response down there was:
@ianlewis This script is not executed in a workflow. It is executed on the local machine of the user.

I will switch script to have utils and source it at beginning.

.github/workflows/builder_bazel_slsa3.yml

internal/builders/bazel/action.yml

internal/builders/bazel/rebuilder.sh

ianlewis · 2023-08-09T00:07:07Z

internal/builders/bazel/rebuilder.sh

+  cd ../..
+
+  # Now cleanup of verifier and cloned $repo_name.
+  cleanup


I'm not against it. 500 lines isn't that long. But I do think it could be simplified depending on how it's used. Do we need all the command line parsing stuff for example?

@enteraga6 I didn't see where this script was executed? Is it done via the bazel workflow somehow?

ianlewis · 2023-08-09T00:09:10Z

Subtweet: https://twitter.com/IanMLewis/status/1689066136535801856

enteraga6 · 2023-08-09T01:03:04Z

@ianlewis This script is not executed in a workflow. It is executed on the local machine of the user.

Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>