Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: docker input for Bazel Builder and Rebuilder #2602

Open
wants to merge 56 commits into
base: main
Choose a base branch
from

Conversation

enteraga6
Copy link
Collaborator

@enteraga6 enteraga6 commented Aug 8, 2023

closes #2377
closes #2630

/cc: @mihaimaruseac @laurentsimon

Adds a feature to input a published docker image to build on top of for the Bazel Builder. Building on top of a docker image allows for reproducible build capabilities. There are now two paths for the Bazel Builder to build: using the docker image, and without, which it was doing before. Both utilize the same build script.

To check, the rebuilder will parse the arguments from the provided provenance and use the attest build process to rebuild the artifact. After the rebuild, it will compare the checksums of the provided artifact at command line to verify if the build was successfully reproducible. The rebuilder can handle every type of build that the original Github Actions Bazel Builder can, with different logic for the three main types of artifacts: java targets, targets with runfiles, and targets without runfiles.

The rebuilder does not have to build on a Docker Image. If the docker_image flag is not populated, it will build locally on the machine. If the docker_image flag is populated, but the artifact was not built on a docker image (concurred from the provenance parsing), it will also build locally and a warning will show.

There is also a verbose flag for a user to see their inputs and the arguments that were parsed out of the provenance.

The rebuilder has flags for verifying the artifact and provenance before rebuilding.

Two main repos are cloned: the source repo specifed as an input to the rebuilder via --source_uri, and slsa-verifier if --verify flag is present. There are two main usages, rebuilder only or slsa-verifier + rebuilder, which requires the extra input of the builder_id flag for the verifier.

After completion, the rebuilt artifact will be in a rebuilt artifact directory which has a long random hash at the end to avoid collisions. Something to note is that the entire build process gets repeated since the build arguments are parsed from the provenance. So, if the user builds every target on the GHA, every target will be rebuilt as well, but only the specified target will be copied to the rebuilt artifact directory. If the user only cares about checking the checksums, they can specify the --cleanup flag which will remove the rebuilt artifact dir, the source repo dir, and the slsa-verifier dir, after the rebuilding process is complete.

Documentation on the rebuilder will come in a subsequent PR.

Additionally in this PR, the Bazel Builder workflow has been equipped with more outputs. Added is the sha256 of provenance, and the previously missing name of the binaries directory which will allow users to use this workflow for releases.

Note: I use colorful printf statements that trigger shellcheck. Instead of writing # shellcheck disable=SC2059 over each one I let the warnings persist. I made this decision because they are a lot of printf statements. Also, the warnings are about the environment variables i hardcoded for the colors, not any other environment variable. All other environment variables have been dealt with as SC2059 suggests. SC2059 suggests ignoring the warning with a directive, but that would have cluttered the code.

Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
@@ -35,6 +35,11 @@ on:
required: false
type: string
default: ""
docker-image:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not call it docker, because we can build OCI images without docker. Also, we need a digest:
https://github.com/slsa-framework/slsa-github-generator/blob/main/.github/workflows/builder_container-based_slsa3.yml#L62-L74

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Also added input for digest with a todo for verification of it later. TODO #2630

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have misunderstood you. Did you mean to use the digest to verify the image after pulling?

Like: Pull Image, Get Digest of that Image, Compare with Inputted Digest

or

Like: Pull image in form "${BUILDER_IMAGE}@${BUILDER_DIGEST}", that was taken from later on in the link. IIUC, They pull the image based of the digest.

Which did you mean so I can update the issue that tracks this progress accordingly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker pull "${image}@${digest}" should verify the digest

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

.github/workflows/builder_bazel_slsa3.yml Outdated Show resolved Hide resolved
.github/workflows/builder_bazel_slsa3.yml Outdated Show resolved Hide resolved
.github/workflows/builder_bazel_slsa3.yml Show resolved Hide resolved
internal/builders/bazel/action.yml Outdated Show resolved Hide resolved
# #
################################################

RESET="\033[0m"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lower case (linter may complain)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter won't complain but it's best to use lowercase if it's script-local.


# This directory is where the rebuilt artifacts will be stored. It is made upon
# running the rebuilder. The long name is to avoid potential collisions.
rebuilt_artifacts_dir="rebuilt_artifacts_0ffe97cd2693d6608f5a787151950ed8"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not creating a temp directory instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is created within the user's slsa-github-generator repo. What would be the benefit of the temp directory? I wanted the users to be able to access the rebuilt artifacts and run them, thus why I made this directory within the same directory that they would run the rebuild.sh in.

@@ -0,0 +1,490 @@
#!/bin/bash
#
# Copyright 2023 SLSA Authors
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you separate code that builds provenance, vs code that is utilities (color, etc)? I think we need 2 files.

if [[ ! ($returnValue) ]]
then
my_arg="$ARG"
printf "${RED}[ERROR] ${LIGHT_RED}%s is unrecognized${RESET}\n" "$my_arg"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to use colors, we need to create a print_err function which abstracts away that functionality. Otherwise it's really hard to read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Maybe a print_err and print_msg which handle the colors would be nice.

cd ../..

# Now cleanup of verifier and cloned $repo_name.
cleanup
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ianlewis how do you feel about this huge script? I'm worried it will be hard to maintain. Shall we use another language?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against it. 500 lines isn't that long. But I do think it could be simplified depending on how it's used. Do we need all the command line parsing stuff for example?

@enteraga6 I didn't see where this script was executed? Is it done via the bazel workflow somehow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know why I didn't reply to this thread but replied to review instead..

My response down there was:
@ianlewis This script is not executed in a workflow. It is executed on the local machine of the user.

I will switch script to have utils and source it at beginning.

.github/workflows/builder_bazel_slsa3.yml Outdated Show resolved Hide resolved
.github/workflows/builder_bazel_slsa3.yml Outdated Show resolved Hide resolved
.github/workflows/builder_bazel_slsa3.yml Show resolved Hide resolved
internal/builders/bazel/action.yml Outdated Show resolved Hide resolved
internal/builders/bazel/action.yml Outdated Show resolved Hide resolved
internal/builders/bazel/rebuilder.sh Outdated Show resolved Hide resolved
internal/builders/bazel/rebuilder.sh Outdated Show resolved Hide resolved
internal/builders/bazel/rebuilder.sh Outdated Show resolved Hide resolved
internal/builders/bazel/rebuilder.sh Outdated Show resolved Hide resolved
cd ../..

# Now cleanup of verifier and cloned $repo_name.
cleanup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against it. 500 lines isn't that long. But I do think it could be simplified depending on how it's used. Do we need all the command line parsing stuff for example?

@enteraga6 I didn't see where this script was executed? Is it done via the bazel workflow somehow?

@ianlewis
Copy link
Member

ianlewis commented Aug 9, 2023

Subtweet: https://twitter.com/IanMLewis/status/1689066136535801856

@enteraga6
Copy link
Collaborator Author

@ianlewis This script is not executed in a workflow. It is executed on the local machine of the user.

enteraga6 and others added 18 commits August 10, 2023 22:51
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Co-authored-by: Ian Lewis <ianlewis@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Co-authored-by: Ian Lewis <ianlewis@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Co-authored-by: Ian Lewis <ianlewis@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
enteraga6 and others added 22 commits August 11, 2023 08:43
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <78953604+enteraga6@users.noreply.github.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
Signed-off-by: Noah Elzner <elzner@google.com>
@@ -35,6 +35,18 @@ on:
required: false
type: string
default: ""
env-image:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's re-use the naming from the container-based builder for consistency, unless there's a good reason not to

value: ${{ fromJSON(jobs.slsa-run.outputs.build-artifacts-outputs).artifacts-download-name }}

artifacts-download-sha256:
description: "SHA256 of the uploaded tarball of built artifacts."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use sha256 lower case like in other description

description: "SHA256 of the uploaded tarball of built artifacts."
value: ${{ fromJSON(jobs.slsa-run.outputs.build-artifacts-outputs).artifacts-download-sha256 }}

artifacts-actual-name:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's be consistent with the name used by other builders

id: java
uses: actions/setup-java@cd89f46ac9d01407894225f350157564c9c7cee2 # v3.12.0
with:
distribution: "${{ fromJson(inputs.slsa-workflow-inputs).user-java-distribution }}"
java-version: "${{ fromJson(inputs.slsa-workflow-inputs).user-java-version }}"

- name: Check for Environment Image
id: env-image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this step? Why can't we simply use if: ${{ fromJson(inputs.slsa-workflow-inputs).env-image == '' }} in the next step?

shell: bash
run: |
set -euo pipefail
docker pull $UNTRUSTED_ENV_IMAGE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double quote missing

set -euo pipefail
docker pull $UNTRUSTED_ENV_IMAGE
curr_dir=$(basename "$(pwd)")
docker run --rm --env UNTRUSTED_TARGETS=${UNTRUSTED_TARGETS} --env UNTRUSTED_FLAGS=${UNTRUSTED_FLAGS} --env UNTRUSTED_NEEDS_RUNFILES=${UNTRUSTED_NEEDS_RUNFILES} --env UNTRUSTED_INCLUDES_JAVA=${UNTRUSTED_INCLUDES_JAVA} -v $PWD/../:/src -w /src $UNTRUSTED_ENV_IMAGE /bin/sh -c "ls && tree && cd $curr_dir && ls && tree && ./../__TOOL_ACTION_DIR__/build.sh"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double quote missing

shell: bash
run: |
set -euo pipefail
docker pull $UNTRUSTED_ENV_IMAGE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the pull required? Will docker pull automatically as part of docker run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature][bazel] Add verification of image digest in workflow [feat][bazel] Bazel Builder Rebuilder
3 participants