
harden prov args #49

Open
laurentsimon opened this issue Feb 18, 2022 · 7 comments


laurentsimon commented Feb 18, 2022

Today, we build https://github.com/asraa/slsa-on-github/blob/main/.github/workflows/slsa-builder-go.yml#L204 and create the provenance https://github.com/asraa/slsa-on-github/blob/main/.github/workflows/slsa-builder-go.yml#L221 in the same job.

Now consider what happens if the compiler is passed arguments that cause it to print: the output of the command echo "::set-output name=compiler-command::..." could be overridden by attacker-controlled output. We want to protect against this by separating the jobs. This is particularly important if we include the compiler command that was run in the provenance, since the command is printed by the builder, and the compiler's own output may overwrite the print in https://github.com/asraa/slsa-on-github/blob/main/build-go/pkg/build.go#L88

Within a step, it's possible to print multiple times, so we should always make our print the last command. Within a job, different steps can print, but that's fine because the output we use to retrieve the value from other jobs is accessed via job.step.id.
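To illustrate one way to harden that last print, here is a minimal Go sketch. `safeSetOutput` is a hypothetical helper (not from the repo), and hex encoding is just one possible escaping, borrowed from the base64/hex idea floated later in this thread: hex output contains only `[0-9a-f]`, so an untrusted value that embeds its own `::set-output` line cannot be re-interpreted by the runner.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// safeSetOutput (hypothetical helper) emits a GitHub Actions set-output
// workflow command with the untrusted value hex-encoded. Because hex
// output contains only the characters [0-9a-f], a value that embeds its
// own "::set-output name=...::..." line cannot be parsed as a workflow
// command by the runner; the consumer decodes the hex on the other side.
func safeSetOutput(name, value string) string {
	return fmt.Sprintf("::set-output name=%s::%s", name, hex.EncodeToString([]byte(value)))
}

func main() {
	// Simulated untrusted compiler output attempting to inject its own
	// workflow command via an embedded newline.
	malicious := "build ok\n::set-output name=go-binary-sha256::attacker-controlled"
	fmt.Println(safeSetOutput("compiler-command", malicious))
}
```

Encoding alone does not remove the need to print last, since a later attacker-controlled print could still emit a competing `::set-output` line; the two mitigations are complementary.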

If the compiler is tricked into running arbitrary commands, it could backdoor the VM and hijack the last print, which contains the hash computation. This is fine: with arbitrary command execution, the attacker can backdoor the final binary anyway, so it does not violate the claim that "no additional content was included in the final binary besides what's defined in the repo".

If the compiler is tricked into overwriting the binary's hash but not the binary itself, it will be caught in the next job by verifying the hash.

According to https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts:
"Downloading files: You can only download artifacts that were uploaded during the same workflow run. When you download a file, you can reference it by name."

and the API https://docs.github.com/en/rest/reference/actions#artifacts

The Artifacts API allows you to download, delete, and retrieve information about workflow artifacts.

meaning that we are guaranteed (today) that the artifacts were uploaded during the same workflow run. This further reduces the attack surface. It seems the hashes we use today for integrity are not strictly needed then, but they are probably good defense in depth.


laurentsimon commented Feb 18, 2022

Update after syncing with @asraa: hashes are actually needed, because other jobs from the calling workflow can upload artifacts too.

laurentsimon self-assigned this Feb 18, 2022
laurentsimon commented:

It's impossible to protect the hash calculation step if that step runs in the same job as the build. I'll see if we can use job.outputs to pass the entire binary as base64/hex instead. If this works well enough, we can consider doing the same for other data, i.e., not relying on the artifact registry at all.

laurentsimon commented:

This does not work: "Job outputs (2659252 bytes) has exceeded maximum size 1048576 bytes."
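For context, base64 inflates the payload by a factor of 4/3, so even a modest binary blows past the 1 MiB limit reported above. A quick sketch of the arithmetic (`jobOutputFits` is an invented name, and the limit is taken from the error message, not from documented API guarantees):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// jobOutputLimit is the maximum job-output size observed in the error
// message above (1 MiB).
const jobOutputLimit = 1048576

// jobOutputFits (hypothetical check) reports whether a binary of the
// given size, once base64-encoded, would still fit under the limit.
func jobOutputFits(binarySize int) bool {
	return base64.StdEncoding.EncodedLen(binarySize) <= jobOutputLimit
}

func main() {
	// The largest raw binary that fits after base64 encoding is
	// 1048576 / 4 * 3 = 786432 bytes (768 KiB); one byte more overflows.
	fmt.Println(jobOutputFits(786432), jobOutputFits(786433))
}
```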


laurentsimon commented Feb 19, 2022

Update: I've separated the part that prints compiler arguments (including the filename) into a different job. This part works.

It is possible to store the entire binary in the logs; there seems to be no size limit for logs. But a rogue compiler would be able to patch some files and subvert that output too.

The more I think about it, the more I think that's maybe not part of the threat model... :/

For example, Python allows running scripts pre-packaging: in that case we have to trust the scripts anyway. This seems to indicate we cannot provide guarantees unless the scripts are not outright malicious.

I think the only way to get additional protection here is to "sandbox" the compilation ourselves and limit filesystem and network access, in order to share only a thin interface with the compiler (a Docker container would be an amazing start). That way we'd have more confidence in the hash we output in the provenance.

I wonder if we can use a Docker container action for this (https://docs.github.com/en/actions/creating-actions/creating-a-docker-container-action). It does not seem to be the best option, though, because it still allows printing messages, etc.

laurentsimon commented:

FYI, I tested uploading an artifact from another job, and uploading a second time still works. So we do need the hash, and we cannot rely on immutability of the uploaded artifacts.


asraa commented Feb 22, 2022

So we do need the other hash and we cannot rely on immutability of the uploaded artifacts.

Similar to this (and I'll review the PR in case it fits easily), can we achieve a "sandbox" around the binary upload by deleting any upload with the binary name that comes from the dry-run?

laurentsimon commented:

Yes, I think that will be covered in my last PR: I'm going to move the upload to the last step, where we upload the provenance. (It slipped through the cracks last Friday.) Glad you caught it!
