
harden prov args #49

Open
laurentsimon opened this issue Feb 18, 2022 · 7 comments


laurentsimon commented Feb 18, 2022

Today, we build https://github.com/asraa/slsa-on-github/blob/main/.github/workflows/slsa-builder-go.yml#L204 and create the provenance https://github.com/asraa/slsa-on-github/blob/main/.github/workflows/slsa-builder-go.yml#L221 in the same job.

Now consider what happens if the compiler is passed arguments that cause it to print: the output of the command echo "::set-output name=compiler-command::..." could be overridden by attacker-controlled output. We want to protect against this by separating the jobs. This is particularly important if we include the compiler command that was run in the provenance, since the command is printed by the builder, and the compiler's own output may overwrite the print in https://github.com/asraa/slsa-on-github/blob/main/build-go/pkg/build.go#L88

Within a step, it's possible to print multiple times, so we should always make our print the last command. Within a job, different steps can print, but that's fine because the output we use to retrieve the value from other jobs is accessed via job.step.id.
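To illustrate one way to harden that last print, here is a minimal Go sketch. `safeSetOutput` is a hypothetical helper (not from the repo), and hex encoding is just one possible escaping, borrowed from the base64/hex idea floated later in this thread: hex output contains only `[0-9a-f]`, so an untrusted value that embeds its own `::set-output` line cannot be re-interpreted by the runner.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// safeSetOutput (hypothetical helper) emits a GitHub Actions set-output
// workflow command with the untrusted value hex-encoded. Because hex
// output contains only the characters [0-9a-f], a value that embeds its
// own "::set-output name=...::..." line cannot be parsed as a workflow
// command by the runner; the consumer decodes the hex on the other side.
func safeSetOutput(name, value string) string {
	return fmt.Sprintf("::set-output name=%s::%s", name, hex.EncodeToString([]byte(value)))
}

func main() {
	// Simulated untrusted compiler output attempting to inject its own
	// workflow command via an embedded newline.
	malicious := "build ok\n::set-output name=go-binary-sha256::attacker-controlled"
	fmt.Println(safeSetOutput("compiler-command", malicious))
}
```

Encoding alone does not remove the need to print last, since a later attacker-controlled print could still emit a competing `::set-output` line; the two mitigations are complementary.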

If the compiler is tricked into running arbitrary commands, it could backdoor the VM and hijack the last print, which contains the hash computation. This is fine: with arbitrary command execution, the attacker can backdoor the final binary anyway, so it does not violate the claim that "no additional content was included in the final binary besides what's defined in the repo".

If the compiler is tricked into overwriting the binary's hash but not the binary itself, it will be caught in the next job by verifying the hash.

According to https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts:
"Downloading files: You can only download artifacts that were uploaded during the same workflow run. When you download a file, you can reference it by name."

and the API https://docs.github.com/en/rest/reference/actions#artifacts

The Artifacts API allows you to download, delete, and retrieve information about workflow artifacts.

meaning that we are guaranteed (today) that the artifacts were uploaded during the same workflow run. This further reduces the attack surface. It seems the hashes we use today for integrity are not strictly needed then, but they are probably good defense in depth.


laurentsimon commented Feb 18, 2022

Update after syncing with @asraa: hashes are actually needed, because other jobs from the calling workflow can upload artifacts too.

laurentsimon self-assigned this Feb 18, 2022
laurentsimon commented:

It's impossible to protect the hash calculation step if that step runs in the same job as the build. I'll see if we can use job.outputs to pass the entire binary as base64/hex instead. If this works well enough, we can consider doing the same for other data, i.e., not relying on the artifact registry at all.

laurentsimon commented:

This does not work: "Job outputs (2659252 bytes) has exceeded maximum size 1048576 bytes."
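For context, base64 inflates the payload by a factor of 4/3, so even a modest binary blows past the 1 MiB limit reported above. A quick sketch of the arithmetic (`jobOutputFits` is an invented name, and the limit is taken from the error message, not from documented API guarantees):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// jobOutputLimit is the maximum job-output size observed in the error
// message above (1 MiB).
const jobOutputLimit = 1048576

// jobOutputFits (hypothetical check) reports whether a binary of the
// given size, once base64-encoded, would still fit under the limit.
func jobOutputFits(binarySize int) bool {
	return base64.StdEncoding.EncodedLen(binarySize) <= jobOutputLimit
}

func main() {
	// The largest raw binary that fits after base64 encoding is
	// 1048576 / 4 * 3 = 786432 bytes (768 KiB); one byte more overflows.
	fmt.Println(jobOutputFits(786432), jobOutputFits(786433))
}
```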


laurentsimon commented Feb 19, 2022

Update: I've separated the part that prints compiler arguments (including the filename) into a different job. This part works.

It is possible to store the entire binary in the logs; there seems to be no size limit for logs. But a rogue compiler would be able to patch some files and subvert that output too.

The more I think about it, the more I think that's maybe not part of the threat model... :/

For example, Python allows running scripts pre-packaging: in that case we have to trust the scripts anyway. This seems to indicate we cannot provide guarantees unless the scripts are not outright malicious.

I think the only way to get additional protection here is to "sandbox" the compilation ourselves and limit filesystem and network access, in order to share only a thin interface with the compiler (a Docker container would be an amazing start). That way we'd have more confidence in the hash we output in the provenance.

I wonder if we can use a Docker container action for this (https://docs.github.com/en/actions/creating-actions/creating-a-docker-container-action). It does not seem to be the best option, though, because it still allows printing messages, etc.

laurentsimon commented:

FYI, I tested uploading an artifact from another job, and uploading a second time still works. So we do need the hash, and we cannot rely on immutability of the uploaded artifacts.


asraa commented Feb 22, 2022

So we do need the other hash and we cannot rely on immutability of the uploaded artifacts.

Similar to this (and I'll review the PR in case it fits easily), can we achieve a "sandbox" around the binary upload by deleting any upload with the binary name that comes from the dry-run?

laurentsimon commented:

Yes, I think that will be covered in my last PR: I'm going to move the upload to the last step, where we upload the provenance. (It slipped through the cracks last Friday.) Glad you caught it!
