Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add canonicalized RekorBundle to the Rekor proto #261

Open
lkatalin opened this issue Mar 19, 2024 · 14 comments
Open

Add canonicalized RekorBundle to the Rekor proto #261

lkatalin opened this issue Mar 19, 2024 · 14 comments

Comments

@lkatalin
Copy link

lkatalin commented Mar 19, 2024

Based on the discussion about verifying the SET, I'm opening this issue to add a canonicalized RekorBundle to the TLE, to be used by Rekor in its CLI output. The purpose is to have a way to easily get the SET's corresponding payload from the rekor-cli in order to manually and auditably verify an SET, rather than having to construct the payload from bits and pieces of output, which is very cumbersome.

I've had some confusion around the "Sigstore bundle" vs. the "Rekor bundle" so restating my understanding: It looks like the Sigstore bundle VerificationMaterial has an embedded TransparencyLogEntry, inside which would be the place for the Rekor bundle. I know the Rekor bundle is what is signed by the Rekor public key to produce the SET, which is part of the verification material.

Possibly related issues:
#11
#116

@kommendorkapten
Copy link
Member

kommendorkapten commented Mar 19, 2024

I think this is the field you are looking for @lkatalin

// Optional. The canonicalized transparency log entry, used to

@lkatalin
Copy link
Author

@kommendorkapten That one is the canonicalized body, but there's currently no field that adds in the integratedTime, logID, and logIndex. Effectively a person has to do a lot of bash to get the payload that corresponds to the SET.

@woodruffw
Copy link
Member

@kommendorkapten That one is the canonicalized body, but there's currently no field that adds in the integratedTime, logID, and logIndex. Effectively a person has to do a lot of bash to get the payload that corresponds to the SET.

I might be missing something, but aren't those defined in the TransparencyLogEntry entry as well? log_id is field 2, for example. Those are the fields we use when verifying the SET in sigstore-python.

@lkatalin
Copy link
Author

@woodruffw The relevant fields are included, but stitching them together into a syntactically correct and canonicalized SET payload is burdensome to do manually for a user. A program can do it, but I think there is value in being able to easily see and audit that the SET is correct by retrieving the material that produced the signature. Otherwise why include the SET in user-facing output?

@haydentherapper
Copy link
Collaborator

We shouldn't duplicate these fields, especially the body due to the potential size. We have a comment already -

// The inclusion promise is calculated by Rekor. It's calculated as a
// signature over a canonical JSON serialization of the persisted entry, the
// log ID, log index and the integration timestamp.
- that specifies this should be the JSON canonicalization of those 4 fields. Is there something we could do to make that more clear?

I believe the issue is that the JSON field names differ (see https://github.com/sigstore/cosign/blob/main/pkg/cosign/bundle/rekor.go#L28-L33 as an example), so you have to know the mapping between protobuf-spec field and SET JSON field, correct?

@lkatalin
Copy link
Author

We shouldn't duplicate these fields, especially the body due to the potential size.

That's a fair consideration. Would it make sense then to just have one field output the entire canonicalized Rekor Bundle, instead of separate fields for the body, integratedTime, etc.?

There are a few reasons why constructing the Rekor Bundle manually was fiddly (and some may relate to the protobuf/JSON mapping, I am not sure) - including needing the correct order of fields (canonicalization), the correct capitalization (logID vs logId), handling newlines, getting the correct log ID value (which is not present in the TLE output, only the JSON output), and some values needing quotations marks where others don't. But the more fundamental problem IMO is that people have to reconstruct this value at all instead of it being readily available in the output.

I don't think Sigstore users should be asked to delve into the code and comments in order to verify the SET. Sigstore's security guarantees should be auditable and as easy to understand as possible. Each step in Sigstore's verification process (from the artifact all the way up the chain of trust to the TUF root) should be able to be replicated by the user. I believe it should be as hassle-free as possible to demonstrate the value of Sigstore. I'm working on some documentation for the entire verification chain, but I think people will get lost on steps like this.

@haydentherapper
Copy link
Collaborator

I addressed each point below, but I think I understand better now why this is needed. The issue is the lack of a standardized struct that represents the SET. I don't think we need to populate it in the bundle that we distribute to avoid the duplicated metadata, but we could define this message in the Rekor proto so that when users need to verify the SET, they simply need to populate this message using what's in the TransparencyLogEntry message without having to worry about the contents of the JSON message (still would require canonicalization though).

What I would propose is removing the RekorBundle field from the TLE message, but still defining the RekorBundle message. Does this sound good to you?

including needing the correct order of fields (canonicalization)

Order pre-canonicalization shouldn't matter. Using a standard canonicalization strategy should produce the same result across language implementations. This of course isn't easy to do via the CLI, but I wouldn't expect sigstore verification done via shell scripting, though you did run into this when trying to do this via bash.

the correct capitalization (logID vs logId)

Fair point, though this seems like something that should be handled via canonicalization?

handling newlines

Does canonicalization take care of this?

getting the correct log ID value (which is not present in the TLE output, only the JSON output)

I believe this is the difference between a hex encoding vs a base64 encoding? This did surprise me though. @woodruffw did you ever run into this?

and some values needing quotations marks where others don't

I assume this would be due to strings vs ints. A struct defining types would help.

@lkatalin
Copy link
Author

lkatalin commented Mar 21, 2024

Thanks @haydentherapper . I'm realizing that my change proposal may be more unorthodox than I expected.

This of course isn't easy to do via the CLI, but I wouldn't expect sigstore verification done via shell scripting

I want to make a case for why it should be possible to do this on the CLI, which is the motivation for this proposal. Users of Sigstore are interested in knowing what security guarantees they are getting, but they are not necessarily going to be developers. I think the ability to check Sigstore's (Rekor's) output on the CLI with tools that people may already be familiar with (such as openssl) and be able to follow and understand the chain of trust to know that Sigstore is "doing things correctly" without having to look through the code would be a big win for Sigstore. I have gotten questions around what Sigstore is actually checking and how everything is verified, and have found that these steps should be clearer to users. This is not meant to be something that people do every time they use Sigstore for verification - but having the data be readily available in an understandable form would allow easier auditability with tools that don't come from the Sigstore ecosystem, which would increase understanding and trust in the Sigstore tools.

So, I think first we would need to agree on whether this auditability is an outcome worth pursuing, and then work out the details on how to do it.

The issue is the lack of a standardized struct that represents the SET

Yeah, or I guess more specifically lack of a way for a user to introspect what the canonicalized signed material was to produce the SET (in order to verify it manually).

What I would propose is removing the RekorBundle field from the TLE message, but still defining the RekorBundle message.

Defining it would definitely be a step in the right direction. How would users access this if not in the TLE message? Could there be another rekor-cli output flag such as --format rekor-bundle that would emit the Rekor bundle to the user?

I believe this is the difference between a hex encoding vs a base64 encoding? This did surprise me though

It surprised me as well. I will paste the example I added in sigstore/rekor#1943 . These outputs are for the same artifact entry (the Rekor bundle requires the one from JSON):

--format tle output:

"logId": {
    "keyId": "wNI9atQGlz+VWfO6LRygH4QUfY/8W4RFwiT5i5WRgB0="
  },

--format json output:

"LogID": "c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d"

Just as FYI I created an issue re: the discrepancies between --format tle and --format json for the rekor-cli here sigstore/rekor#2052

@woodruffw
Copy link
Member

I believe this is the difference between a hex encoding vs a base64 encoding? This did surprise me though. @woodruffw did you ever run into this?

Hmm, I don't think we hit this exact case, but we ran into similar confusions when reconstructing the signed-over body for Fulcio SCT verification. I believe @segiddins has also hit a few variants of this while working on a Ruby implementation.

IMO it'd be ideal if Sigstore (and its subcomponents) unified on a single JSON-safe bytes representation over the long term. I also agree with the larger canonicalization point you've made: part of the security model for verifying inclusion statements (whether proofs or promises) is reconstructing the canonicalized body rather than accepting it as an opaque input (since, as an opaque input, it can be made inconsistent with other materials, as seen in CVE-2022-36056).


I think the ability to check Sigstore's (Rekor's) output on the CLI with tools that people may already be familiar with (such as openssl) and be able to follow and understand the chain of trust to know that Sigstore is "doing things correctly" without having to look through the code would be a big win for Sigstore.

I agree that there's value (indispensible value!) in independent verification, but I want to push back a bit on this approach: IMO tight coupling of components makes it easier to define misuse-resistant interfaces, which are particularly important when combining multiple cryptographic components like Sigstore does.

In other words: there's value in having multiple ways for users to verify e.g. the SET component of a Sigstore bundle, but IMO optimizing for those kind of "breakouts" vs. the coherency of an overall Sigstore verification process makes the latter harder to design misuse-resistant interfaces for (since every component is now introspectible, rather than the overall system having a "pass"/"fail" interface as cryptographic APIs should have).

My proposal here would be to increase auditability not by making each subcomponent easier to devolve, but instead by:

  • Increasing use (and depth) of the conformance suite by Sigstore implementations, allowing end users to assert additional confidence in their implementation of choice by comparing it to others;
  • Improving public-facing documentation about how Sigstore works, a la https://certificate.transparency.dev/
  • Improving the standards-side definition of Sigstore, including the currently incomplete Sigstore client specification 😅

@haydentherapper
Copy link
Collaborator

I don't see a difference between inspecting the steps in code and running a bash command. In either case, it requires a deep understanding of the cryptographic structures and the protocol. We also are trying to move away from SETs, replacing with inclusion proofs + checkpoints + witness cosignatures and signed timestamps, which would be far more complex to verify by hand (though it's been done! https://edu.chainguard.dev/open-source/sigstore/cosign/cosign-manual-way/#get-transparent).

One gap is that the rekor CLI hasn't received the same attention as of late as other Sigstore clients. I don't think we can integrate it into the conformance test suite since that is testing the full end to end flow. But we should make sure that its output can be used with clients that either support detached verification material or could be directly added to a bundle.

Defining it would definitely be a step in the right direction. How would users access this if not in the TLE message? Could there be another rekor-cli output flag such as --format rekor-bundle that would emit the Rekor bundle to the user?

It doesn't have to be in the TLE message, it can be its own message. I wouldn't expect it to be the output of any tool, just to help aid clients when constructing the SET themselves, who now only need to know the mapping between the fields in the TLE and the SET structure which will define the JSON fields.

@segiddins
Copy link
Contributor

I'm not sure if the discussion belongs here, but I think one of the things that would make this easier is representing the inclusion proof checkpoint as structure data instead of a single string

@lkatalin
Copy link
Author

Thanks for the discussion, everyone - I will have time to follow up on this and update the Rekor proto PR in early April.

@haydentherapper
Copy link
Collaborator

I'm not sure if the discussion belongs here, but I think one of the things that would make this easier is representing the inclusion proof checkpoint as structure data instead of a single string

@segiddins Unfortunately this is out of our control, we are following https://github.com/C2SP/C2SP/blob/main/tlog-checkpoint.md which represents a checkpoint as a signed note which is just a string.

@lkatalin
Copy link
Author

lkatalin commented Apr 2, 2024

Hi all, I've updated the PR #262 based on this conversation to add only a RekorBundle message that is separate from the TLE message. Please take a look when you get a chance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants