spike: uki image revocation and rollback #2429

Open
Tracked by #1792
mudler opened this issue Apr 4, 2024 · 30 comments

Comments

@mudler
Member

mudler commented Apr 4, 2024

Is your feature request related to a problem? Please describe.
If a vulnerability is found in older images, we might want to disable access to the encrypted portion of the disk for certain images. Similarly, if encryption keys are leaked, we would like a mechanism that either updates the system to use a new key (for instance by using the old keys to update to new ones), or simply invalidates the portion of the stack that is responsible for decrypting the disk.

Describe the solution you'd like
A way to use old keys to generate an upgrade image that installs the new ones. Alternatively, a mechanism that lets an upgrade image invalidate older images.

Describe alternatives you've considered

Additional context

@mudler added the spike, uki and triage labels Apr 4, 2024
@Itxaka
Member

Itxaka commented Apr 4, 2024

SBAT was made for this I think: https://github.com/rhboot/shim/blob/main/SBAT.md

@mudler
Member Author

mudler commented Apr 4, 2024

> SBAT was made for this I think: https://github.com/rhboot/shim/blob/main/SBAT.md

Reading it, it looks like https://0pointer.de/blog/brave-new-trusted-boot-world.html took some inspiration from it; at least both seem to require some kind of incremental counter. What is not clear is how that ties into the sealing process.

In Lennart's article he mentions that when sealing against a TPM PCR bank, not only the PCR policy but also the counter is used, which I'm afraid will make it incompatible with current installs. (I also can't find a reference to that in systemd-*, so it looks like something that can't be used yet?)

@jimmykarily removed the triage label Apr 15, 2024
@ci-robbot added the triage label Apr 15, 2024
@ci-robbot
Collaborator

Hello! I'm a bot, an experiment of @mudler and @jimmykarily. Thank you for opening this issue. It seems that your issue, titled "spike: uki image revocation and rollback" (#2429) in the "kairos" repository owned by "kairos-io", is not completely clear regarding the relationship with the sealing process and the compatibility with current installations.

As you provided some additional context and potential alternatives like SBAT and the concept of using TPM counters, we can better understand the problem you're trying to address. However, more details are needed to meet the project's requirements. Please provide steps to reproduce (if it's a bug), and mention the versions of the relevant artifacts being used.

Once you provide the necessary information, we will be able to classify your issue more accurately and assign the appropriate label. Thank you for your patience and understanding, and please remember that I'm just a bot.

Best regards!

@Itxaka
Member

Itxaka commented Apr 15, 2024

@ci-robbot hello there, how do I get the sbat in python?

@ci-robbot
Collaborator

Hello! I'm a bot, an experiment of @mudler and @jimmykarily. Thank you for the response.

To get the sbat in Python, you need to follow these steps:

  1. Install systemd-cryptsetup and python3-cryptsetup

    If you are using the pip package manager, you can install systemd-python-libs with the following command:

    pip install systemd-python-libs
    

    Once you have systemd-python-libs installed, you can install sbat using pip:

    pip install sbat
    
  2. Use the sbat Python package in your code by importing it as follows:

    import sbat

Remember that this solution is valid as of the current state of the artifacts. Make sure to specify the versions of the relevant artifacts in your issue as required by the project's guidelines.

If you have any further questions or need more clarification, feel free to ask.

Best regards!


This conversation has been automatically audited by bot. The audit has identified that the information provided is sufficient to meet the project's requirements. Therefore, the issue can be labeled appropriately. The required details include a description of the issue, steps to reproduce (if it's a bug), and the versions of the relevant artifacts being used.

Since the issue is clear and meets the requirements, I will assign the following labels to the issue: [question, triage].

@jimmykarily removed the triage label Apr 17, 2024
@jimmykarily jimmykarily self-assigned this Apr 17, 2024
@jimmykarily
Contributor

sbat is only understood and used by the shim (https://uapi-group.org/specifications/specs/unified_kernel_image/#uki-components). We don't use the shim so we can't rely on sbat for revocation.

@jimmykarily
Contributor

jimmykarily commented Apr 17, 2024

2 things:

  • What's the worst that can happen if we just rely on dbx to blacklist images (revocation by image hash: https://github.com/rhboot/shim/blob/main/SBAT.md)? Do we expect this to be so common that it will fill up the available dbx space? In other words, when should a Kairos image be blacklisted? It's obvious that older versions of Kairos will at some point have CVEs in them (we don't backport package updates to older versions). Does that mean that every previous version of Kairos should be blacklisted?
  • This whole sbat flow was invented for systems where the original certs should be kept in place (e.g. the Microsoft one). We enroll our own keys. Blocking older images can happen by simply rotating the keys in the BIOS. Maybe all we need to do is clean up the db from old keys and enroll a new one. It should be possible to manipulate the db from within a trusted OS. We need to try this out. Update: documented here: https://wiki.archlinux.org/title/Unified_Extensible_Firmware_Interface/Secure_Boot#Updating_keys
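
To put a rough number on the "fill up the dbx" question, here is a small back-of-the-envelope sketch. It assumes the standard EFI_SIGNATURE_LIST layout (a 28-byte list header, then one 16-byte owner GUID plus the 32-byte digest per SHA-256 entry). The 32 KiB budget is a made-up placeholder; the real limit depends on the firmware and was not measured in this thread.

```python
# Rough size estimate for SHA-256 hash entries in a UEFI signature database.
# Layout constants follow the EFI_SIGNATURE_LIST / EFI_SIGNATURE_DATA structures.

LIST_HEADER_BYTES = 16 + 4 + 4 + 4   # type GUID + list size + header size + entry size
OWNER_GUID_BYTES = 16                # SignatureOwner GUID stored with every entry
SHA256_BYTES = 32
ENTRY_BYTES = OWNER_GUID_BYTES + SHA256_BYTES


def sha256_list_size(num_hashes: int) -> int:
    """Bytes used by one signature list holding `num_hashes` SHA-256 entries."""
    if num_hashes < 0:
        raise ValueError("num_hashes must be non-negative")
    return LIST_HEADER_BYTES + num_hashes * ENTRY_BYTES


def max_hashes(budget_bytes: int) -> int:
    """How many SHA-256 entries fit in a single list within `budget_bytes`."""
    usable = budget_bytes - LIST_HEADER_BYTES
    return max(usable // ENTRY_BYTES, 0)


if __name__ == "__main__":
    # HYPOTHETICAL_BUDGET is an assumed figure, not a measured firmware limit.
    HYPOTHETICAL_BUDGET = 32 * 1024
    print("bytes for 100 revoked images:", sha256_list_size(100))
    print("entries that fit in the assumed budget:", max_hashes(HYPOTHETICAL_BUDGET))
```

At 48 bytes per revoked image, hash-based revocation is cheap per entry, but it grows with every blacklisted build, whereas revoking a signing certificate costs one entry for all images signed with it.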

@Itxaka
Member

Itxaka commented Apr 17, 2024

dbx revocation would be good enough, I guess? But it means we need to generate the dbx, or provide a way of updating it (AFAIK you can update that one from userspace somehow?), so people can generate those from their own built EFI files.

What happens when you don't have access to those EFI files anymore? How do you generate the hash for them?

  • Custom v1 is built and keys inserted
  • Custom v2 built, some hardware upgraded, some not
  • Custom v3 built, some hardware upgraded, some not

Now it happens that v1 has a CVE and you want to block it. You release Custom v4 with the dbx updated.

  • How do you get the image hash to add it to dbx? What if you don't have the install media anymore, or can't reproduce it 1:1?
  • How do you provide the dbx to all the machines?
  • Do you need to upgrade all of them to v4?
  • What if you can't upgrade some of the v2 or v3 ones?

I mean, it sounds good to me to use the mechanism already in place in the firmware for this, but it entails laying down the exact supported way of doing it.

On my machine I had updates to the dbx provided directly by https://github.com/fwupd/fwupd (https://fwupd.org/), so maybe it's possible to do this: ship the daemon and have customers provide their own update server with dbx files?

For other use cases (no internet), maybe just an upgrade to a new version is good enough. Or the fwupdate could also be used to apply the update from local files somehow?

@Itxaka
Member

Itxaka commented Apr 17, 2024

Yep, seems to be possible:

https://github.com/fwupd/fwupd/tree/main/plugins/uefi-dbx

> The org.linuxfoundation.dbx.*.firmware components will match against a hash of the system PK. The latest cabinet archive can also be installed into the vendor-firmware remote found in /usr/share/fwupd/remotes.d/vendor/firmware/ which allows the version-fixup to work even when offline -- although using the LVFS source is recommended for most users

@jimmykarily
Contributor

I think I have a preference for my second suggestion (keys rotation) which blacklists every past image by enrolling a new key. Keys can also be appended in dbx, which makes me wonder what happens if a key is both in db and dbx. I guess dbx wins and the key is rejected (?). Let us play a bit manually in qemu before we start bricking devices :D.
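
A toy model of the behaviour guessed at above, assuming dbx is consulted first and a match there rejects the image even if db would accept it. That ordering is how the UEFI spec describes image verification as far as I can tell, but it had not been confirmed in qemu at this point in the thread. The names and the string "fingerprints" are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SecureBootDatabases:
    """Toy stand-in for the db and dbx variables; entries are plain strings."""
    db: set = field(default_factory=set)
    dbx: set = field(default_factory=set)


def image_allowed(dbs: SecureBootDatabases, signer_fingerprint: str, image_hash: str) -> bool:
    """Return True if a signed image would pass this simplified check."""
    # Assumption: anything listed in dbx is rejected first, even if db would accept it.
    if signer_fingerprint in dbs.dbx or image_hash in dbs.dbx:
        return False
    return signer_fingerprint in dbs.db or image_hash in dbs.db


if __name__ == "__main__":
    dbs = SecureBootDatabases(db={"old-db-cert", "new-db-cert"}, dbx={"old-db-cert"})
    print("old image:", image_allowed(dbs, "old-db-cert", "hash-of-v1"))
    print("new image:", image_allowed(dbs, "new-db-cert", "hash-of-v4"))
```

Under this model, rotating keys means adding the new cert to db and the old one to dbx; removing the old cert from db as well is redundant but harmless.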

@jimmykarily
Contributor

First approach to use sbctl here: Foxboron/sbctl#296

@jimmykarily
Contributor

Also suggested some preparation work here: Foxboron/sbctl#297

@jimmykarily
Contributor

jimmykarily commented May 15, 2024

In order to get a cert or an image blacklisted in dbx, in thousands of machines with no physical access, we'll need to run commands in user mode (e.g. using the upgrade controller). With some experimentation in qemu I verified that it's possible to enroll things in dbx as long as the binary used is signed with a valid (enrolled) key.

NOTE: In qemu one has to use a 4m version of the secure boot firmware, otherwise it's not possible to enroll things in dbx. These are the commands that worked for me (with a 4m firmware):

[kairos@fedora ~]$ efi-readvar  | grep dbx
Variable dbx has no entries
[kairos@fedora ~]$ sudo efi-updatevar -f dbx.auth dbx
[kairos@fedora ~]$ efi-readvar  | grep dbx
Variable dbx, length 12872
dbx: List 0, type SHA256
dbx: List 1, type SHA256

the dbx.auth file was an esl file ( dbx.esl in the example below, exported from bios) which was signed with a command:

sign-efi-sig-list -c keys/KEK.pem -k keys/KEK.key dbx ~/dbx.esl dbx.auth

We now need to figure out the commands to use in order to create a valid dbx.auth file for enrollment.
Then we need to test the whole flow out:

  • Boot old image
  • Enroll new key (KEK? db?)
  • Put the current key (or image digest) in the dbx (after this, reboot to the same image won't be possible)
  • upgrade to a new image signed with the new key
  • Reboot to the new image

If this works, it could be a revocation solution for KEK and db keys. If the PK key is compromised, I'm not sure the same process would work. I think it's possible to replace the PK key (some references here) but this would probably render the KEK and db keys invalid (unless they are replaced too).

Anyway, one thing at a time: let's see if this flow works for at least db keys.

This is my TODO list:

Step 1:

- [✓] Build a fedora uki image
- [✓] Enroll the keys in qemu
- [~] From within the OS, try to add the current key in dbx
- [ ] Reboot and check that the system is not bootable anymore


Step 2:

- Repeat the above to add the current key in dbx
- Generate another set of keys
- Enroll the new key from within the OS (before rebooting)
- Upgrade to another image signed with the new key (before rebooting)
- reboot and see if the new image boots

Step 3:

Try the above step but using the upgrade controller to do that in one step.

@jimmykarily
Contributor

Another thing to notice, in qemu, after enrolling once, it's not possible to enroll again (not even with -a):

[root@fedora kairos]# efi-readvar  | grep dbx
Variable dbx has no entries
[root@fedora kairos]# ls
dbx.auth
[root@fedora kairos]# efi-updatevar -f dbx.auth dbx
[root@fedora kairos]# efi-readvar  | grep dbx
Variable dbx, length 12872
dbx: List 0, type SHA256
dbx: List 1, type SHA256
[root@fedora kairos]# efi-updatevar -f dbx.auth dbx
Failed to update dbx: Operation not permitted
[root@fedora kairos]# efi-updatevar -a -f dbx.auth dbx
Failed to update dbx: Operation not permitted
[root@fedora kairos]# chattr -i /sys/firmware/efi/efivars/{PK,KEK,db}*
[root@fedora kairos]# efi-updatevar -a -f dbx.auth dbx
Failed to update dbx: Invalid argument

I'm not sure why. I can redo it if I reset the OVMF_VARS file:

sudo cp /usr/share/edk2/x64/OVMF_VARS.4m.fd /var/lib/libvirt/qemu/nvram/fedora_VARS.fd

@jimmykarily
Contributor

jimmykarily commented May 16, 2024

In qemu I get strange results. I tried the following sequence of commands with all three PK, KEK and db in qemu with different results:

$ export UUID=`uuidgen`
$ cert-to-efi-sig-list -g "Kairos-$UUID" keys/PK.pem PK-dbx.esl
$ sign-efi-sig-list -c keys/PK.pem -k keys/PK.key dbx PK-dbx.esl PK-dbx.auth
$ scp PK-dbx.auth kairos@192.168.122.56: # copy to the VM

# Inside the VM now:
[kairos@fedora ~]$ sudo efi-updatevar -f PK-dbx.auth dbx
[kairos@fedora ~]$ efi-readvar | grep dbx
Variable dbx, length 819
dbx: List 0, type X509

In the case of PK the enrollment succeeds and even go-uefi lists the cert from dbx (using a modified sbctl)

In the case of KEK, it enrolls but go-uefi panics when trying to list dbx entries
In the case of db, it doesn't even enroll, efi-updatevar gives an Invalid argument error

Even when it seems to successfully enroll the certificate in dbx (e.g. in the PK case), when I reboot the VM it still boots the livecd with secureboot enabled 🤷 ? Isn't the livecd efi signed with the same key?

@jimmykarily
Contributor

I even added the certificate's signature hash to ensure the one in dbx is the PK one:

[root@fedora kairos]# ./sbctl list-enrolled-keys
PK:
  kairos - 927dc1119f35d2ea18e276b2d7d46492b42d00c2a4af3b75fbda2c797d1cb575
KEK:
  kairos - 3e8ed3a5bcd2464c7025eb5ef61945d189df1c30d58c7552710265e20340331b
  Microsoft Corporation Third Party Marketplace Root - dd15c4fa0e858caa2ca4ee702489771701b1c7342a93bd3d92078b6b404661e4
  ASUS KEK Certificate - 880c8b3dae562f6acaaa61a18d52515b310290e88b40f273b3bed8f7e196cbfe
  Microsoft Corporation Third Party Marketplace Root - dd15c4fa0e858caa2ca4ee702489771701b1c7342a93bd3d92078b6b404661e4
DB:
  kairos - 9348b62f81e22c21ad6e1016b6020d04a02d56012d9ae2f89e69c73fdb762b6f
  Microsoft Corporation Third Party Marketplace Root - 7334c9a3c20f73863264af84c8d3b9aa9bc859fd61005fb0d368d80a85f18601
  Microsoft Root Certificate Authority 2010 - 859e73cde47d25e30ec2ecfd6126e2ba279a364b52f9ae3998ee363a97a3a87b
  ASUS UEFI SW Key Certificate - 62b939869ed57f9d7fb519066b408b203bf21bab47db961a1a8cbcf131f4347b
  Microsoft Root Certificate Authority 2010 - 859e73cde47d25e30ec2ecfd6126e2ba279a364b52f9ae3998ee363a97a3a87b
  Microsoft Corporation Third Party Marketplace Root - 7334c9a3c20f73863264af84c8d3b9aa9bc859fd61005fb0d368d80a85f18601
  Microsoft Root Certificate Authority 2010 - f386076ce922977f1252bd9dae9c4508a7e83dfcbc8b9f166df6d1adbe06c1db
dbx:
  kairos - 927dc1119f35d2ea18e276b2d7d46492b42d00c2a4af3b75fbda2c797d1cb575

Maybe dbx is poorly implemented in the qemu firmware (if at all)? I don't dare try such things on my Asus :D

@jimmykarily
Contributor

jimmykarily commented May 16, 2024

> In the case of db, it doesn't even enroll, efi-updatevar gives an Invalid argument error

I correct myself. It works fine for db (the problem was that in qemu I cannot run the command a second time) and actually that's the only one that prevents the image from booting again. It seems that the "chain of trust" is somehow broken.

I think the way we sign the PK, KEK and db files is wrong. Each one should be signing the other. After all, in the end, we only sign the efi file with the db one.

I tried to fix the signing here: https://github.com/kairos-io/enki/compare/fix-signing?expand=1 but it doesn't seem to make a difference.

This is what I see in qemu when the db cert is put in dbx:

[screenshot of the qemu boot error; image not preserved]

(not a very clear message but I assume the verification didn't pass)

@jimmykarily
Contributor

jimmykarily commented May 16, 2024

The question is: why have the UEFI firmwares so far allowed us to enroll these badly signed keys? I think the answer is one of:

  • In setup mode, you are allowed to enroll whatever you want
  • Some firmwares don't validate this (bad implementations)
  • The order of enrollment matters. If db is enrolled when no KEK exists, it's always allowed. Same thing for KEK when no PK exists.

I think the last one is true. It's supported by the fact that systemd-boot auto enrolls in this order: db -> dbx -> KEK -> PK.

and also by the comment in this link: We set the Secure Boot variable – Db, KEK and PK in reverse order so we don't have to sign the UEFI bin files.

If this explains why our badly signed certs were accepted for enrollment, it doesn't explain why my attempted fix to sign them properly doesn't fix the "chain of trust" (see previous comment).

@jimmykarily
Contributor

jimmykarily commented May 16, 2024

I tried on my Asus PN64 just to check whether this only happens on qemu. I was able to enroll the badly signed certificates (not signing each other) in the order PK -> KEK -> db, and then I managed to boot the livecd just fine.
So the check took place neither at manual enrollment time nor at boot time.
To test it even further, I rebooted to the UEFI firmware and deleted just the KEK database. The livecd still boots and reports:

[kairos@fedora ~]$ sbctl list-enrolled-keys
PK:
  kairos-PK
KEK:
DB:
  kairos-db
  Microsoft Corporation Third Party Marketplace Root
  Microsoft Root Certificate Authority 2010
  ASUS UEFI SW Key Certificate
  Microsoft Root Certificate Authority 2010
  Microsoft Corporation Third Party Marketplace Root
  Microsoft Root Certificate Authority 2010
dbx:

Should this happen? This "chain of trust" seems to be a very "loose" chain.

(I made a change in enki genkey to give unique CN fields to each cert by appending the "type")

@jimmykarily
Contributor

From the UEFI spec:

## page 1422:

While no Platform Key is enrolled, the SetupMode variable shall be equal to 1. While SetupMode == 1, the platform
firmware shall not require authentication in order to modify the Platform Key, Key Enrollment Key, OsRecoveryOrder,
OsRecovery####, and image security databases.
After the Platform Key is enrolled, the SetupMode variable shall be equal to 0. While SetupMode == 0, the platform
firmware shall require authentication in order to modify the Platform Key, Key Enrollment Key, OsRecoveryOrder,
OsRecovery####, and image security databases.

## page 222:

If the variable SetupMode==1, and the variable is a secure boot policy variable, then the firmware implementation
shall consider the checks in the following steps 4 and 5 to have passed, and proceed with updating the
variable value as outlined below.
...
5. If the variable is the global PK variable or the global KEK variable, verify that the signature has been made with
the current Platform Key.
• If the variable is the “db”, “dbt”, “dbr”, or “dbx” variable mentioned in step 3, verify that the signer’s
certificate chains to a certificate in the Key Exchange Key database (or that the signature was made with
the current Platform Key).
...
The driver shall update the value of the variable only if all of these checks pass. If any of the checks fails, firmware
must return EFI_SECURITY_VIOLATION

The way I understand this is that even in setup mode, if you try to enroll something in db (or dbx) that is not signed by a cert in the KEK or the PK itself, the firmware should refuse to add it. This is not what my ASUS PN64 does 🤷 .

@jimmykarily
Contributor

and then on page 1424:

The authenticated UEFI variable that stores the key exchange keys (KEKs) can always be read but only be written if:
• The platform is in user mode and the provided variable data is signed with the current PKpriv; or if
• The platform is in setup mode (in this case the variable can be written without a signature validation, but the
SetVariable() call needs to be formatted in accordance with the procedure for authenticated variables in Using
the EFI_VARIABLE_AUTHENTICATION_3 descriptor)

:D
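
Read literally, the quoted passages boil down to a small decision function. The sketch below is a simplified model, not firmware behaviour: it treats "chains to a certificate in KEK" as a plain name match, ignores dbt/dbr and the timestamp checks, and all names are illustrative.

```python
SECURE_BOOT_VARS = {"PK", "KEK", "db", "dbx"}


def write_allowed(variable, setup_mode, signer, pk, kek):
    """Simplified reading of the quoted authorization rules.

    variable:   "PK", "KEK", "db" or "dbx"
    setup_mode: True while no Platform Key is enrolled
    signer:     name of the key that signed the update, or None if unsigned
    pk:         name of the enrolled Platform Key, or None
    kek:        set of enrolled Key Exchange Key names
    """
    if variable not in SECURE_BOOT_VARS:
        raise ValueError(f"unknown secure boot variable: {variable}")
    if setup_mode:
        # Quoted text: in setup mode the signature checks count as passed.
        return True
    if variable in {"PK", "KEK"}:
        # User mode: PK and KEK updates must be signed with the current PK.
        return signer is not None and signer == pk
    # User mode, db / dbx: the signer must be a KEK entry or the current PK.
    return signer is not None and (signer in kek or signer == pk)


if __name__ == "__main__":
    pk, kek = None, set()
    # Enrolling db, then KEK, then PK while no PK exists: unsigned writes pass.
    for var in ("db", "KEK", "PK"):
        print(var, write_allowed(var, setup_mode=pk is None, signer=None, pk=pk, kek=kek))
    pk, kek = "kairos-PK", {"kairos-KEK"}
    # Once the PK is in place, the platform is in user mode.
    print("dbx unsigned:", write_allowed("dbx", False, None, pk, kek))
    print("dbx signed by KEK:", write_allowed("dbx", False, "kairos-KEK", pk, kek))
```

This matches the db -> dbx -> KEK -> PK order used by systemd-boot: the PK goes in last so every earlier write still happens in setup mode.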

@jimmykarily
Contributor

jimmykarily commented May 17, 2024

I did the following:

  • Created a set of keys with enki genkey kairos
  • I built a uki iso with these keys:
docker run --rm -v $PWD/unpacked:/unpacked -v $PWD/build:/result -v /home/dimitris/workspace/kairos/enki/keys/:/keys quay.io/kairos/osbuilder-tools:v0.201.0  build-uki dir:/unpacked --output-dir /result --keys /keys --output-type container --boot-branding "KairosUKI"
  • I installed Kairos using this ISO (entered setup mode to allow the keys to auto enroll)
  • I created a new keys directory (backed-up the old one). I copied all but the db files.
  • Within the new directory, I created a new db signature (arch docs):
$ uuidgen --random > GUID.txt
$ openssl req -newkey rsa:4096 -nodes -keyout db.key -new -x509 -sha256 -days 3650 -subj "/CN=NewKairosDB/" -out db.crt
$ openssl x509 -outform DER -in db.crt -out db.cer
$ cert-to-efi-sig-list -g "$(< GUID.txt)" db.crt db.esl
$ sign-efi-sig-list -g "$(< GUID.txt)" -k KEK.key -c KEK.pem db db.esl db.auth
  • I created the additional key formats needed by build-uki command (command from my bash history, not sure which ones I actually needed, build-uki will complain if one is missing):
$ openssl x509 -outform der -in db.pem -out db.der
$ openssl x509 -outform der -in db.crt -out db.der
$ openssl x509 -inform DER -outform PEM -in db.der -out db.pem
$ cp blacklisted-keys/tpm2-pcr-private.pem keys/
  • I created a new upgrade image signed by the new db:
docker run --rm -v $PWD/unpacked:/unpacked -v $PWD/build:/result -v /home/dimitris/workspace/kairos/enki/keys/:/keys quay.io/kairos/osbuilder-tools:v0.201.0  build-uki dir:/unpacked --output-dir /result --keys /keys --output-type container --boot-branding "KairosNewUKI"
  • I enrolled the new db (scp the file in the VM):
$ sudo chattr -i /sys/firmware/efi/efivars/{PK,KEK,db}*
$ efi-updatevar -f db.auth db
  • created a dbx auth file to blacklist the old db (signed by the same old KEK):
$ export UUID=`uuidgen`
$ cert-to-efi-sig-list -g "Kairos-$UUID" keys/db.pem db-dbx.esl
$ sign-efi-sig-list -c keys/KEK.pem -k keys/KEK.key dbx db-dbx.esl db-dbx.auth
  • enrolled the old key in dbx (scp the file in the VM):
$ efi-updatevar -f db-dbx.auth dbx
  • performed an upgrade:
kairos-agent upgrade --source oci:myimage
  • rebooted

The result is that the fallback (passive) and recovery images are not bootable anymore, proving that the old db key is no longer accepted (it is both no longer enrolled in db and blacklisted in dbx; we shouldn't need to do both).

The active image is bootable but generates lots of errors and login is not possible. Maybe what @mudler suggested in a call is true. Maybe the decryption of the encrypted partitions didn't happen for some reason. Need to investigate more.

@jimmykarily
Contributor

jimmykarily commented May 20, 2024

I hardcoded my ssh key in the image so that I can ssh after upgrade to collect logs.

full journalctl logs: journal.txt
immucore logs: immucore.txt

@jimmykarily
Contributor

jimmykarily commented May 20, 2024

[root@fedora ~]# systemd-cryptenroll --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device=auto /dev/vda2
🔐 Please enter current passphrase for disk /dev/vda2: (no echo) 

tpm can't decrypt anymore?

@jimmykarily
Contributor

jimmykarily commented May 20, 2024

tpm2_pcrread after first installation:

[root@fedora ~]# tpm2_pcrread 
  sha1:
  sha256:
    0 : 0x90984038936ED34A5D05EEB36A6A72E3E16A46C9085784160EC4FCEC7961518B
    1 : 0x72565823964D0360FEF4490FF94CF532DFBD89303656434020B382247684190B
    2 : 0x0C086A8BA21BF3CBEEE845770A92C2131A00E79A981B035CB65407E97F5A870C
    3 : 0x3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969
    4 : 0x86175562067E6CE608091D444046D6F0F2B049D018DB754D57846E85A31B3847
    5 : 0x42A7BF1BF8C4D25D164AB5E1804F7DD2E09A2AAAE18D62DEE88C765DE89BEB2C
    6 : 0x3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969
    7 : 0x184748238B13A14A6FCD5F802E3FF70781D91703B1FBC483C21D1C7CCC3BDE91
    8 : 0x0000000000000000000000000000000000000000000000000000000000000000
    9 : 0x12FCCDC20A7E5F225FD81D50D7AC40A48F3940805904766BF0CE1C0A1EB957DB
    10: 0x7BC4D09AC3F6CED6AECED6944A4493626ED569C91C105450064A366A761B2432
    11: 0xB31A6145CD016EFFDA1049ABD782C66AC4DF16946475EA0B9D49ED784330601D
    12: 0x0000000000000000000000000000000000000000000000000000000000000000
    13: 0x0000000000000000000000000000000000000000000000000000000000000000
    14: 0x0000000000000000000000000000000000000000000000000000000000000000
    15: 0xA4C81ADFCD35CBD55BCA843EB711EEA98AF30C733E3BBDC87FB7EF4A4C53B61B
    16: 0x0000000000000000000000000000000000000000000000000000000000000000
    17: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    18: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    19: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    20: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    21: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    22: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    23: 0x0000000000000000000000000000000000000000000000000000000000000000
  sha384:
  sha512:

same command after upgrading to the image signed by a different key:

[root@fedora ~]# tpm2_pcrread
  sha1:
  sha256:
    0 : 0x90984038936ED34A5D05EEB36A6A72E3E16A46C9085784160EC4FCEC7961518B
    1 : 0x72565823964D0360FEF4490FF94CF532DFBD89303656434020B382247684190B
    2 : 0x0C086A8BA21BF3CBEEE845770A92C2131A00E79A981B035CB65407E97F5A870C
    3 : 0x3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969
    4 : 0xE42043142BD876F72E2EE3D2ECF2E49D59E6DB57E6D9F70296073857B564ED04
    5 : 0x42A7BF1BF8C4D25D164AB5E1804F7DD2E09A2AAAE18D62DEE88C765DE89BEB2C
    6 : 0x3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969
    7 : 0x18917EE2A8CD1C729A9C852C63D5208EA7E0A69095168C91059687AE4C51E8D0
    8 : 0x0000000000000000000000000000000000000000000000000000000000000000
    9 : 0xECE6573D56F53472A366FE8B8D9122389DA1DEF8007D5D056E4C10AC8FD9AA1C
    10: 0x09B59702369F83C3476D1231CB9B0CBDD1C237AC5EE4BC80B535BC9B681A38BC
    11: 0xAF084D0FD893E9D86998A09B123A5C81D871B89C1B3E3FEF01FB4567443D45F4
    12: 0x0000000000000000000000000000000000000000000000000000000000000000
    13: 0x0000000000000000000000000000000000000000000000000000000000000000
    14: 0x0000000000000000000000000000000000000000000000000000000000000000
    15: 0xB8017089D4AEB6CF17314C9423A52A0844059A99DAE1AFAE423451688636C82D
    16: 0x0000000000000000000000000000000000000000000000000000000000000000
    17: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    18: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    19: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    20: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    21: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    22: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    23: 0x0000000000000000000000000000000000000000000000000000000000000000
  sha384:
  sha512:

enrolled keys (after upgrade):

[root@fedora ~]# sbctl list-enrolled-keys
PK:
  kairos-PK - 0d6527b0a83fea5e3c1a36d8a5b0ec1925fa79ea4f30e1f0b08c7bd6176f13c1
KEK:
  kairos-KEK - 0ccb82a0fb28cb0592f603d625e2e1b7a8b7c5f0075604b3f8804b9ba1e9dc04
  Microsoft Corporation Third Party Marketplace Root - dd15c4fa0e858caa2ca4ee702489771701b1c7342a93bd3d92078b6b404661e4
  ASUS KEK Certificate - 880c8b3dae562f6acaaa61a18d52515b310290e88b40f273b3bed8f7e196cbfe
  Microsoft Corporation Third Party Marketplace Root - dd15c4fa0e858caa2ca4ee702489771701b1c7342a93bd3d92078b6b404661e4
DB:
  New Kairos db key - ee9dce4cc8f780d5a9dc18217a30254b3b78a0814e5e0772c375445077f1c2d1
dbx:
  kairos-db - 630f66680e07858bf76d5c3d4ac820a24a958bd46263a55b7d227491db696718

(TODO: update the comment with the values after upgrade)
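
Comparing the two dumps by eye is error-prone, so here is a small sketch that parses tpm2_pcrread output and lists the PCRs whose values changed. The sample text is trimmed to three registers copied from the dumps above; the helper names are made up for illustration. Reading the full dumps by hand, the registers that differ appear to be 4, 7, 9, 10, 11 and 15.

```python
import re

# Matches lines such as "    7 : 0x1847..." or "    10: 0x7BC4...".
PCR_LINE = re.compile(r"^\s*(\d+)\s*:\s*0x([0-9A-Fa-f]+)\s*$")


def parse_sha256_bank(text):
    """Extract {pcr_index: hex_digest} from the sha256 section of tpm2_pcrread output."""
    values, in_sha256 = {}, False
    for line in text.splitlines():
        match = PCR_LINE.match(line)
        stripped = line.strip()
        if match is None and stripped.endswith(":"):
            # Bank headers look like "sha256:"; only that bank is collected.
            in_sha256 = stripped == "sha256:"
            continue
        if in_sha256 and match:
            values[int(match.group(1))] = match.group(2).upper()
    return values


def changed_pcrs(before, after):
    """Sorted PCR indexes present in both banks whose values differ."""
    return sorted(i for i in before.keys() & after.keys() if before[i] != after[i])


BEFORE = """\
  sha1:
  sha256:
    0 : 0x90984038936ED34A5D05EEB36A6A72E3E16A46C9085784160EC4FCEC7961518B
    4 : 0x86175562067E6CE608091D444046D6F0F2B049D018DB754D57846E85A31B3847
    7 : 0x184748238B13A14A6FCD5F802E3FF70781D91703B1FBC483C21D1C7CCC3BDE91
  sha384:
"""

AFTER = """\
  sha1:
  sha256:
    0 : 0x90984038936ED34A5D05EEB36A6A72E3E16A46C9085784160EC4FCEC7961518B
    4 : 0xE42043142BD876F72E2EE3D2ECF2E49D59E6DB57E6D9F70296073857B564ED04
    7 : 0x18917EE2A8CD1C729A9C852C63D5208EA7E0A69095168C91059687AE4C51E8D0
  sha384:
"""

print(changed_pcrs(parse_sha256_bank(BEFORE), parse_sha256_bank(AFTER)))
```

PCR 7 changing is the interesting part here, since it tracks the Secure Boot state and certificates rather than the image itself.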

@Itxaka
Member

Itxaka commented May 21, 2024

Tested without an upgrade, just on an installed system, and I seem to hit the same thing.

  • installed system with keys ITXAKA
  • everything ok, system boots, unlocks partitions
  • Add a new KEK key (kairos2)
  • cannot mount the encrypted partitions any longer
  • but system boots, so secureboot is ok and the keys are in there correctly
root@localhost:/tmp/sbctl# sbverify --list /efi/EFI/kairos/active.efi 
signature 1
image signature issuers:
 - /CN=ITXAKA
image signature certificates:
 - subject: /CN=ITXAKA
   issuer:  /CN=ITXAKA
root@localhost:/tmp/sbctl# ./sbctl list-enrolled-keys
DB:
  ITXAKA
  Microsoft Corporation Third Party Marketplace Root
  Microsoft Root Certificate Authority 2010
PK:
  ITXAKA
KEK:
  ITXAKA
  Microsoft Corporation Third Party Marketplace Root
  Kairos2

If I remove the Kairos2 key then it works again.
Same thing if I extend the DB.

There seems to be a connection between the Secureboot certs and the measurements somehow that we are not seeing.

EDIT: This is on Ubuntu 24.04

@Itxaka
Member

Itxaka commented May 21, 2024

Opened a ticket upstream on systemd to see if they can clarify systemd/systemd#32946

@Itxaka
Member

Itxaka commented May 22, 2024

Updating here in case the other ticket goes nowhere.

There are 2 ways of binding to a PCR when enrolling a LUKS partition/disk:

  • tpm-pcrs -> binds to a SINGLE set of measurements, useful for things like firmware code, machine-id and things that should never change
  • public-key-pcrs -> binds to a POLICY that covers measurements signed under that policy, useful for things that can change like the system UKI (you want to upgrade, don't ya?)

The docs are kind of confusing here, as the two seem to be mutually exclusive, but they are not. When we bind to public-key-pcrs 11, cryptenroll silently also binds to PCR7 as a single measurement (Secure Boot state and certs).
That means it binds to static PCR7 + policy PCR11. This works great until we don't want it, because we want to extend the certs to be able to blacklist stuff or enroll new ones.
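
Why a static PCR7 binding breaks as soon as KEK, db or dbx change can be seen from how a PCR is extended: each new value is the hash of the previous value concatenated with the event digest, so any different event anywhere in the sequence gives a different final value. A minimal sketch for the SHA-256 bank follows; the event blobs are placeholders, since real PCR 7 events are firmware-defined structures.

```python
import hashlib


def pcr_extend(current: bytes, event_digest: bytes) -> bytes:
    """TPM 2.0 extend for a SHA-256 bank: new = SHA256(current || event_digest)."""
    if len(current) != 32 or len(event_digest) != 32:
        raise ValueError("SHA-256 bank values and digests are 32 bytes")
    return hashlib.sha256(current + event_digest).digest()


def replay(events):
    """Fold a list of measured blobs into a PCR value starting from all zeroes."""
    value = bytes(32)
    for blob in events:
        value = pcr_extend(value, hashlib.sha256(blob).digest())
    return value


# Placeholder blobs standing in for the Secure Boot variable measurements.
original = replay([b"SecureBoot=1", b"PK: ITXAKA", b"KEK: ITXAKA", b"db: ITXAKA"])
rotated = replay([b"SecureBoot=1", b"PK: ITXAKA", b"KEK: ITXAKA + Kairos2", b"db: ITXAKA"])

print("PCR value unchanged after adding a KEK:", original == rotated)
```

A LUKS slot sealed against the first value cannot be unsealed once the second one is in the register, which is consistent with the "add Kairos2, lose the partitions" observation earlier in the thread.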

The idea would be to set tpm-pcrs to an empty value, so we only bind to the policy measurements. BUT there is a bug in cryptenroll: if you set tpm-pcrs to empty, it still tries to check the bank to get the hash and fails, because it does not take into consideration that you might not want to bind to any tpm-pcrs. This is the issue linked in the post above.

So it needs to be fixed upstream so we can skip binding to PCR7 automatically.

There is a workaround for this: skip querying the TPM directly and use the TPM's public SRK key to calculate the values. From systemd 255 onwards, the TPM key is automatically extracted on boot and can be used to calculate the values to lock the LUKS device without ever going to the TPM directly, by using --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public

This is now available in kcrypt v0.11.0 and works perfectly, but it makes the minimum systemd version 255 (Ubuntu 24.04 and Fedora 40).
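
For illustration, a sketch of what a policy-only enrollment command could look like when assembled from the options mentioned in this thread. It only builds and prints the argument list and does not run anything. The exact flag combination kcrypt uses is not shown here, so treat the combination as an assumption; the device and key paths are the ones that appear in earlier comments.

```python
import shlex

# Path exported by systemd >= 255 at boot, as mentioned above.
SRK_PUBLIC_KEY = "/run/systemd/tpm2-srk-public-key.tpm2b_public"


def cryptenroll_policy_only(device, pcr_public_key, policy_pcrs=(11,)):
    """Build (not run) a systemd-cryptenroll command bound only to a signed PCR policy."""
    pcr_list = "+".join(str(p) for p in policy_pcrs)
    return [
        "systemd-cryptenroll",
        f"--tpm2-device-key={SRK_PUBLIC_KEY}",    # enroll via the SRK public key
        "--tpm2-pcrs=",                           # empty: no literal PCR binding (no PCR 7)
        f"--tpm2-public-key={pcr_public_key}",    # key that signs the PCR policy
        f"--tpm2-public-key-pcrs={pcr_list}",     # PCRs covered by the signed policy
        device,
    ]


if __name__ == "__main__":
    argv = cryptenroll_policy_only("/dev/vda2", "/run/systemd/tpm2-pcr-public-key.pem")
    print(shlex.join(argv))
```

Keeping the command as a list makes it easy to review or log before handing it to subprocess.run on a test VM.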

What does this mean?

  • On 3.0.x we automatically bind to PCR7 (fixed) + PCR11 (policy), so we won't support upgrading or rotating the certs.
  • On 3.1.x we would be able to bind to only PCR11 (policy), so we support rotating and blacklisting.
  • We need an upgrade path from 3.0.x to 3.1.x in which we enroll the keys again without binding to PCR7, as long as we are on systemd 255.
  • We cannot do it directly during the upgrade, as we boot with systemd <255, so tpm2-srk-public-key.tpm2b_public is not there.
  • Currently it's not supported to update the LUKS slot without a password or another entry; you cannot use the TPM2 measurements to update the unlock values (blocker?), so I don't see how we would update the signature in the LUKS header to not bind to PCR7 after locking it. https://www.freedesktop.org/software/systemd/man/latest/systemd-cryptenroll.html#Limitations

@Itxaka
Member

Itxaka commented May 22, 2024

Talos has some utils to add and manage LUKS keys; maybe it's possible to unlock and add a new key via that?

@Itxaka
Member

Itxaka commented May 22, 2024

CRYPTSETUP CAN ADD NEW KEYS!!!

cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2

That seems to use the TPM2 to update the keys, without asking for a password or anything!! So we could probably leverage that to sync a new TPM key if needed. Even if it's a manual action, we could do the following in the upgrade:

  • upgrade uki file
  • extract measurements from the newer uki file (if they were signed with a different key for example)
  • cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 and set a manual random password
  • use systemd-cryptenroll to enroll the new tpm values, wiping out the plaintext password at the same time like we do on kcrypt

And that may even work. Now, if we were able to use the TPM token to update the same TPM token, it would be even better.

@jimmykarily jimmykarily removed their assignment May 23, 2024