[v5.1.4][S3][Flash Encryption] Toggling VCC / GPIO-0 during boot BRICKS THE DEVICE! (invalid header: 0xce5209fc) (IDFGH-12807) #13783
Comments
This is a super serious issue! I can't ship a product that bricks itself! |
Note: I can still recover the device by using GPIO 0 to put it into DOWNLOAD mode and reflashing the firmware.
Edit: never mind, I spoke too soon. I can reflash it using DOWNLOAD mode, but the device fails to boot with the same INVALID HEADER message. This device was previously working fine, until I unplugged and replugged it!
Flashing using USB Serial / JTAG:
When flashing using UART, it is mostly the same, but the console just logs 'xxxxxxxxx' and nothing else.
Flashing using UART:
|
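For context on the INVALID HEADER message above: a valid ESP image (bootloader or app) starts with the magic byte 0xE9, so one quick host-side check is to look at the first byte of whatever is being read. The sketch below is only an illustration. The helper names are hypothetical, and it assumes the word in the log is the first four flash bytes read as a little-endian 32-bit value.

```python
ESP_IMAGE_HEADER_MAGIC = 0xE9  # first byte of a valid ESP image header


def starts_with_image_magic(data: bytes) -> bool:
    """Return True if the buffer begins with the ESP image magic byte."""
    return len(data) > 0 and data[0] == ESP_IMAGE_HEADER_MAGIC


def logged_word_to_bytes(word: int) -> bytes:
    """Convert the 32-bit word from the boot log back into flash byte order.

    Assumption: the log prints the first four flash bytes as a
    little-endian 32-bit value.
    """
    return word.to_bytes(4, "little")


# The word reported in this issue's title.
header = logged_word_to_bytes(0xCE5209FC)
print(header.hex(), starts_with_image_magic(header))
```

Under that assumption the first byte is not 0xE9, which fits a reader seeing data it cannot interpret as a plaintext image (for example, ciphertext read without decryption). It does not by itself say why.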
@igrr , appreciate your thoughts here. |
I want to be clear: I use host-generated flash encryption keys, so I can normally re-flash the encrypted device, and I thought I could recover it. But when the device gets into this state, I cannot re-flash it to recover. It's as if the flash encryption keys on the device got corrupted. |
So I've been trying to reproduce this for the past 45 minutes. Original repro scenario (~10 attempts):
did not repro (~150 attempts)
did not repro (~150 attempts)
This leaves a couple suspects:
USB-C to USB-B cable: My cable appears to be poorly made, so as you plug it in, VCC gets turned on and off a few times. Jiggling the cable causes the device to reset.
3D-printed enclosure: This may actually be related to the problem.
GPIO0: My current theory is that some combination of toggling both VCC and GPIO 0 is causing the problem.
Next steps: I will try the original repro scenario, but with a better enclosure that never holds down GPIO 0. |
Wow, no way!!!! I wanted to check if flash encryption had somehow been disabled, so I flashed a plaintext binary, AND IT BOOTED!! I know this sounds incredible, but I am 100% sure flash encryption was enabled on this device. And just unplugging and replugging my device / toggling GPIO 0 caused flash encryption to become disabled. Crazy. This is why my device won't boot anymore & logs
How am I sure flash encryption was enabled?
So, somehow, flash encryption is no longer active. How could re-plugging the device cause that to happen? This is the result of
My device has a command to print the fuses. Here are all the fuses currently set. Notably, SPI_BOOT_CRYPT_CNT is not set anymore and reports 0x0 as its value. Fuses:
|
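As background for the SPI_BOOT_CRYPT_CNT reading above: on the ESP32-S3 this is a 3-bit eFuse field, and flash encryption is treated as enabled when an odd number of its bits are set. A minimal sketch of that parity rule follows. The function name is hypothetical and the value is assumed to come from a fuse dump such as the one this device prints.

```python
def flash_encryption_enabled(spi_boot_crypt_cnt: int) -> bool:
    """Return True when the 3-bit SPI_BOOT_CRYPT_CNT has an odd number of bits set."""
    if not 0 <= spi_boot_crypt_cnt <= 0b111:
        raise ValueError("SPI_BOOT_CRYPT_CNT is a 3-bit field")
    return bin(spi_boot_crypt_cnt).count("1") % 2 == 1


# Walk through a few representative field values.
for value in (0b000, 0b001, 0b011, 0b111):
    print(f"{value:#05b} -> {flash_encryption_enabled(value)}")
```

By this rule a reading of 0x0 means encryption is off. eFuse bits are one-time programmable, so a field that reads back as 0x0 after having been burned is not a transition normal operation should produce, which is why the reading is worth double-checking.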
On a different, properly functioning device I can see
All these fuses are burned at the same time in my production script
So really, on the affected device, all evidence points to |
Also note, my device does not use |
Few questions:
|
I was waiting for your recommendation before I tried it. I have not yet reproduced it, but I have not tried the exact repro scenario yet. I only have a few devices and was not sure if I would permanently break them. I will try the exact repro scenario again soon, now that I think I can reverse the damage. Do you have any guesses about what is happening? |
So the sequence goes something like this:
Is this the correct understanding? CC @KonstantinKondrashov
No. The issue looks entirely different and not something that we heard/observed previously. (So far) Most of the failures in the security features workflow can be attributed to errors in the efuse configuration:
|
yes exactly. Power cycle + gpio 0 toggling. I will try to repro it again. I have done enough testing that I think I can rule out power cycles alone. I think GPIO 0 is related somehow. |
I have results.
After step 5, I checked
These are the logs for step 5
|
Did more stuff. Step 6: I flashed encrypted firmware. So in the end, I was able to recover the device into encrypted mode. However, this is of course unacceptable. A consumer of my product will not be able to recover. |
This suggests that the device has the following 3 key purposes:
In this case the key used will still be 512-bit and derived from key purposes 2/3. Please refer to section 23.4.2 in the S3 TRM for more details. Q: Did you write this new key purpose block (4) intentionally? Which key are you using to encrypt the artifacts here?
Are these errors common for your hardware design? Or they are occurring only on this specific device?
Any specific purpose for this partition? This is used for the virtual efuse scenario but not something that is recommended outside of IDF test environment. |
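The key-purpose reasoning earlier in this comment can be sketched roughly as follows. This only encodes the statement made here (when purposes 2 and 3 are both present, the 512-bit key derived from them is used even if purpose 4 is also burned). The thread gives only the numbers, so the purpose names, taken from ESP-IDF as far as I know, and the function name are assumptions.

```python
from enum import IntEnum
from typing import Iterable, Optional


class KeyPurpose(IntEnum):
    # Assumed ESP32-S3 key purpose names for the numbers quoted in this thread.
    XTS_AES_256_KEY_1 = 2
    XTS_AES_256_KEY_2 = 3
    XTS_AES_128_KEY = 4


def selected_xts_mode(purposes: Iterable[int]) -> Optional[str]:
    """Pick the flash-encryption mode implied by the burned key purposes.

    Mirrors the comment above: purposes 2 and 3 together take priority
    over purpose 4.
    """
    present = set(purposes)
    if {KeyPurpose.XTS_AES_256_KEY_1, KeyPurpose.XTS_AES_256_KEY_2} <= present:
        return "XTS_AES_256"  # 512-bit key split across two key blocks
    if KeyPurpose.XTS_AES_128_KEY in present:
        return "XTS_AES_128"
    return None  # no flash-encryption key purpose burned


print(selected_xts_mode([2, 3, 4]))
```

If that priority holds, an unexpected extra block with purpose 4 would not change which key encrypts the flash, so the host-generated key would still be the one that matters.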
No. I have no idea why it is being written.
I've never seen them before, AFAIK.
I used it during development, and never got rid of it. I think it is okay to keep? sdkconfig has virtual fuses disabled. |
I have a video showing how to reproduce strange, inconsistent behavior with GPIO 0. I don't think the exact GPIO 0 sequence matters much; in the video I think I did it in different ways. I do wonder if I'm hitting some type of power glitch. My board does have a good amount of capacitance (schematic above).
I'm not too concerned about the "RED LED" issue in the video below. The only issue that really matters is the corrupt firmware issue I originally reported. I think they might be related, though. I'm not sure whether this "RED LED" situation is what caused the original flash corruption issue; it's just a lead worth investigating. I have not been able to reproduce the flash corruption issue again.
Ultimately, it would just be nice to understand what causes GPIO1 / GPIO8, etc., to go into this weird state.
Notes:
IMG_0337.mp4 |
Answers checklist.
IDF version.
v5.1.4
Espressif SoC revision.
ESP32-S3
Operating System used.
Windows
How did you build your project?
Command line with Make
If you are using Windows, please specify command line type.
None
Development Kit.
custom pcb
Power Supply used.
USB
What is the expected behavior?
Device can survive power loss without bricking itself
What is the actual behavior?
The bootloader becomes corrupted (invalid header: 0xce5209fc).
edit: it appears this bug caused SPI_BOOT_CRYPT_CNT to somehow get reset back to 0, causing flash encryption to become disabled. see my comment here: #13783 (comment)
Steps to reproduce.
Note: I am using flash encryption, this seems to be required to repro
Note: I have a custom bootloader that waits 1 second before booting into app (probably unrelated)
Debug Logs.
More Information.
No response