Firmware should sanitize the mailbox memory in response to double-bit (uncorrectable) ECC error #1509

Open
calebofearth opened this issue May 13, 2024 · 1 comment

Comments

@calebofearth

Background

Related to chipsalliance/caliptra-rtl#399

The fix for that logic issue was:

  • SRAM must be initialized/zeroized at time 0 by the SoC - integrator responsibility
  • HW fix for the bug is postponed as a 2.0 code improvement

However, due to the issue described in chipsalliance/caliptra-rtl#340 (which is rooted in the same HW logic as chipsalliance/caliptra-rtl#399), there remains a possibility that a double-bit ECC error in Mailbox RAM may trigger error interrupts in subsequent operations even when no error occurred in those operations. This is a rare edge case that requires the corrupted dword to lie exactly at the end of the dlen provisioned for a future mailbox command.

Solution/Workaround

When an uncorrectable ECC error is detected, Caliptra should fail the active command, and firmware should zeroize the Mailbox SRAM. This will not be necessary after 1.1 hardware.

Caveat

To zeroize the SRAM, Caliptra firmware must acquire the mailbox lock. Because the error may occur at any time, firmware may contend with the SoC when requesting the lock. Firmware should not enter a simple forever loop trying to acquire the lock, as this may cause a deadlock.
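One way to avoid the forever loop is to bound the number of lock attempts and return an error that the caller can act on. The following is a minimal sketch, not Caliptra firmware code: the `MailboxLock` trait, `acquire_bounded`, `LockError`, and the `FakeLock` test double are all hypothetical names introduced for illustration.

```rust
/// Hypothetical abstraction over the mailbox lock (not a Caliptra API).
pub trait MailboxLock {
    /// Attempts to take the lock once; returns true if it was acquired.
    fn try_lock(&mut self) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
pub enum LockError {
    /// The lock was still held elsewhere after every attempt.
    Contended,
}

/// Tries to acquire the lock at most `max_attempts` times. On success,
/// returns the attempt number that succeeded; otherwise gives up so the
/// caller can defer or escalate instead of spinning forever.
pub fn acquire_bounded<M: MailboxLock>(
    mbox: &mut M,
    max_attempts: u32,
) -> Result<u32, LockError> {
    for attempt in 1..=max_attempts {
        if mbox.try_lock() {
            return Ok(attempt);
        }
    }
    Err(LockError::Contended)
}

/// Test double: the lock becomes free after `busy_polls` failed attempts.
pub struct FakeLock {
    pub busy_polls: u32,
}

impl MailboxLock for FakeLock {
    fn try_lock(&mut self) -> bool {
        if self.busy_polls == 0 {
            true
        } else {
            self.busy_polls -= 1;
            false
        }
    }
}
```

What the caller does on `LockError::Contended` (retry later, or force an unlock) is a policy decision that this sketch leaves open.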

@calebofearth
Author

@korran
A couple of methodology comments/case studies:

General methods

uC can clear mailbox via direct-mode accesses if mailbox state is mbox_execute_uc.
uC may need to use mbox_unlock in order to acquire lock in other cases, to guarantee mailbox is cleared.
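The two general methods amount to a choice keyed on the mailbox state. A rough sketch of that choice follows; the enum and function names are hypothetical, and the `Other` variant is an assumption standing in for any state not named in this thread.

```rust
/// Mailbox states relevant to this discussion. `Other` is an assumed
/// catch-all for states that are not named in the thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MboxFsmState {
    ExecuteUc,
    ExecuteSoc,
    Other,
}

/// How the uC would go about clearing the mailbox SRAM.
#[derive(Debug, PartialEq, Eq)]
pub enum SanitizePlan {
    /// uC may write the SRAM right away using direct-mode accesses.
    DirectMode,
    /// uC must first obtain the lock, possibly via mbox_unlock.
    AcquireLockFirst,
}

/// Maps the current mailbox state to a sanitization plan.
pub fn plan_sanitize(state: MboxFsmState) -> SanitizePlan {
    match state {
        MboxFsmState::ExecuteUc => SanitizePlan::DirectMode,
        MboxFsmState::ExecuteSoc | MboxFsmState::Other => SanitizePlan::AcquireLockFirst,
    }
}
```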

CASE: ECC error while uC reads dataout

SoC-initiated mailbox command
ECC double-bit error while uC is reading dataout
Action: uC should check error status before writing mbox_status. If ECC error, uC first sanitizes mailbox, then sets CMD_FAILURE. In state mbox_execute_uc, uC is able to sanitize the mailbox using direct mode accesses.
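The ordering in this action (check the error, sanitize, then report failure) could look roughly like the sketch below. It is written against a hypothetical `UcMailbox` trait rather than the real register interface; every method name, `finish_command`, and the `FakeMailbox` test double are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MboxStatus {
    DataReady,
    CmdFailure,
}

/// Hypothetical view of the mailbox as seen by the uC (not a Caliptra API).
pub trait UcMailbox {
    /// True if an uncorrectable (double-bit) ECC error was recorded.
    fn ecc_uncorrectable_seen(&self) -> bool;
    /// Size of the mailbox SRAM in dwords.
    fn sram_len_dwords(&self) -> usize;
    /// Direct-mode write of one dword.
    fn write_dword_direct(&mut self, index: usize, value: u32);
    /// Writes mbox_status.
    fn set_status(&mut self, status: MboxStatus);
}

/// Zeroizes the whole SRAM using direct-mode writes.
pub fn sanitize<M: UcMailbox>(mbox: &mut M) {
    for i in 0..mbox.sram_len_dwords() {
        mbox.write_dword_direct(i, 0);
    }
}

/// Checks the ECC error status before writing mbox_status. On error the
/// SRAM is sanitized first and the command is failed; otherwise the
/// caller's `success` status is written unchanged.
pub fn finish_command<M: UcMailbox>(mbox: &mut M, success: MboxStatus) -> MboxStatus {
    let status = if mbox.ecc_uncorrectable_seen() {
        sanitize(mbox);
        MboxStatus::CmdFailure
    } else {
        success
    };
    mbox.set_status(status);
    status
}

/// In-memory test double standing in for the hardware.
pub struct FakeMailbox {
    pub sram: Vec<u32>,
    pub ecc_error: bool,
    pub status: Option<MboxStatus>,
}

impl UcMailbox for FakeMailbox {
    fn ecc_uncorrectable_seen(&self) -> bool {
        self.ecc_error
    }
    fn sram_len_dwords(&self) -> usize {
        self.sram.len()
    }
    fn write_dword_direct(&mut self, index: usize, value: u32) {
        self.sram[index] = value;
    }
    fn set_status(&mut self, status: MboxStatus) {
        self.status = Some(status);
    }
}
```

Sanitizing before the status write matters here because, once the status is written, the uC may no longer be in a state where direct-mode accesses are possible.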

CASE: ECC error while SoC reads dataout (uC-initiated cmd)

uC initiates a mailbox command to the SoC, e.g. for a CSR.
SoC encounters ECC uncorrectable error while reading out data.
SoC should respond with CMD_FAILURE.
Action: uC can sanitize mailbox before clearing the lock.

CASE: ECC error while SoC reads dataout (uC response to SoC-initiated cmd)

SoC initiates command
uC reads dataout -> no ECC error
uC writes datain (response to cmd)
uC sets mbox_status = DATA_READY
SoC reads dataout -> ECC error occurs
state is mbox_execute_soc, so uC is unable to sanitize mailbox
SoC clears lock (by writing mbox_execute = 0)
Action: uC should acquire lock, then sanitize mailbox. uC may fail to acquire lock if SoC wins lock for a subsequent command. uC should use mbox_unlock to forcibly unlock the mailbox and then gain the lock.
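The recovery path for this case (bounded lock attempts, then a forced unlock as a fallback) might be sketched as follows. The `LockableMailbox` trait and its methods are hypothetical stand-ins: `force_unlock` models a write to mbox_unlock, and the fake mailbox only approximates hardware behavior.

```rust
/// Hypothetical mailbox interface for the recovery path (not a Caliptra API).
pub trait LockableMailbox {
    /// Attempts to take the lock once; returns true if acquired.
    fn try_lock(&mut self) -> bool;
    /// Models mbox_unlock: forcibly releases whoever holds the lock.
    fn force_unlock(&mut self);
    /// Zeroizes the SRAM; only valid while the uC holds the lock.
    fn zeroize_sram(&mut self);
    /// Releases the uC's lock.
    fn release_lock(&mut self);
}

#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    LockedNormally,
    ForcedUnlock,
}

#[derive(Debug, PartialEq, Eq)]
pub struct LockUnavailable;

/// Tries the lock up to `max_attempts` times; if the SoC keeps winning,
/// forces an unlock and tries once more. Sanitizes only once the lock is held.
pub fn sanitize_after_soc_read_error<M: LockableMailbox>(
    mbox: &mut M,
    max_attempts: u32,
) -> Result<Recovery, LockUnavailable> {
    let mut path = Recovery::LockedNormally;
    let mut locked = (0..max_attempts).any(|_| mbox.try_lock());
    if !locked {
        mbox.force_unlock();
        path = Recovery::ForcedUnlock;
        locked = mbox.try_lock();
    }
    if !locked {
        return Err(LockUnavailable);
    }
    mbox.zeroize_sram();
    mbox.release_lock();
    Ok(path)
}

/// Test double in which the SoC may already hold the lock.
pub struct FakeSocHeldMailbox {
    pub held_by_soc: bool,
    pub held_by_uc: bool,
    pub sram: Vec<u32>,
}

impl LockableMailbox for FakeSocHeldMailbox {
    fn try_lock(&mut self) -> bool {
        if self.held_by_soc || self.held_by_uc {
            return false;
        }
        self.held_by_uc = true;
        true
    }
    fn force_unlock(&mut self) {
        self.held_by_soc = false;
        self.held_by_uc = false;
    }
    fn zeroize_sram(&mut self) {
        assert!(self.held_by_uc, "uC must hold the lock to sanitize");
        self.sram.iter_mut().for_each(|w| *w = 0);
    }
    fn release_lock(&mut self) {
        self.held_by_uc = false;
    }
}
```

Even after the forced unlock, the final `try_lock` can in principle still lose to the SoC, which is why the sketch returns an error instead of assuming success.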
