SRAM must be initialized/zeroized at time 0 by the SoC - integrator responsibility
HW fix for the bug is postponed as a 2.0 code improvement
However, due to the issue described in chipsalliance/caliptra-rtl#340 (which is rooted in the same HW logic as chipsalliance/caliptra-rtl#399), there remains a possibility that double-bit ECC errors in Mailbox RAM may trigger error interrupts in subsequent operations even when no error occurred in those operations. This is a rare edge case that requires the corrupted dword to lie exactly at the end of the dlen provisioned for future mailbox commands.
Solution/Workaround
When an uncorrectable ECC error is detected, the active command should be failed by Caliptra, and firmware should zeroize the Mailbox SRAM. This will not be necessary after 1.1 hardware.
Caveat
In order to zeroize SRAM, Caliptra firmware must acquire the mailbox lock. As the error may occur at any time, it is possible to encounter contention with the SoC when requesting the lock. Firmware should not enter a simple forever loop trying to acquire the lock, as this may trigger deadlock.
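A minimal sketch of the bounded alternative to a forever loop, under stated assumptions: `MailboxLock`, `try_lock`, `acquire_bounded`, and `FakeMailbox` are hypothetical names invented for illustration and are not the Caliptra register API. The idea is only that firmware gives up after a fixed number of attempts and reports that to its caller, which can then pick another recovery path.

```rust
/// Hypothetical view of the mailbox lock (not the real Caliptra register API).
pub trait MailboxLock {
    /// Returns true if this call acquired the lock.
    fn try_lock(&mut self) -> bool;
}

/// Outcome of a bounded acquisition attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum LockOutcome {
    Acquired { attempts: u32 },
    GaveUp,
}

/// Request the lock at most `max_attempts` times instead of spinning forever.
pub fn acquire_bounded<M: MailboxLock>(mbox: &mut M, max_attempts: u32) -> LockOutcome {
    for attempt in 1..=max_attempts {
        if mbox.try_lock() {
            return LockOutcome::Acquired { attempts: attempt };
        }
        // Real firmware would service other work between attempts.
    }
    LockOutcome::GaveUp
}

/// Test double: the SoC holds the lock for the first `busy_polls` requests.
pub struct FakeMailbox {
    pub busy_polls: u32,
}

impl MailboxLock for FakeMailbox {
    fn try_lock(&mut self) -> bool {
        if self.busy_polls == 0 {
            true
        } else {
            self.busy_polls -= 1;
            false
        }
    }
}
```

When `acquire_bounded` returns `GaveUp`, the caller still has to decide what to do next; the sketch deliberately leaves that policy open.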
@korran
A couple of methodology comments/case studies:
General methods
uC can clear mailbox via direct-mode accesses if mailbox state is execute_uc.
uC may need to use mbox_unlock in order to acquire lock in other cases, to guarantee mailbox is cleared.
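The two general methods can be read as a choice keyed on the mailbox state. The sketch below is a simplification under assumptions: the enum and function names are hypothetical, the `Idle` state is added only to make the example complete, and "may need to use mbox_unlock" is modeled as "always use it" outside `execute_uc`.

```rust
/// Mailbox states used in this sketch. `ExecuteUc` and `ExecuteSoc` mirror the
/// states named in the discussion; `Idle` is a hypothetical placeholder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MboxState {
    Idle,
    ExecuteUc,
    ExecuteSoc,
}

/// How the uC would go about clearing the mailbox.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClearStrategy {
    /// Clear in place with direct-mode accesses.
    DirectMode,
    /// Use mbox_unlock, take the lock, then clear.
    UnlockThenLock,
}

/// Pick a clearing strategy from the current state (simplified policy).
pub fn choose_clear_strategy(state: MboxState) -> ClearStrategy {
    match state {
        MboxState::ExecuteUc => ClearStrategy::DirectMode,
        MboxState::Idle | MboxState::ExecuteSoc => ClearStrategy::UnlockThenLock,
    }
}
```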
CASE: ECC error while uC reads dataout
SoC-initiated mailbox command
ECC double-bit error while uC is reading dataout
Action: uC should check the error status before writing mbox_status. If there is an ECC error, uC first sanitizes the mailbox, then sets CMD_FAILURE. In state mbox_execute_uc, uC is able to sanitize the mailbox using direct-mode accesses.
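The ordering in this case (check the error, sanitize, then report failure) can be sketched against a mock mailbox. Everything here is an assumption made for illustration: `MockMailbox`, its fields, and `finish_soc_command` are hypothetical, and `CmdComplete` stands in for whatever success status the firmware would normally write. Clearing the interrupt status itself is out of scope.

```rust
/// Status values written to mbox_status in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MboxStatus {
    CmdBusy,
    CmdComplete, // assumed success status, not taken from the discussion
    CmdFailure,
}

/// Mock of the mailbox as seen by the uC (hypothetical, not real registers).
pub struct MockMailbox {
    pub sram: Vec<u32>,
    pub ecc_double_bit_error: bool,
    pub status: MboxStatus,
}

impl MockMailbox {
    /// Overwrite every dword with zero; stands in for direct-mode writes.
    pub fn sanitize(&mut self) {
        for dword in self.sram.iter_mut() {
            *dword = 0;
        }
    }
}

/// Finish a SoC-initiated command: check for an uncorrectable ECC error
/// before writing the status, and sanitize first if one was seen.
pub fn finish_soc_command(mbox: &mut MockMailbox) -> MboxStatus {
    if mbox.ecc_double_bit_error {
        mbox.sanitize();
        mbox.status = MboxStatus::CmdFailure;
    } else {
        mbox.status = MboxStatus::CmdComplete;
    }
    mbox.status
}
```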
CASE: ECC error while SoC reads dataout (uC-initiated cmd)
uC initiates a mailbox command to the SoC, e.g. for a CSR.
SoC encounters ECC uncorrectable error while reading out data.
SoC should respond with CMD_FAILURE.
Action: uC can sanitize the mailbox before clearing the lock.
CASE: ECC error while SoC reads dataout (uC response to SoC-initiated cmd)
SoC initiates command
uC reads dataout -> no ECC error
uC writes datain (response to cmd)
uC sets mbox_status = DATA_READY
SoC reads dataout -> ECC error occurs
state is mbox_execute_soc, so uC is unable to sanitize mailbox
SoC clears lock (by writing mbox_execute = 0)
Action: uC should acquire the lock, then sanitize the mailbox. uC may fail to acquire the lock if the SoC wins it for a subsequent command; in that case uC should use mbox_unlock to forcibly unlock the mailbox and then gain the lock.
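The recovery for this last case can be sketched as "try the lock, force-unlock on contention, retry once, then sanitize". This is a toy model under assumptions: `SimMailbox`, `Owner`, and all method names are hypothetical, and `force_unlock` models mbox_unlock as immediately freeing the lock, which may not match the hardware's exact behavior.

```rust
/// Who currently holds the mailbox lock in this toy model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Owner {
    Free,
    Soc,
    Uc,
}

/// Toy mailbox model (hypothetical; not the real register interface).
pub struct SimMailbox {
    pub owner: Owner,
    pub sram: Vec<u32>,
    pub forced_unlocks: u32,
}

impl SimMailbox {
    /// Plain lock request from the uC: succeeds only if the lock is free.
    pub fn try_lock_uc(&mut self) -> bool {
        if self.owner == Owner::Free {
            self.owner = Owner::Uc;
            true
        } else {
            false
        }
    }

    /// Stand-in for mbox_unlock: assumed to free the lock immediately.
    pub fn force_unlock(&mut self) {
        self.owner = Owner::Free;
        self.forced_unlocks += 1;
    }

    /// Zero every dword of the mailbox SRAM.
    pub fn zeroize(&mut self) {
        self.sram.iter_mut().for_each(|dword| *dword = 0);
    }

    /// Release the lock after the uC is done.
    pub fn release(&mut self) {
        self.owner = Owner::Free;
    }
}

/// Sanitize after the SoC hit an ECC error while reading the response.
/// Returns false if the lock could not be taken even after a forced unlock.
pub fn sanitize_after_soc_read_error(mbox: &mut SimMailbox) -> bool {
    if !mbox.try_lock_uc() {
        // The SoC won the lock for a subsequent command: force it open once.
        mbox.force_unlock();
        if !mbox.try_lock_uc() {
            return false;
        }
    }
    mbox.zeroize();
    mbox.release();
    true
}
```

The single retry after the forced unlock keeps the path bounded, in line with the caveat about not spinning forever on the lock.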
Background
Related to chipsalliance/caliptra-rtl#399
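The fix for that logic issue was the SoC-side SRAM initialization requirement and the postponed HW fix listed at the top of this issue.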