Reset debug port on startup (#1763)
We noticed on the dogfood rack that a Sidecar's 10G link was
persistently down.

Normally, this is fixed by a watchdog. However, the watchdog only fires
if the PCIe link is active (#1510).

The PCIe link was falsely being reported as down because the debug port
state (`TOFINO_DEBUG_PORT_STATE`) had `receive_buffer_empty = false`, so
we bailed out of the check at [this
condition](https://github.com/oxidecomputer/hubris/blob/020d014880382d872d048fbfe1e8152a39e7c47a/drv/sidecar-mainboard-controller/src/tofino2.rs#L662).
This failure persisted through SP reboots (which notably do not
reflash the FPGA), so the likely cause is out-of-sync state between the
FPGA and SP.

This PR adds a startup step to reset the debug port tx/rx buffers by
writing to the `TOFINO_DEBUG_PORT_STATE` register.
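
The failure and the fix can be modelled with a minimal host-side sketch. It is a simplified illustration, not the driver itself: `FakeFpga`, `can_check_pcie_link`, and the bit positions are hypothetical stand-ins, and the real register layout and guard condition live in `tofino2.rs`. The sketch assumes that a stale register value survives an SP reboot and that writing both "empty" flags restores a state in which the link check can proceed.

```rust
use std::cell::Cell;

// Hypothetical bit positions; the real layout comes from the FPGA register map.
pub const SEND_BUFFER_EMPTY: u8 = 1 << 0;
pub const RECEIVE_BUFFER_EMPTY: u8 = 1 << 1;

/// Simplified stand-in for the driver's `DebugPortState` bitfield.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DebugPortState(pub u8);

impl DebugPortState {
    pub fn send_buffer_empty(&self) -> bool {
        (self.0 & SEND_BUFFER_EMPTY) != 0
    }
    pub fn receive_buffer_empty(&self) -> bool {
        (self.0 & RECEIVE_BUFFER_EMPTY) != 0
    }
    pub fn set_send_buffer_empty(&mut self, v: bool) {
        self.set_bit(SEND_BUFFER_EMPTY, v)
    }
    pub fn set_receive_buffer_empty(&mut self, v: bool) {
        self.set_bit(RECEIVE_BUFFER_EMPTY, v)
    }
    fn set_bit(&mut self, mask: u8, v: bool) {
        if v {
            self.0 |= mask
        } else {
            self.0 &= !mask
        }
    }
}

/// Hypothetical FPGA model: the register keeps its value across SP
/// reboots because the FPGA is not reflashed.
pub struct FakeFpga {
    debug_port_state: Cell<u8>,
}

impl FakeFpga {
    pub fn new(initial: u8) -> Self {
        Self { debug_port_state: Cell::new(initial) }
    }
    pub fn read_state(&self) -> DebugPortState {
        DebugPortState(self.debug_port_state.get())
    }
    pub fn write_state(&self, state: DebugPortState) {
        self.debug_port_state.set(state.0)
    }
}

/// Mirrors the shape of the new `DebugPort::reset`: start from zero,
/// mark both buffers empty, and write the result back.
pub fn reset_debug_port(fpga: &FakeFpga) {
    let mut state = DebugPortState(0);
    state.set_send_buffer_empty(true);
    state.set_receive_buffer_empty(true);
    fpga.write_state(state);
}

/// Assumed, simplified guard: the link check bails out unless the
/// receive buffer is reported empty.
pub fn can_check_pcie_link(fpga: &FakeFpga) -> bool {
    fpga.read_state().receive_buffer_empty()
}
```

In this model, a `FakeFpga` constructed with `receive_buffer_empty` cleared fails `can_check_pcie_link` no matter how often the SP side restarts, and passes it once `reset_debug_port` has run at startup.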

Flashing this firmware onto the misbehaving system brought it back into
a working state (i.e. reporting `pcie_link = true`).
mkeeter committed Apr 26, 2024
1 parent 020d014 commit ab8d428
Showing 2 changed files with 11 additions and 0 deletions.
8 changes: 8 additions & 0 deletions drv/sidecar-mainboard-controller/src/tofino2.rs
@@ -646,6 +646,14 @@ impl DebugPort {
         self.fpga.read(Addr::TOFINO_DEBUG_PORT_STATE)
     }

+    /// Resets debug port state by clearing the send and receive buffers
+    pub fn reset(&self) -> Result<(), FpgaError> {
+        let mut state = DebugPortState(0);
+        state.set_send_buffer_empty(true);
+        state.set_receive_buffer_empty(true);
+        self.set_state(state)
+    }
+
     pub fn set_state(&self, state: DebugPortState) -> Result<(), FpgaError> {
         self.fpga
             .write(WriteOp::Write, Addr::TOFINO_DEBUG_PORT_STATE, state)
3 changes: 3 additions & 0 deletions drv/sidecar-seq-server/src/main.rs
@@ -1001,6 +1001,9 @@ fn main() -> ! {
         None => {}
     }

+    // Clear debug port state in the FPGA
+    server.tofino.debug_port.reset().unwrap_lite();
+
     // Power on, unless suppressed by the `stay-in-a2` feature
     if !cfg!(feature = "stay-in-a2") {
         server.tofino.policy = TofinoSequencerPolicy::LatchOffOnFault;
