Decouple `transceivers` server from Front IO readiness #1588

Aaron-Hartwig · 2024-01-11T00:01:26Z

Currently the transceivers server will loop and await the Front IO board to be ready before entering is service loop. This means that in the case where the Front IO board is not present (or ready), transceivers server cannot handle any incoming requests, making the host software that is sending those requests unhappy. This change addresses that.

This is intended to address #1562

drv/transceivers-server/src/main.rs

Aaron-Hartwig · 2024-01-12T23:34:52Z

This came together a bit late in the day for me to test this with remote hands, but I think its coming together. I could test some of the easy failure modes where the FPGA rejects the attempt to contact a module which proves out the path for the FPGA-in-the-loop HwErrors. I still need to test out the Front IO presence bit.

On the host:

aaron@niles ~ $ ./xcvradm -i axf6 -t 3,4,5,6 status
Port Status
   3 PRESENT | ENABLED | LOW_POWER_MODE | INTERRUPT | POWER_GOOD
   4 PRESENT | RESET | INTERRUPT
   5 PRESENT | ENABLED | RESET | POWER_GOOD
   6 ENABLED | RESET | INTERRUPT
aaron@niles ~ $ ./xcvradm -i axf6 -t 3,4,5,6 vendor-info
Port Identifier               Vendor           Part             Rev  Serial           Mfg date
   3 QsfpPlusCmis (0x1e)      FINISAR CORP.    FTCC1112E2PCL    A    X65BPQR          210901
Some operations failed, errors below
Port Error
   4 Hardware error accessing module 4: The module is not powered
   5 Hardware error accessing module 5: The module is not initialized
   6 Hardware error accessing module 6: The module is not present

The successful read is less interesting in terms of its Hubris ringbuf, but when the failures are isolated the SP also gives a clear picture of what is going on:

humility: ring buffer drv_transceivers_server::udp::__RINGBUF in transceivers:
 NDX LINE      GEN    COUNT PAYLOAD
   0  367        1        1 Read(ModuleId(0x70), MemoryRead { page: Sff8636(Lower), offset: 0x0, len: 0x1 })
   1 1209        1        1 ReadI2CFailures(LogicalPort(0x4), NoPower)
   2 1209        1        1 ReadI2CFailures(LogicalPort(0x5), NotInitialized)
   3 1209        1        1 ReadI2CFailures(LogicalPort(0x6), NoModule)
   4  408        1        1 OperationResult(ModuleResultSlim { success: LogicalPortMask(0x0), failure: LogicalPortMask(0x70), error: LogicalPortMask(0x0) })
   5  274        1        1 ResponseSize(ResponseSize { header_length: 0xa, message_length: 0x17, data_length: 0x3 })

Aaron-Hartwig · 2024-01-16T19:32:15Z

This branch depends on oxidecomputer/transceiver-control#200

Aaron-Hartwig · 2024-01-23T19:48:33Z

This PR also addresses #1572

hawkw

Some minor code style suggestions --- I hope they're useful!

drv/transceivers-server/src/main.rs

drv/transceivers-server/src/udp.rs

Aaron-Hartwig · 2024-03-22T12:58:54Z

@hawkw they were, thank you!!

Aaron-Hartwig · 2024-03-22T17:10:17Z

We were able to arrange some hardware in the office to test the case of a missing Front IO board:

aaron@lurch ~ $ ./xcvradm -i axf1 -p fe80::c1d:2eff:fee3:e69a status
Port Status
Some operations failed, errors below
Port Error
   0 Hardware error accessing module 0: Front IO board not present
   1 Hardware error accessing module 1: Front IO board not present
   2 Hardware error accessing module 2: Front IO board not present
   3 Hardware error accessing module 3: Front IO board not present
   4 Hardware error accessing module 4: Front IO board not present
   5 Hardware error accessing module 5: Front IO board not present
   6 Hardware error accessing module 6: Front IO board not present
   7 Hardware error accessing module 7: Front IO board not present
   8 Hardware error accessing module 8: Front IO board not present
   9 Hardware error accessing module 9: Front IO board not present
  10 Hardware error accessing module 10: Front IO board not present
  11 Hardware error accessing module 11: Front IO board not present
  12 Hardware error accessing module 12: Front IO board not present
  13 Hardware error accessing module 13: Front IO board not present
  14 Hardware error accessing module 14: Front IO board not present
  15 Hardware error accessing module 15: Front IO board not present
  16 Hardware error accessing module 16: Front IO board not present
  17 Hardware error accessing module 17: Front IO board not present
  18 Hardware error accessing module 18: Front IO board not present
  19 Hardware error accessing module 19: Front IO board not present
  20 Hardware error accessing module 20: Front IO board not present
  21 Hardware error accessing module 21: Front IO board not present
  22 Hardware error accessing module 22: Front IO board not present
  23 Hardware error accessing module 23: Front IO board not present
  24 Hardware error accessing module 24: Front IO board not present
  25 Hardware error accessing module 25: Front IO board not present
  26 Hardware error accessing module 26: Front IO board not present
  27 Hardware error accessing module 27: Front IO board not present
  28 Hardware error accessing module 28: Front IO board not present
  29 Hardware error accessing module 29: Front IO board not present
  30 Hardware error accessing module 30: Front IO board not present
  31 Hardware error accessing module 31: Front IO board not present

So this is good to go once we land oxidecomputer/transceiver-control#200 and do the requisite Cargo.toml adjustment to get us off the transcevier-messages branch.

arjenroodselaar

Since you replaced all the unwrap() in the UDP module with unwrap_lite() you may want to do the same in main.rs. I think that'll shrink the size of the executable as less formatting code will be pulled in.

arjenroodselaar · 2024-03-22T17:22:37Z

drv/transceivers-server/src/udp.rs

+                let result =
+                    if self.front_io_board_present == FrontIOStatus::Ready {
+                        self.transceivers.clear_power_fault(mask)
+                    } else {
+                        ModuleResultNoFailure::new(LogicalPortMask(0), mask)
+                            .unwrap_lite()
+                    };


This is totally fine and functional. If you'd want to improve the readability of the

if self.front_io_board_present == FrontIOStatus::Ready { ... no front IO board error response } else { ... actual operation }

in every handler function a little bit you could consider making a static, pre canned MessageBody::SpResponse(..) for this case and change the handlers to

if self.front_io_board_present == FrontIOStatus::Ready { return sp_response_no_front_io_board; } ... actual operation

That would remove this common check from the parse tree for anyone reading this code.

I DM'd Arjen a bit about this and I'm not going to make any changes here. Given every request type has a corresponding response type (e.g. HostRequest::Status -> SpResponse::Status) there's really not a nice way to clean this up. I could make a separate code path that checks front_io_board_present prior to dropping into the message handler, but that just duplicats the same match structure elsewhere.

Aaron-Hartwig requested a review from arjenroodselaar January 11, 2024 00:01

arjenroodselaar reviewed Jan 11, 2024

View reviewed changes

drv/transceivers-server/src/main.rs Outdated Show resolved Hide resolved

drv/transceivers-server/src/main.rs Outdated Show resolved Hide resolved

Aaron-Hartwig force-pushed the xcvrs-no-front-io branch from c197051 to 562a65c Compare January 12, 2024 23:35

Aaron-Hartwig marked this pull request as ready for review January 23, 2024 19:46

Aaron-Hartwig force-pushed the xcvrs-no-front-io branch from 562a65c to 0a2ab73 Compare January 23, 2024 19:47

Aaron-Hartwig force-pushed the xcvrs-no-front-io branch 3 times, most recently from dd1962b to 42661e8 Compare March 21, 2024 14:49

hawkw reviewed Mar 21, 2024

View reviewed changes

Aaron-Hartwig force-pushed the xcvrs-no-front-io branch from 525a1a3 to 08a5a77 Compare March 22, 2024 14:31

arjenroodselaar approved these changes Mar 22, 2024

View reviewed changes

Aaron-Hartwig added 11 commits March 25, 2024 11:06

decouple fron io readiness from transceivers server responses

e6f7434

swap i2c loop with check

9352452

cargo fmt

a83e1a1

WIP

0dfd1c7

add enhanced failure and error reporting

d74f1f7

fix bug where we still try to SPI with no targets

ce5ce8a

update transceiver-messages crate

d85d3d8

address Eliza's feedback

f70ae90

more unwrap -> unwrap_lite fun

a82ce2b

fucking cargo fmt

0db5b15

update transceiver-messages crate

2285533

Aaron-Hartwig force-pushed the xcvrs-no-front-io branch from 22bae15 to 2285533 Compare March 25, 2024 16:06

Aaron-Hartwig enabled auto-merge (squash) March 25, 2024 16:11

Aaron-Hartwig merged commit 1145c39 into master Mar 25, 2024
103 checks passed

Aaron-Hartwig deleted the xcvrs-no-front-io branch March 25, 2024 16:16

Aaron-Hartwig mentioned this pull request Apr 18, 2024

bug: sidecar_front_io::transceivers::wait_and_check_i2c is not checking all ports #1754

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decouple `transceivers` server from Front IO readiness #1588

Decouple `transceivers` server from Front IO readiness #1588

Aaron-Hartwig commented Jan 11, 2024

Aaron-Hartwig commented Jan 12, 2024

Aaron-Hartwig commented Jan 16, 2024

Aaron-Hartwig commented Jan 23, 2024

hawkw left a comment

Aaron-Hartwig commented Mar 22, 2024

Aaron-Hartwig commented Mar 22, 2024

arjenroodselaar left a comment

arjenroodselaar Mar 22, 2024

Aaron-Hartwig Mar 22, 2024

Decouple transceivers server from Front IO readiness #1588

Decouple transceivers server from Front IO readiness #1588

Conversation

Aaron-Hartwig commented Jan 11, 2024

Aaron-Hartwig commented Jan 12, 2024

Aaron-Hartwig commented Jan 16, 2024

Aaron-Hartwig commented Jan 23, 2024

hawkw left a comment

Choose a reason for hiding this comment

Aaron-Hartwig commented Mar 22, 2024

Aaron-Hartwig commented Mar 22, 2024

arjenroodselaar left a comment

Choose a reason for hiding this comment

arjenroodselaar Mar 22, 2024

Choose a reason for hiding this comment

Aaron-Hartwig Mar 22, 2024

Choose a reason for hiding this comment

Decouple `transceivers` server from Front IO readiness #1588

Decouple `transceivers` server from Front IO readiness #1588