Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple transceivers server from Front IO readiness #1588

Merged
merged 11 commits into from Mar 25, 2024

Conversation

Aaron-Hartwig
Copy link
Contributor

Currently the transceivers server will loop and await the Front IO board to be ready before entering is service loop. This means that in the case where the Front IO board is not present (or ready), transceivers server cannot handle any incoming requests, making the host software that is sending those requests unhappy. This change addresses that.

This is intended to address #1562

drv/transceivers-server/src/main.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/main.rs Outdated Show resolved Hide resolved
@Aaron-Hartwig
Copy link
Contributor Author

This came together a bit late in the day for me to test this with remote hands, but I think its coming together. I could test some of the easy failure modes where the FPGA rejects the attempt to contact a module which proves out the path for the FPGA-in-the-loop HwErrors. I still need to test out the Front IO presence bit.

On the host:

aaron@niles ~ $ ./xcvradm -i axf6 -t 3,4,5,6 status
Port Status
   3 PRESENT | ENABLED | LOW_POWER_MODE | INTERRUPT | POWER_GOOD
   4 PRESENT | RESET | INTERRUPT
   5 PRESENT | ENABLED | RESET | POWER_GOOD
   6 ENABLED | RESET | INTERRUPT
aaron@niles ~ $ ./xcvradm -i axf6 -t 3,4,5,6 vendor-info
Port Identifier               Vendor           Part             Rev  Serial           Mfg date
   3 QsfpPlusCmis (0x1e)      FINISAR CORP.    FTCC1112E2PCL    A    X65BPQR          210901
Some operations failed, errors below
Port Error
   4 Hardware error accessing module 4: The module is not powered
   5 Hardware error accessing module 5: The module is not initialized
   6 Hardware error accessing module 6: The module is not present

The successful read is less interesting in terms of its Hubris ringbuf, but when the failures are isolated the SP also gives a clear picture of what is going on:

humility: ring buffer drv_transceivers_server::udp::__RINGBUF in transceivers:
 NDX LINE      GEN    COUNT PAYLOAD
   0  367        1        1 Read(ModuleId(0x70), MemoryRead { page: Sff8636(Lower), offset: 0x0, len: 0x1 })
   1 1209        1        1 ReadI2CFailures(LogicalPort(0x4), NoPower)
   2 1209        1        1 ReadI2CFailures(LogicalPort(0x5), NotInitialized)
   3 1209        1        1 ReadI2CFailures(LogicalPort(0x6), NoModule)
   4  408        1        1 OperationResult(ModuleResultSlim { success: LogicalPortMask(0x0), failure: LogicalPortMask(0x70), error: LogicalPortMask(0x0) })
   5  274        1        1 ResponseSize(ResponseSize { header_length: 0xa, message_length: 0x17, data_length: 0x3 })

@Aaron-Hartwig
Copy link
Contributor Author

This branch depends on oxidecomputer/transceiver-control#200

@Aaron-Hartwig Aaron-Hartwig marked this pull request as ready for review January 23, 2024 19:46
@Aaron-Hartwig
Copy link
Contributor Author

This PR also addresses #1572

@Aaron-Hartwig Aaron-Hartwig force-pushed the xcvrs-no-front-io branch 3 times, most recently from dd1962b to 42661e8 Compare March 21, 2024 14:49
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor code style suggestions --- I hope they're useful!

drv/transceivers-server/src/main.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
drv/transceivers-server/src/udp.rs Outdated Show resolved Hide resolved
@Aaron-Hartwig
Copy link
Contributor Author

@hawkw they were, thank you!!

@Aaron-Hartwig
Copy link
Contributor Author

We were able to arrange some hardware in the office to test the case of a missing Front IO board:

aaron@lurch ~ $ ./xcvradm -i axf1 -p fe80::c1d:2eff:fee3:e69a status
Port Status
Some operations failed, errors below
Port Error
   0 Hardware error accessing module 0: Front IO board not present
   1 Hardware error accessing module 1: Front IO board not present
   2 Hardware error accessing module 2: Front IO board not present
   3 Hardware error accessing module 3: Front IO board not present
   4 Hardware error accessing module 4: Front IO board not present
   5 Hardware error accessing module 5: Front IO board not present
   6 Hardware error accessing module 6: Front IO board not present
   7 Hardware error accessing module 7: Front IO board not present
   8 Hardware error accessing module 8: Front IO board not present
   9 Hardware error accessing module 9: Front IO board not present
  10 Hardware error accessing module 10: Front IO board not present
  11 Hardware error accessing module 11: Front IO board not present
  12 Hardware error accessing module 12: Front IO board not present
  13 Hardware error accessing module 13: Front IO board not present
  14 Hardware error accessing module 14: Front IO board not present
  15 Hardware error accessing module 15: Front IO board not present
  16 Hardware error accessing module 16: Front IO board not present
  17 Hardware error accessing module 17: Front IO board not present
  18 Hardware error accessing module 18: Front IO board not present
  19 Hardware error accessing module 19: Front IO board not present
  20 Hardware error accessing module 20: Front IO board not present
  21 Hardware error accessing module 21: Front IO board not present
  22 Hardware error accessing module 22: Front IO board not present
  23 Hardware error accessing module 23: Front IO board not present
  24 Hardware error accessing module 24: Front IO board not present
  25 Hardware error accessing module 25: Front IO board not present
  26 Hardware error accessing module 26: Front IO board not present
  27 Hardware error accessing module 27: Front IO board not present
  28 Hardware error accessing module 28: Front IO board not present
  29 Hardware error accessing module 29: Front IO board not present
  30 Hardware error accessing module 30: Front IO board not present
  31 Hardware error accessing module 31: Front IO board not present

So this is good to go once we land oxidecomputer/transceiver-control#200 and do the requisite Cargo.toml adjustment to get us off the transcevier-messages branch.

Copy link
Contributor

@arjenroodselaar arjenroodselaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you replaced all the unwrap() in the UDP module with unwrap_lite() you may want to do the same in main.rs. I think that'll shrink the size of the executable as less formatting code will be pulled in.

Comment on lines +647 to +653
let result =
if self.front_io_board_present == FrontIOStatus::Ready {
self.transceivers.clear_power_fault(mask)
} else {
ModuleResultNoFailure::new(LogicalPortMask(0), mask)
.unwrap_lite()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is totally fine and functional. If you'd want to improve the readability of the

if self.front_io_board_present == FrontIOStatus::Ready {
    ... no front IO board error response
} else {
    ... actual operation
}

in every handler function a little bit you could consider making a static, pre canned MessageBody::SpResponse(..) for this case and change the handlers to

if self.front_io_board_present == FrontIOStatus::Ready {
    return sp_response_no_front_io_board;
}

... actual operation

That would remove this common check from the parse tree for anyone reading this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I DM'd Arjen a bit about this and I'm not going to make any changes here. Given every request type has a corresponding response type (e.g. HostRequest::Status -> SpResponse::Status) there's really not a nice way to clean this up. I could make a separate code path that checks front_io_board_present prior to dropping into the message handler, but that just duplicats the same match structure elsewhere.

@Aaron-Hartwig Aaron-Hartwig enabled auto-merge (squash) March 25, 2024 16:11
@Aaron-Hartwig Aaron-Hartwig merged commit 1145c39 into master Mar 25, 2024
103 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants