Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP expose disk firmware in inventory #5706

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

papertigers
Copy link
Contributor

No description provided.

Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall structure makes sense to me, just a couple nitpicks!

pub struct DiskFirmware {
active_slot: u8,
next_active_slot: Option<u8>,
slot1_read_only: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels odd that all the other fields imply an arbitrary number of slots, but this field specifically embeds the notion of "slot1"? Also, is this one or zero-indexed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to the NVMe spec! The spec states that devices have slot 1-7 (and potentially more in the future as there's unused padding at the end of the C struct). The spec also says that only slot 1 can be read only. Not every device will have all 7 slots available, for example the C bhyve nvme devices shows only one slot that is read only, while some of our WDC devices show 4 slots. And not every available slot will necessarily contain firmware. The firmware version itself is defined to be an ascii string hence the Vec<Option<String>>.

After having chatted with @jgallagher about this we landed on NvmeSlot being a wrapping type to address the array index being 0 but corresponding with slot1 - this can be seen here and has yet to be officially reviewed. The thought was that one could gain access to all the slots via an iterator and upstream of the library you could deal with the 0 indexed array.

I originally had this represented as an enum but was unsure what the underlying CRDB type should look like. Very open to suggestions here as I just went with something for this PR as a placeholder.

active_slot: u8,
next_active_slot: Option<u8>,
slot1_read_only: bool,
slots: Vec<Option<String>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value of this String? This feels a little bit like a more specific type that has been coerced into a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the above comment.

let mut removed = Vec::new();
let mut updated = Vec::new();

// Find new or udpated disks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Find new or udpated disks.
// Find new or updated disks.

Comment on lines +539 to +554
let controller_lock = match controller.try_read_lock() {
libnvme::controller::TryLockResult::Ok(locked) => locked,
// We should only hit this if something in the system has locked the
// controller in question for writing.
libnvme::controller::TryLockResult::Locked(_) => {
warn!(
log,
"NVMe Controller is already locked so we will try again
in the next hardware snapshot"
);
return Err(Error::NvmeControllerLocked);
}
libnvme::controller::TryLockResult::Err(err) => {
return Err(Error::from(err))
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this makes sense, given our prior conversation

Comment on lines +430 to +432
// Today the only update we expect to be able to apply to
// a `Disk` is firmware.
pub(crate) fn update_disk(&mut self, raw_disk: &RawDisk) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could program this a little bit more defensively, in case there are other parts of these structures that can be updated. The comment indicates the true behavior ("we ignore everything but the firmware") but the calllsites don't really make that clear.

Can we either call this update_firmware, or have some other way to make sure "we don't drop updates on the floor" if something other than the firmware field changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix this - I had originally left this as update_disk as there might be more fields in the future we wish to support. But I am happy to make this an explicit update_firmware for the time being. (side note... maybe update_firmware_metadata is a better name).

@papertigers papertigers force-pushed the mike/nvme-firmware-inventory branch from 6bc3a9c to 1ee77ff Compare May 23, 2024 20:43
@papertigers papertigers force-pushed the mike/nvme-firmware-inventory branch from 46347bf to fa35f33 Compare May 23, 2024 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants