osd: update existing OSDs with deviceClass #9259

Merged (1 commit) on Nov 29, 2021
pkg/operator/ceph/cluster/osd/update.go (4 changes: 3 additions, 1 deletion)
@@ -126,7 +126,9 @@ func (c *updateConfig) updateExistingOSDs(errs *provisionErrors) {
}

// backward compatibility for old deployments
-if osdInfo.DeviceClass == "" {
+// Checking DeviceClass with None too, because ceph-volume lvm list return crush device class as None
+// Tracker https://tracker.ceph.com/issues/53425
+if osdInfo.DeviceClass == "" || osdInfo.DeviceClass == "None" {

Member

As mentioned in the huddle, I think using ",omitempty" might solve this better than checking for "None". Also, I cannot reproduce getting "None": https://go.dev/play/p/xKyIFkmrIKl

Member Author

I added ",omitempty" in these two places:

DeviceClass string `json:"device-class"`

DeviceClass string `json:"device_class"`

But I still see that osdInfo.DeviceClass is set to "None".
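This matches how Go's `encoding/json` works: the `omitempty` tag option only controls whether an empty field is written during marshaling; it has no effect on unmarshaling, and "None" is a non-empty string anyway. The sketch below uses a hypothetical, trimmed-down `osdInfo` struct (not Rook's actual type) and assumes the decoded input carries the literal string "None".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// osdInfo is a hypothetical, trimmed-down stand-in for the structs quoted
// above, with the ,omitempty option added to the JSON tag.
type osdInfo struct {
	DeviceClass string `json:"device_class,omitempty"`
}

// decodeDeviceClass unmarshals raw JSON and returns the device class field.
func decodeDeviceClass(raw []byte) (string, error) {
	var info osdInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", err
	}
	return info.DeviceClass, nil
}

// encodeEmpty marshals a zero-value osdInfo; ,omitempty drops the empty field.
func encodeEmpty() (string, error) {
	out, err := json.Marshal(osdInfo{})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// A literal "None" is a non-empty string, so it decodes unchanged.
	dc, err := decodeDeviceClass([]byte(`{"device_class": "None"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded: %q\n", dc) // decoded: "None"

	// omitempty only matters when encoding an empty value.
	enc, err := encodeEmpty()
	if err != nil {
		panic(err)
	}
	fmt.Println("encoded:", enc) // encoded: {}
}
```

So `,omitempty` cannot turn "None" into ""; the value has to be normalized or checked explicitly after decoding.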

Member

Hmm, I'd be curious to trace this back and understand why the command returns "None". That shouldn't be the case.

Member Author

@leseb, I traced it: the "None" value comes from the environment variable:

if envVar.Name == osdDeviceClassEnvVarName {
	osd.DeviceClass = envVar.Value
}
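The quoted loop copies the env var value verbatim, so whatever string was written into the OSD pod spec comes back unchanged. A minimal sketch, assuming a hypothetical local `envVar` type that mirrors the `Name` and `Value` fields of Kubernetes `corev1.EnvVar`, and assuming `osdDeviceClassEnvVarName` is "ROOK_OSD_DEVICE_CLASS" (the variable name reported in this thread):

```go
package main

import "fmt"

// envVar is a hypothetical local stand-in for Kubernetes corev1.EnvVar,
// keeping only the fields this sketch needs.
type envVar struct {
	Name  string
	Value string
}

// Assumed value; taken from the variable name reported in this thread.
const osdDeviceClassEnvVarName = "ROOK_OSD_DEVICE_CLASS"

// deviceClassFromEnv mirrors the quoted loop: the value is copied as-is,
// so a "None" stored in the pod spec is returned as "None".
func deviceClassFromEnv(env []envVar) string {
	deviceClass := "" // default only when the variable is absent
	for _, e := range env {
		if e.Name == osdDeviceClassEnvVarName {
			deviceClass = e.Value
		}
	}
	return deviceClass
}

func main() {
	env := []envVar{{Name: osdDeviceClassEnvVarName, Value: "None"}}
	fmt.Printf("%q\n", deviceClassFromEnv(env)) // "None"
	fmt.Printf("%q\n", deviceClassFromEnv(nil)) // ""
}
```

The empty-string default only applies when the variable is missing entirely; it does not help when the variable is present and holds "None".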

Member

Thanks. That's weird; it isn't supposed to, since it defaults to "".

Member

@leseb, Nov 29, 2021

Member

Actually, this is the culprit: https://github.com/rook/rook/blob/master/pkg/daemon/ceph/osd/volume.go#L85

See:

  [block]       /dev/ceph-a8ae9f01-a440-4a0e-8e4d-592d3bce3a9d/osd-block-058f4926-6ee7-4165-9a55-db5a1fd22d2f

      block device              /dev/ceph-a8ae9f01-a440-4a0e-8e4d-592d3bce3a9d/osd-block-058f4926-6ee7-4165-9a55-db5a1fd22d2f
      block uuid                cr4xnI-h1sx-4DT3-ZC2N-0kI9-RtZF-Nkr3q1
      cephx lockbox secret      
      cluster fsid              f20931eb-9336-4234-a5c7-b0b44ab8c07a
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  058f4926-6ee7-4165-9a55-db5a1fd22d2f
      osd id                    2
      osdspec affinity          
      type                      block
      vdo                       0
      devices                   /dev/vdd

The crush device class is reported as "None".

In the end, checking for "None" is valid, since we don't control this behavior!
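To make the point concrete: anything that reads the `crush device class` field from output like the listing above gets the literal string "None" back, not an empty value. The sketch below uses a hypothetical `parseCrushDeviceClass` helper over the plain-text format shown; it is an illustration only and does not claim to be how Rook's `volume.go` actually parses `ceph-volume` output.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseCrushDeviceClass is a hypothetical helper that scans plain-text
// ceph-volume listing output for the "crush device class" field.
// It returns the raw value (which may be the literal "None") and
// whether the field was found at all.
func parseCrushDeviceClass(output string) (string, bool) {
	const key = "crush device class"
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, key) {
			return strings.TrimSpace(strings.TrimPrefix(line, key)), true
		}
	}
	return "", false
}

func main() {
	// Excerpt of the listing quoted above.
	sample := `      cluster name              ceph
      crush device class        None
      encrypted                 0`
	dc, ok := parseCrushDeviceClass(sample)
	fmt.Printf("%q %v\n", dc, ok) // "None" true
}
```

A parser like this has no way to tell "None meaning unset" from a real class name without an explicit comparison, which is why the check has to happen somewhere downstream.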

Member Author

@parth-gr, Nov 29, 2021

I checked this while testing; it doesn't make any difference.

I see the ROOK_OSD_DEVICE_CLASS environment variable returned as {ROOK_OSD_DEVICE_CLASS None nil}; this might be the actual problem.

Member Author

Thanks, got it. This is because of how ceph-volume formats its output.

Member

Please leave a comment in the code explaining why we need to check against "None", and add a link to this tracker: https://tracker.ceph.com/issues/53425

deviceClassInfo, err := cephclient.OSDDeviceClasses(c.cluster.context, c.cluster.clusterInfo, []string{strconv.Itoa(osdID)})
if err != nil {
logger.Errorf("failed to get device class for existing deployment %q. %v", depName, err)
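The merged backward-compatibility condition can be read as a small predicate. This is a sketch only; `deviceClassUnset` is a hypothetical name, not a function in Rook.

```go
package main

import "fmt"

// deviceClassUnset is a hypothetical helper expressing the merged condition:
// an OSD has no usable device class if the value is empty (old deployments)
// or the literal "None" reported by ceph-volume
// (https://tracker.ceph.com/issues/53425).
func deviceClassUnset(deviceClass string) bool {
	return deviceClass == "" || deviceClass == "None"
}

func main() {
	for _, dc := range []string{"", "None", "ssd", "hdd"} {
		fmt.Printf("%q unset=%v\n", dc, deviceClassUnset(dc))
	}
}
```

The comparison is exact, like the merged check, so a lowercase "none" would be treated as a real device class; that only matters if ceph-volume ever changes its spelling.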