osd: re-open encrypted disk during osd-prepare-job if closed #11338
Conversation
Force-pushed from 84d2183 to ecf050d.
Let's include unit tests to make sure the stdin handling works properly. Can we get a string that simulates a LUKS device?
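For reference, a minimal sketch of such a test, assuming Rook's `exectest.MockExecutor` exposes a `MockExecuteCommandWithStdin` hook matching the executor call used here (hypothetical; the mock's real field name and signature may differ):

```go
package osd

import (
	"testing"

	"github.com/rook/rook/pkg/clusterd"
	exectest "github.com/rook/rook/pkg/util/exec/test"
	"github.com/stretchr/testify/assert"
)

func TestOpenLUKSPassesPassphraseOnStdin(t *testing.T) {
	var gotStdin string
	executor := &exectest.MockExecutor{
		// Hypothetical hook: captures what openLUKS sends on stdin.
		MockExecuteCommandWithStdin: func(command string, stdin *string, args ...string) error {
			assert.Equal(t, "cryptsetup", command)
			assert.Equal(t, "luksOpen", args[0])
			if stdin != nil {
				gotStdin = *stdin
			}
			return nil
		},
	}
	ctx := &clusterd.Context{Executor: executor}
	err := openLUKS(ctx, "/dev/xvdbb", "set1-data-0-block-dmcrypt", "my-passphrase")
	assert.NoError(t, err)
	assert.Equal(t, "my-passphrase", gotStdin) // passphrase arrived via stdin
}
```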
pkg/daemon/ceph/osd/volume.go
Outdated
```go
// Let's re-open the block to re-hydrate the OSDInfo.
logger.Debugf("encrypted block device %q is not open, opening it now", block)
passphrase := os.Getenv(oposd.CephVolumeEncryptedKeyEnvVarName)
target := fmt.Sprintf("%s-%s", os.Getenv(oposd.PVCNameEnvVarName), oposd.DmcryptBlockType)
```
I have concerns about this. I'm not sure how strictly guaranteed the behavior we are relying on is. I wonder if there are kubernetes versions or variants where the pvc name is not part of the dm name.
I think we should be able to use the `encryptedBlock` variable here, no? That should contain `-dmcrypt` according to the lines above.
> I have concerns about this. I'm not sure how strictly guaranteed the behavior we are relying on is. I wonder if there are kubernetes versions or variants where the pvc name is not part of the dm name.

This target name can be anything, since we open LUKS with this target explicitly and then call c-v on it.

> I think we should be able to use the `encryptedBlock` variable here, no? That should contain `-dmcrypt` according to the lines above.

c-v prefers the `/dev/mapper/<name>` format; that's why I kept the two variables separate.
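To illustrate the split between the two variables, a self-contained sketch (the constant value is an assumption, inferred from the lsblk output later in this thread):

```go
package main

import "fmt"

const dmcryptBlockType = "block-dmcrypt" // assumed value of oposd.DmcryptBlockType

func main() {
	pvcName := "ocs-deviceset-0-data-0cljr5"
	// target: the bare device-mapper name handed to `cryptsetup luksOpen`.
	target := fmt.Sprintf("%s-%s", pvcName, dmcryptBlockType)
	// encryptedBlock: the /dev/mapper path that ceph-volume prefers.
	encryptedBlock := fmt.Sprintf("/dev/mapper/%s", target)
	fmt.Println(target)         // ocs-deviceset-0-data-0cljr5-block-dmcrypt
	fmt.Println(encryptedBlock) // /dev/mapper/ocs-deviceset-0-data-0cljr5-block-dmcrypt
}
```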
I see. `target` is what `luksOpen` uses for its `name` parameter, and is created by opening the disk. Thanks :)
pkg/daemon/ceph/osd/encryption.go
Outdated
```go
func openLUKS(context *clusterd.Context, disk, target, passphrase string) error {
	args := []string{"luksOpen", "--verbose", "--allow-discards", disk, target}
	err := context.Executor.ExecuteCommandWithStdin(cryptsetupBinary, &passphrase, args...)
	if err != nil {
		logger.Errorf("failed to open LUKS device %q", disk)
		return err
	}

	return nil
}
```
I notice this doesn't have a timeout. For the prepare job, I know we have been more lax on resource requests/timeouts in the past. IMO, it would still be good to have a timeout, just to make sure we can continue with provisioning if a disk is slow to respond or stalls for some other reason. For LUKS, I think 2.5 minutes (150 seconds) should be good, given that a device should normally open within 30 seconds.
@travisn wdyt?
Why go 5 times the 30s? Either way the timeout seems plenty long so I don't have a strong opinion other than it seems like 30s would be sufficient.
> Why go 5 times the 30s? Either way the timeout seems plenty long so I don't have a strong opinion other than it seems like 30s would be sufficient.

Set it to 30, I think that's sufficient.
I suggest a higher value so that we don't get flakiness in our tests because of the value. A lot of times, in resource-constrained environments, what is "supposed" to happen timing-wise takes much longer. I use 5x as a rule of thumb because when I've used 3x, there is often still flakiness. Maybe 120 seconds is good, but I would really suggest using more than 30 and even more than 90.
I did some digging. LUKS uses a static 2 seconds to open a disk with a key; it's strange that it's static, but it apparently is. There can be 8 key slots, which means an unlock attempt that tries every key could take 16 seconds total, plus any overhead for CPU time. I believe the normal case is that the unlock takes less than 3 seconds, but let's not risk creating a flaky system for our CI or for corner-case users by using too aggressive a timeout. I still think having at least a 90-second timeout is reasonable.
The most common case will be that it finishes successfully in ~2 seconds. In the event of some corruption, it will take 1.5 minutes. If I/O to the disk is stalling, it will take 1.5 minutes for each disk to fail. For our CI, I/O stalling seems like it's not uncommon, and there are many users who try to run on R-Pis and other small systems.
On a different angle, if a server with 60 disks attached to a SAS controller has a controller failure such that I/Os stall indefinitely without a noticeable failure, the prepare job won't fail until after 1.5 hours. That is quite a while, but IMO if a user has that many disks on a node, they likely expect installation and upgrades will take a fair amount of time, and 1.5 hours isn't a lot compared to a rolling upgrade that could take a week for large clusters.
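To make that arithmetic concrete, a throwaway sketch (the constant name matches the one added in this PR; the disk count is the hypothetical worst case from above):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const luksOpenCmdTimeOut = 90 * time.Second // value eventually settled on in this PR
	const keySlots = 8                          // LUKS key slots, ~2s each
	fmt.Println("worst-case single unlock:", keySlots*2*time.Second) // 16s

	const disksPerNode = 60 // the SAS-controller example above
	fmt.Println("worst-case prepare job:", disksPerNode*luksOpenCmdTimeOut) // 1h30m0s
}
```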
Changed it to 90.
Force-pushed from 92281df to b22605c.
pkg/daemon/ceph/osd/encryption.go
Outdated
```go
args := []string{"luksOpen", "--verbose", "--allow-discards", disk, target}
err := context.Executor.ExecuteCommandWithStdin(luksOpenCmdTimeOut, cryptsetupBinary, &passphrase, args...)
if err != nil {
	logger.Errorf("failed to open LUKS device %q", disk)
```
Why both log and return err? The error is going to be surfaced eventually so just returning a wrapped error seems sufficient.
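A minimal sketch of the pattern being suggested, using the same signatures as the diff above (the exact final code may differ):

```go
// openLUKS opens an encrypted device, returning a wrapped error instead of
// logging and returning it separately; the caller surfaces it once.
func openLUKS(context *clusterd.Context, disk, target, passphrase string) error {
	args := []string{"luksOpen", "--verbose", "--allow-discards", disk, target}
	if err := context.Executor.ExecuteCommandWithStdin(luksOpenCmdTimeOut, cryptsetupBinary, &passphrase, args...); err != nil {
		return errors.Wrapf(err, "failed to open LUKS device %q", disk)
	}
	return nil
}
```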
pkg/daemon/ceph/osd/volume.go
Outdated
```go
// Let's re-open the block to re-hydrate the OSDInfo.
logger.Debugf("encrypted block device %q is not open, opening it now", block)
passphrase := os.Getenv(oposd.CephVolumeEncryptedKeyEnvVarName)
target := fmt.Sprintf("%s-%s", os.Getenv(oposd.PVCNameEnvVarName), oposd.DmcryptBlockType)
```
Don't we have a helper for this somewhere in the OSD code?
Found it, exported it, and I'm using it now.
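For reference, the helper presumably reduces to something like this sketch (hypothetical body, inferred from the `fmt.Sprintf` call it replaced and the later call site `oposd.EncryptionDMName(pvcName, oposd.DmcryptBlockType)`):

```go
package oposd // hypothetical package placement

import "fmt"

// EncryptionDMName returns the device-mapper name for an encrypted OSD
// block, e.g. "<pvc-name>-block-dmcrypt" (hypothetical reconstruction).
func EncryptionDMName(pvcName, blockType string) string {
	return fmt.Sprintf("%s-%s", pvcName, blockType)
}
```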
Force-pushed from 3270c13 to 9909db9.
Added some tests to simulate stdin and the timeout.

I didn't get what you mean.
Mostly nits.
pkg/daemon/ceph/osd/encryption.go
Outdated
```go
cryptsetupBinary   = "cryptsetup"
dmsetupBinary      = "dmsetup"
luksOpenCmdTimeOut = 30 * time.Second
```
This looks quite long for a timeout; I would think that 10 seconds is more than enough already.
pkg/daemon/ceph/osd/encryption.go
Outdated
```
@@ -160,6 +162,16 @@ func dumpLUKS(context *clusterd.Context, disk string) (string, error) {
	return cryptsetupOut, nil
}

func openLUKS(context *clusterd.Context, disk, target, passphrase string) error {
```
Suggested change:
```diff
-func openLUKS(context *clusterd.Context, disk, target, passphrase string) error {
+func openEncryptedDevice(context *clusterd.Context, disk, target, passphrase string) error {
```
pkg/daemon/ceph/osd/encryption.go
Outdated
```go
args := []string{"luksOpen", "--verbose", "--allow-discards", disk, target}
err := context.Executor.ExecuteCommandWithStdin(luksOpenCmdTimeOut, cryptsetupBinary, &passphrase, args...)
if err != nil {
	return errors.Wrapf(err, "failed to open LUKS device %q", disk)
```
Suggested change:
```diff
-	return errors.Wrapf(err, "failed to open LUKS device %q", disk)
+	return errors.Wrapf(err, "failed to open encrypted device %q", disk)
```
pkg/daemon/ceph/osd/volume.go
Outdated
```go
	return nil, errors.Errorf("failed to find the encrypted block device for %q, not opened?", block)
}
// The encrypted block is not opened.
// The OSD deployment has been removed manually.
```
Why remove the mention of the node reboot? If the deployment is removed manually the encrypted block is left open.
> Why remove the mention of the node reboot? If the deployment is removed manually the encrypted block is left open.

This is not the only case; the encrypted block is also closed when a CSI-provisioned PVC is unmounted due to OSD pod deletion.
This is not clear, please clarify the scenario with CSI. If we have more than the reboot then they should all be listed in the code as comments.
> This is not clear, please clarify the scenario with CSI. If we have more than the reboot then they should all be listed in the code as comments.

I don't think we can assume that an encrypted device opened by a pod remains open after the pod is deleted.

IMO, the case of Rook using disks directly on the node, or an in-tree-driver-provisioned PVC, leaving the encrypted device open when the pod restarts is the corner case. I am thinking we should go ahead with the assumption that the encrypted device is no longer open once the pod, be it osd-prepare-job or OSD, is deleted.

Does this make sense?
> I don't think we can assume that an encrypted device opened by a pod remains open after the pod is deleted.

Why? What would close it then?

> IMO, the case of Rook using disks directly on the node, or an in-tree-driver-provisioned PVC, leaving the encrypted device open when the pod restarts is the corner case. I am thinking we should go ahead with the assumption that the encrypted device is no longer open once the pod, be it osd-prepare-job or OSD, is deleted.

Even in the PVC case, where Rook consumes a PVC for the OSD disk, if that PVC is detached the encrypted block is still open and available, but broken. We've had cases like this where the underlying block device disappears.

> Does this make sense?
Sounds good 👍🏻 I don't mean to argue too much either; if you saw it, then I trust you. Also, as a reference, see this code: https://github.com/leseb/rook/blob/61279e98201d5d80090f57433a21e297b2f5efb5/pkg/operator/ceph/cluster/osd/spec.go#L172-L212. We still have cases where the PV block is detached and the encrypted block is still around.
Okay 👍 One example is as follows.

This is how it looks on a node with a functioning OSD:
```
[rakshith@fedora ~]$ oc debug node/ip-10-0-165-109.us-east-2.compute.internal
Starting pod/ip-10-0-165-109us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.165.109
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop1         7:1    0   100G  0 loop
rbd0        252:0    0    40G  0 disk  /var/lib/kubelet/pods/d3adec98-e695-4ba3-a110-ff654c5a5f5c/volumes/kubernetes.io~csi/pvc-a3508c23-c949-43bd-bbfe-e00b52
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part  /boot
`-nvme0n1p4 259:4    0 119.5G  0 part  /sysroot
nvme1n1     259:5    0    50G  0 disk  /var/lib/kubelet/pods/2e1983d0-25cf-436c-b289-ce7acc6625c4/volumes/kubernetes.io~csi/pvc-0a2fa1bb-6af1-4ca1-a624-12d9f0
nvme2n1     259:6    0   100G  0 disk
`-ocs-deviceset-0-data-0cljr5-block-dmcrypt
            253:0    0   100G  0 crypt
```
This is how it looks after OSD removal is done:
```
[rakshith@fedora ~]$ oc debug node/ip-10-0-152-142.us-east-2.compute.internal
Starting pod/ip-10-0-152-142us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.152.142
If you don't see a command prompt, try pressing enter.
sh-4.4#
sh-4.4# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
rbd0        252:0    0    40G  0 disk /host/var/lib/kubelet/pods/31cdabdd-9f4a-4490-b8a2-cd9a9e1f1458/volumes/kubernetes.io~csi/pvc-76c14ea6-8664-4116-b86e-80a
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part /host/boot
`-nvme0n1p4 259:4    0 119.5G  0 part /host/sysroot
nvme1n1     259:5    0    50G  0 disk /host/var/lib/kubelet/pods/46933254-64a2-43a9-a5b5-d4904ccc2025/volumes/kubernetes.io~csi/pvc-95b398fd-3d76-456d-903f-8bc
```
When the osd-prepare job pod starts:
```
sh-4.4# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0   100G  0 loop
rbd0        252:0    0    40G  0 disk /var/lib/kubelet/pods/31cdabdd-9f4a-4490-b8a2-cd9a9e1f1458/volumes/kubernetes.io~csi/pvc-76c14ea6-8664-4116-b86e-80abcf33
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part /boot
`-nvme0n1p4 259:4    0 119.5G  0 part /sysroot
nvme1n1     259:5    0    50G  0 disk /var/lib/kubelet/pods/46933254-64a2-43a9-a5b5-d4904ccc2025/volumes/kubernetes.io~csi/pvc-95b398fd-3d76-456d-903f-8bca598f
nvme3n2     259:7    0   100G  0 disk
sh-4.4# exit
```
Ok, but `ocs-deviceset-0-data-0cljr5-block-dmcrypt` is copied to `/var/lib/ceph/osd/ceph-0/block`; this is the file the OSD uses to run, and I believe this file is still there.
> Ok, but `ocs-deviceset-0-data-0cljr5-block-dmcrypt` is copied to `/var/lib/ceph/osd/ceph-0/block`; this is the file the OSD uses to run, and I believe this file is still there.

That is a file inside the container, right? Does it go away with the deletion of the OSD pod?
No, this is a HostPath volume, so the file persists on the node.
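For context, a minimal sketch of how a HostPath volume keeps that file on the node across pod restarts (hypothetical names and path; see the spec.go link above for Rook's actual wiring):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	hostPathType := corev1.HostPathDirectoryOrCreate
	// Because the volume is backed by a directory on the node itself,
	// anything written under it (e.g. the copied block file) survives
	// container and pod deletion.
	vol := corev1.Volume{
		Name: "activate-osd", // hypothetical volume name
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				Path: "/var/lib/rook/rook-ceph/ceph-0", // illustrative path
				Type: &hostPathType,
			},
		},
	}
	fmt.Printf("%+v\n", vol)
}
```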
pkg/daemon/ceph/osd/volume.go
Outdated
```go
target := oposd.EncryptionDMName(pvcName, oposd.DmcryptBlockType)
err = openLUKS(context, block, target, passphrase)
if err != nil {
	return nil, errors.Errorf("failed to open encrypted block device %q on %q", block, target)
```
Use `errors.Wrapf()` and report `err`.
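For illustration, the requested change would look roughly like this (a sketch; the merged wording may differ):

```go
if err != nil {
	// Wrap the underlying error instead of discarding it with errors.Errorf.
	return nil, errors.Wrapf(err, "failed to open encrypted block device %q on %q", block, target)
}
```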
Force-pushed from 09af2ac to d8d6c66, then from d8d6c66 to f988d7f.
LGTM! Thanks!
```go
logger.Debugf("encrypted block device %q is not open, opening it now", block)
passphrase := os.Getenv(oposd.CephVolumeEncryptedKeyEnvVarName)
if passphrase == "" {
	return nil, errors.Errorf("encryption passphrase is empty in env var %q", oposd.CephVolumeEncryptedKeyEnvVarName)
```
To consider in a future PR, we should write the secret to a file instead of using an env var, as mentioned in #10994.
Maybe we should do that in this PR so that we don't risk forgetting to fix it.
That actually requires changes in quite a lot of places. It is better to do it in a separate future PR.
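A rough sketch of the file-based approach for that future PR (hypothetical helper; cryptsetup's `--key-file` flag is real, everything else here is illustrative):

```go
package main

import (
	"fmt"
	"os"
)

// writeKeyFile persists the passphrase to a 0600 temp file so cryptsetup
// can read it via --key-file instead of from an environment variable.
func writeKeyFile(passphrase string) (string, error) {
	f, err := os.CreateTemp("", "luks-key-")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if err := f.Chmod(0o600); err != nil {
		return "", err
	}
	if _, err := f.WriteString(passphrase); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeKeyFile("not-a-real-passphrase")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)
	// Then: cryptsetup luksOpen --key-file <path> <disk> <target>
	fmt.Println("key file at", path)
}
```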
This commit implements this corner case during the osd-prepare job:

```
The encrypted block is not opened, this is an extreme corner case
The OSD deployment has been removed manually AND the node rebooted
So we need to re-open the block to re-hydrate the OSDInfo.
Handling this case would mean, writing the encryption key on a temporary
file, then call luksOpen to open the encrypted block and then call
ceph-volume to list against the opened encrypted block.
We don't implement this, yet and return an error.
```

When the underlying PVCs for OSDs are CSI-provisioned, the encrypted device is closed when the PVC is unmounted due to the OSD pod being deleted. Therefore, this may occur more frequently and needs to be handled. This commit implements the fix for that.

Signed-off-by: Rakshith R <rar@redhat.com>
Force-pushed from f988d7f to bde286e.
osd: re-open encrypted disk during osd-prepare-job if closed (backport #11338)
Description of your changes:
This commit implements this corner case during the osd-prepare job.

When the underlying PVCs for OSDs are CSI-provisioned, the encrypted device is closed when the PVC is unmounted due to the OSD pod being deleted. Therefore, this is no longer a corner case and needs to be handled. This commit implements the fix for that.

Signed-off-by: Rakshith R <rar@redhat.com>
refer: #9434
Steps:
Before fix:
After fix: