
Failed to enable disk encryption in the storage on EKS anywhere bare metal nodes #14133

Closed
ygao-armada opened this issue Apr 26, 2024 · 1 comment

ygao-armada commented Apr 26, 2024

Is this a bug report or feature request? Bug report

I tried to run the following commands:

  1. git clone --single-branch --branch v1.13.6 https://github.com/rook/rook.git
  2. cd rook/deploy/examples
  3. uncomment the line " # encryptedDevice: "true" ..." in cluster.yaml (see the snippet after this list)
  4. kubectl create -f crds.yaml -f common.yaml -f operator.yaml
  5. kubectl create -f cluster.yaml
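
For reference, the storage section of the example cluster.yaml should end up looking roughly like this after step 3 (a sketch based on the stock v1.13.6 example; only the encryptedDevice line is uncommented):

  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      encryptedDevice: "true" # the default value for this option is "false"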

Both of my attempts failed:

  1. creating a partition on each data disk
  2. leaving each data disk unpartitioned

In the first case, the rook-ceph-osd-prepare-xxx pods stay in Running and log the following:

2024-04-26 09:15:00.537540 D | exec: Running command: lsblk --noheadings --path --list --output NAME /dev/sda
2024-04-26 09:15:00.538671 I | inventory: skipping device "sda" because it has child, considering the child instead.
...
2024-04-26 09:15:00.602857 D | exec: Running command: ceph-volume inventory --format json /dev/sda1
2024-04-26 09:15:00.866116 I | cephosd: device "sda1" is available.
2024-04-26 09:15:00.866129 I | cephosd: partition "sda1" is not picked because encrypted OSD on partition is not allowed
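
Note the last line: Rook will not create an encrypted OSD on a partition, so the storage spec has to select whole disks. A hedged sketch of such a selection (the deviceFilter pattern is illustrative, not taken from my cluster):

  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sd[b-f]$" # match whole disks only, never partitions like sda1
    config:
      encryptedDevice: "true"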

In the second case, the rook-ceph-osd-prepare-xxx pods go into CrashLoopBackOff, with logs like:

2024-04-26 15:41:31.128026 I | cephosd: device "sda" is available.
2024-04-26 15:41:31.128043 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
...
2024-04-26 15:41:31.393242 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2024-04-26 15:41:31.393254 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2024-04-26 15:41:31.776134 D | cephosd: won't use raw mode since encryption is enabled
2024-04-26 15:41:31.776150 D | exec: Running command: nsenter --mount=/rootfs/proc/1/ns/mnt -- /usr/sbin/lvm --help
2024-04-26 15:41:31.776892 D | cephosd: failed to call nsenter. failed to execute nsenter. output: nsenter: failed to execute /usr/sbin/lvm: No such file or directory: exit status 127
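
This suggests ceph-volume needs the lvm binary on the host (it is invoked through nsenter into the host mount namespace), not inside the container. A minimal check on each node, assuming an apt-based Ubuntu image (commands are illustrative, not from the original setup):

# on each bare metal node
$ which lvm || sudo apt-get install -y lvm2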

Then I tried copying lvm to /usr/sbin/lvm, and got these logs:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 414, in main
    self._execute(plan)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 429, in _execute
    p.safe_prepare(argparse.Namespace(**args))
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 200, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 403, in main
    self.zap_osd()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    '%s' % osd_id or osd_fsid)
RuntimeError: Unable to find any LV for zapping OSD: 0
2024-04-26 16:23:19.076114 C | rookcmd: failed to configure devices: failed to initialize osd: failed ceph-volume: exit status 1

Deviation from expected behavior:
No OSDs are created.

Expected behavior:
OSDs are created successfully.

How to reproduce it (minimal and precise):

Run the commands above on an EKS Anywhere bare metal cluster with Ubuntu 20.04 (to be honest, I suspect this is a general issue, not specific to EKS Anywhere).

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml: available upon request

Logs to submit:

  • rook-ceph-osd-prepare pod logs: quoted above


Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 20.04
  • Kernel (e.g. uname -a): 5.4.0-177-generic
  • Cloud provider or hardware configuration: Dell PowerEdge R650
  • Rook version (use rook version inside of a Rook Pod): v1.13.6
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): EKSA
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

ygao-armada commented Apr 27, 2024

After installing lvm2 in the osImage, the OSD pods are created successfully:

$ kubectl -n rook-ceph get pod
...
rook-ceph-osd-0-556d6d75f9-l6pbz                                 2/2     Running     0          9m14s
rook-ceph-osd-1-59c4c76ccc-6wwpv                                 2/2     Running     0          8m45s
rook-ceph-osd-2-54dddf59bf-r69m8                                 2/2     Running     0          7m58s
rook-ceph-osd-3-696d9dd87b-5wh4w                                 2/2     Running     0          7m58s
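
(For reference, "installing lvm2 in the osImage" amounts to adding a provisioning step like the following to the Ubuntu image build; the exact hook depends on your image pipeline, so treat this as a sketch:)

$ sudo apt-get update
$ sudo apt-get install -y lvm2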

On the node, we can see:

# lsblk -f
NAME                                FSTYPE      LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
...
sdd                                 LVM2_member       QSU1CB-Vxkn-jXah-Rufx-MhiB-IWKu-6z8sN0
└─ceph--bad12e0b--fe26--44e9--897a--10cfe0ac0d50-osd--block--5268c673--8ffb--4a19--ac9a--c8a49e96a2e2
  └─4ttf0w-cSmK-XB06-Vgum-kWoE-MtTk-lsr1e9
sde                                 LVM2_member       gMrIC4-EL9a-ON47-UIFz-7uDd-RY8U-2bwbxX
└─ceph--3e3400bb--073b--43a6--9759--0735fb4bf8fd-osd--block--891a74d0--ba9c--4eb3--8a39--8eff473015ec
  └─ebxL39-JSUq-NZYY-NBEC-GuSM-Ai1e-V044mU
sdf                                 LVM2_member       GnUl5A-LI89-qgBB-t2sx-Pj3w-bIxb-UatsxQ
└─ceph--8ee5f47b--695c--4dda--9dab--1e0b16579f62-osd--block--dac48c85--c1f6--494d--b74b--fae0919810eb
  └─K9vETz-8Afu-rvuL-3YZd-1rd0-aQg5-MjeTwS
...
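
The innermost devices in the tree are the dm-crypt layers that Rook sets up for encrypted OSDs. To double-check that encryption is active, something like this should list one crypt target per OSD (a sketch, not captured from my nodes):

$ sudo dmsetup table --target crypt   # lists only device-mapper devices backed by dm-crypt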
