
OSD Resize Increases Used Capacity Not Available Capacity #14099

Closed
jameshearttech opened this issue Apr 19, 2024 · 22 comments

@jameshearttech

jameshearttech commented Apr 19, 2024

**Previous bug report for the same issue: #12511. Only this time with a different OS, VMware controller, Kubernetes version, Rook version, and Ceph version. Where is the bug?!**

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

After resizing the underlying disk at the hypervisor and OS levels, resizing the OSD increases the cluster's total capacity and used capacity.

Expected behavior:

After resizing the underlying disk at the hypervisor and OS levels, resizing the OSD increases the cluster's total capacity and available capacity.

How to reproduce it (minimal and precise):

Build a Kubernetes cluster from virtual machines where Rook consumes virtual disks as OSDs. Resize a virtual disk used as an OSD at the hypervisor level. Resize the disk at the OS level. Resize the disk at the Ceph level (e.g., restart the OSD pod).
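
For reference, a minimal sketch of the OS-level and Ceph-level steps on a generic Linux node with a SCSI virtual disk (the device name /dev/sdb and the deployment name rook-ceph-osd-0 are placeholders for whichever disk and OSD you resized):

# as root on the node: make the guest OS see the new disk size after growing it at the hypervisor (SCSI rescan)
$ echo 1 > /sys/class/block/sdb/device/rescan
# restart the OSD so Rook's expand-bluefs init container can grow the BlueStore device
$ kubectl -n rook-ceph rollout restart deployment rook-ceph-osd-0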

File(s) to submit:

image
image

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary: rook-ceph.yaml.txt

Logs to submit:

Cluster Status to submit:

  cluster:
    id:     e1ebb901-75ad-4b7c-90d9-69edf914c04e
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,d (age 2d)
    mgr: b(active, since 90m), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 8 osds: 8 up (since 91m), 8 in (since 2h)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 265 pgs
    objects: 493.10k objects, 748 GiB
    usage:   1.3 TiB used, 1.5 TiB / 2.8 TiB avail
    pgs:     265 active+clean
 
  io:
    client:   103 KiB/s rd, 1.4 MiB/s wr, 22 op/s rd, 14 op/s wr

Environment:

  • OS (e.g. from /etc/os-release): Talos (v1.6.4)
  • Kernel (e.g. uname -a): 6.1.74-talos
  • Cloud provider or hardware configuration: VMware
  • Rook version (use rook version inside of a Rook Pod): v1.13.3
  • Storage backend version (e.g. for ceph do ceph -v): v18.2.2
  • Kubernetes version (use kubectl version): v1.29.1
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Talos
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
@parth-gr
Member

@jameshearttech can you share the ceph osd df tree outputs from before and after the resize?

@jameshearttech
Author

For context, there is this Slack thread. I see ceph osd df output there from when there were 7 K8s nodes and OSDs (i.e., 1 OSD per node). Is that output good enough? If not, I'll have to attempt another resize and capture the output at that time. Here is the current output from ceph osd df tree.

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME            
 -1         2.73438         -  2.8 TiB  1.1 TiB  1.0 TiB  493 MiB   16 GiB  1.7 TiB  40.42  1.00    -          root default         
 -5         0.34180         -  350 GiB  121 GiB  119 GiB   31 MiB  1.8 GiB  229 GiB  34.49  0.85    -              host mgmt-worker0
  0    ssd  0.34180   1.00000  350 GiB  121 GiB  119 GiB   31 MiB  1.8 GiB  229 GiB  34.49  0.85   89      up          osd.0        
 -9         0.34180         -  350 GiB  141 GiB  139 GiB   37 MiB  1.7 GiB  209 GiB  40.23  1.00    -              host mgmt-worker1
  2    ssd  0.34180   1.00000  350 GiB  141 GiB  139 GiB   37 MiB  1.7 GiB  209 GiB  40.23  1.00  109      up          osd.2        
 -3         0.34180         -  350 GiB  125 GiB  123 GiB   68 MiB  2.1 GiB  225 GiB  35.76  0.88    -              host mgmt-worker2
  1    ssd  0.34180   1.00000  350 GiB  125 GiB  123 GiB   68 MiB  2.1 GiB  225 GiB  35.76  0.88   92      up          osd.1        
 -7         0.34180         -  350 GiB  134 GiB  133 GiB   80 MiB  1.7 GiB  216 GiB  38.43  0.95    -              host mgmt-worker3
  3    ssd  0.34180   1.00000  350 GiB  134 GiB  133 GiB   80 MiB  1.7 GiB  216 GiB  38.43  0.95  102      up          osd.3        
-11         0.34180         -  350 GiB  132 GiB  129 GiB   72 MiB  2.3 GiB  218 GiB  37.68  0.93    -              host mgmt-worker4
  4    ssd  0.34180   1.00000  350 GiB  132 GiB  129 GiB   72 MiB  2.3 GiB  218 GiB  37.68  0.93  103      up          osd.4        
-13         0.34180         -  350 GiB  147 GiB  145 GiB   63 MiB  1.8 GiB  203 GiB  41.97  1.04    -              host mgmt-worker5
  5    ssd  0.34180   1.00000  350 GiB  147 GiB  145 GiB   63 MiB  1.8 GiB  203 GiB  41.97  1.04  100      up          osd.5        
-15         0.34180         -  350 GiB  137 GiB  134 GiB   64 MiB  2.3 GiB  213 GiB  39.11  0.97    -              host mgmt-worker6
  6    ssd  0.34180   1.00000  350 GiB  137 GiB  134 GiB   64 MiB  2.3 GiB  213 GiB  39.11  0.97  100      up          osd.6        
-17         0.34180         -  450 GiB  235 GiB  133 GiB   78 MiB  2.4 GiB  215 GiB  52.29  1.29    -              host mgmt-worker7
  7    ssd  0.34180   1.00000  450 GiB  235 GiB  133 GiB   78 MiB  2.4 GiB  215 GiB  52.29  1.29  100      up          osd.7        
-19               0         -      0 B      0 B      0 B      0 B      0 B      0 B      0     0    -              host mgmt-worker8
                        TOTAL  2.8 TiB  1.1 TiB  1.0 TiB  493 MiB   16 GiB  1.7 TiB  40.42

@parth-gr
Member

parth-gr commented Apr 22, 2024

The Ceph-side values are correct:
TOTAL 2.8 TiB 1.1 TiB 1.0 TiB 493 MiB 16 GiB 1.7 TiB 40.42

So it is probably the dashboard that has the error.
@rkachach wanna take a look?

@jameshearttech
Author

jameshearttech commented Apr 22, 2024

I'm not following how you concluded that the Ceph side values are correct from:
TOTAL 2.8 TiB 1.1 TiB 1.0 TiB 493 MiB 16 GiB 1.7 TiB 40.42

The queries in the dashboard are pretty straightforward:
Query A: ceph_cluster_total_bytes{cluster="$cluster"}-ceph_cluster_total_used_bytes{cluster="$cluster"}
Query B: ceph_cluster_total_used_bytes{cluster="$cluster"}
Query C: ceph_cluster_total_bytes{}

The dashboard uses these queries to visualize available, used, and total capacity. How do we conclude that the dashboard is showing incorrect used/available capacity, but the Ceph CLI is showing correct used/available capacity? Is it an issue with the Prometheus metrics from Ceph?

EDIT: I fixed query C: ceph_cluster_total_bytes{cluster="$cluster"}
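
As a cross-check, the raw metrics can be scraped directly from the Ceph mgr Prometheus module and compared with the CLI. This is only a sketch, assuming the default rook-ceph-mgr metrics Service on port 9283 and that curl exists in the toolbox image:

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- curl -s http://rook-ceph-mgr:9283/metrics | grep '^ceph_cluster_total'

If ceph_cluster_total_used_bytes already includes the jump here, the dashboard is simply rendering what the exporter exposes.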

@parth-gr
Member

parth-gr commented Apr 22, 2024

From your screenshots:

                                 Available   Used   Total
Previous values (dashboard):     1.51        1.23   2.73
New values (dashboard):          1.51        1.33   2.83
New values (ceph osd df tree):   1.7         1.1    2.8

Which says the Ceph-side values increased correctly.

@jameshearttech
Author

I don't understand. My whole point was that used space increased rather than available space. You seem to be interpreting that differently?

@parth-gr
Member

But if you look at the Ceph output, the values are correct.

From the Ceph side (osd df tree, new values): 1.7 available, 1.1 used, 2.8 total.

@jameshearttech
Author

jameshearttech commented Apr 23, 2024

At this point I have replaced all the OSDs with slightly larger OSDs. I'm trying to get back down to 4 K8s nodes. I wanted to take another shot at resizing an OSD.

Here is the current screenshot from Grafana.

image

Here is the current output of ceph osd df tree.

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- sh -c "ceph osd df tree"
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME            
 -1         3.51599         -  3.5 TiB  1.1 TiB  1.1 TiB  418 MiB   13 GiB  2.4 TiB  31.01  1.00    -          root default         
 -5         0.43950         -  450 GiB  125 GiB  124 GiB      0 B  1.5 GiB  325 GiB  27.81  0.90    -              host mgmt-worker0
  0    ssd  0.43950   1.00000  450 GiB  125 GiB  124 GiB      0 B  1.5 GiB  325 GiB  27.81  0.90   89      up          osd.0        
 -9         0.43950         -  450 GiB  145 GiB  144 GiB   29 MiB  1.2 GiB  305 GiB  32.19  1.04    -              host mgmt-worker1
  2    ssd  0.43950   1.00000  450 GiB  145 GiB  144 GiB   29 MiB  1.2 GiB  305 GiB  32.19  1.04  108      up          osd.2        
 -3         0.43950         -  450 GiB  130 GiB  129 GiB   60 MiB  1.5 GiB  320 GiB  28.97  0.93    -              host mgmt-worker2
  1    ssd  0.43950   1.00000  450 GiB  130 GiB  129 GiB   60 MiB  1.5 GiB  320 GiB  28.97  0.93   92      up          osd.1        
 -7         0.43950         -  450 GiB  140 GiB  138 GiB   75 MiB  1.1 GiB  310 GiB  31.02  1.00    -              host mgmt-worker3
  3    ssd  0.43950   1.00000  450 GiB  140 GiB  138 GiB   75 MiB  1.1 GiB  310 GiB  31.02  1.00  102      up          osd.3        
-11         0.43950         -  450 GiB  146 GiB  144 GiB   66 MiB  2.0 GiB  304 GiB  32.44  1.05    -              host mgmt-worker4
  4    ssd  0.43950   1.00000  450 GiB  146 GiB  144 GiB   66 MiB  2.0 GiB  304 GiB  32.44  1.05  107      up          osd.4        
-13         0.43950         -  450 GiB  151 GiB  149 GiB   56 MiB  1.6 GiB  299 GiB  33.58  1.08    -              host mgmt-worker5
  5    ssd  0.43950   1.00000  450 GiB  151 GiB  149 GiB   56 MiB  1.6 GiB  299 GiB  33.58  1.08   99      up          osd.5        
-15         0.43950         -  450 GiB  134 GiB  132 GiB   64 MiB  1.9 GiB  316 GiB  29.69  0.96    -              host mgmt-worker6
  6    ssd  0.43950   1.00000  450 GiB  134 GiB  132 GiB   64 MiB  1.9 GiB  316 GiB  29.69  0.96   96      up          osd.6        
-17         0.43950         -  450 GiB  146 GiB  144 GiB   67 MiB  2.1 GiB  304 GiB  32.43  1.05    -              host mgmt-worker7
  7    ssd  0.43950   1.00000  450 GiB  146 GiB  144 GiB   67 MiB  2.1 GiB  304 GiB  32.43  1.05  102      up          osd.7        
                        TOTAL  3.5 TiB  1.1 TiB  1.1 TiB  418 MiB   13 GiB  2.4 TiB  31.01                                          
MIN/MAX VAR: 0.90/1.08  STDDEV: 1.88

I drained and then shut down mgmt-worker0. I resized the virtual disk for /dev/sdb, which Rook consumes as osd.0, from 450 GB to 850 GB. I started mgmt-worker0 and then uncordoned the node. I waited for Ceph to rebalance and then checked the result.
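
Roughly, the node-side sequence was the following (a sketch; the drain flags depend on what else is running on the node):

$ kubectl drain mgmt-worker0 --ignore-daemonsets --delete-emptydir-data
# grow the virtual disk at the hypervisor, then power the node back on
$ kubectl uncordon mgmt-worker0
# wait until all PGs are active+clean again
$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph -s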

Here is the post osd.0 resize screenshot from Grafana.

image

Here is the post osd.0 resize output of ceph osd df tree.

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- sh -c "ceph osd df tree"
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME            
 -1         3.51599         -  3.9 TiB  1.5 TiB  1.1 TiB  440 MiB   12 GiB  2.4 TiB  37.90  1.00    -          root default         
 -5         0.43950         -  850 GiB  546 GiB  146 GiB   22 MiB  723 MiB  304 GiB  64.27  1.70    -              host mgmt-worker0
  0    ssd  0.43950   1.00000  850 GiB  546 GiB  146 GiB   22 MiB  723 MiB  304 GiB  64.27  1.70   99      up          osd.0        
 -9         0.43950         -  450 GiB  141 GiB  139 GiB   29 MiB  1.2 GiB  309 GiB  31.24  0.82    -              host mgmt-worker1
  2    ssd  0.43950   1.00000  450 GiB  141 GiB  139 GiB   29 MiB  1.2 GiB  309 GiB  31.24  0.82  106      up          osd.2        
 -3         0.43950         -  450 GiB  130 GiB  129 GiB   60 MiB  1.5 GiB  320 GiB  28.97  0.76    -              host mgmt-worker2
  1    ssd  0.43950   1.00000  450 GiB  130 GiB  129 GiB   60 MiB  1.5 GiB  320 GiB  28.97  0.76   92      up          osd.1        
 -7         0.43950         -  450 GiB  140 GiB  138 GiB   75 MiB  1.1 GiB  310 GiB  31.03  0.82    -              host mgmt-worker3
  3    ssd  0.43950   1.00000  450 GiB  140 GiB  138 GiB   75 MiB  1.1 GiB  310 GiB  31.03  0.82  102      up          osd.3        
-11         0.43950         -  450 GiB  146 GiB  144 GiB   66 MiB  2.0 GiB  304 GiB  32.44  0.86    -              host mgmt-worker4
  4    ssd  0.43950   1.00000  450 GiB  146 GiB  144 GiB   66 MiB  2.0 GiB  304 GiB  32.44  0.86  107      up          osd.4        
-13         0.43950         -  450 GiB  140 GiB  138 GiB   56 MiB  1.6 GiB  310 GiB  31.13  0.82    -              host mgmt-worker5
  5    ssd  0.43950   1.00000  450 GiB  140 GiB  138 GiB   56 MiB  1.6 GiB  310 GiB  31.13  0.82   94      up          osd.5        
-15         0.43950         -  450 GiB  134 GiB  132 GiB   64 MiB  1.9 GiB  316 GiB  29.69  0.78    -              host mgmt-worker6
  6    ssd  0.43950   1.00000  450 GiB  134 GiB  132 GiB   64 MiB  1.9 GiB  316 GiB  29.69  0.78   96      up          osd.6        
-17         0.43950         -  450 GiB  139 GiB  137 GiB   67 MiB  2.1 GiB  311 GiB  30.97  0.82    -              host mgmt-worker7
  7    ssd  0.43950   1.00000  450 GiB  139 GiB  137 GiB   67 MiB  2.1 GiB  311 GiB  30.97  0.82   99      up          osd.7        
                        TOTAL  3.9 TiB  1.5 TiB  1.1 TiB  440 MiB   12 GiB  2.4 TiB  37.90                                          
MIN/MAX VAR: 0.76/1.70  STDDEV: 11.50

Here is a post osd.0 resize screenshot from Ceph dashboard.

image

In all 3 cases the used capacity appears to have increased by 400 GB, which is how much I increased the size of the virtual disk underlying osd.0. Looking a bit closer, I see that DATA remained at 1.1 TiB while RAW USE increased to 1.5 TiB. Does this get us closer to an answer?

Looking back through my previous issue and at some other similar issues, the expand-bluefs container is supposed to solve this problem of used vs. available capacity; however, in my case it does not seem to work as expected. Why? I can see from the logs that the expand-bluefs container is running.

Explore-logs-2024-04-23 10_42_46.txt
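
For anyone following along, the init container output can also be pulled directly. A sketch, assuming the OSD pod naming used elsewhere in this thread:

$ kubectl -n rook-ceph logs $(kubectl get pod -n rook-ceph | awk '/osd-0/ {print $1}') -c expand-bluefs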

@jameshearttech
Author

jameshearttech commented Apr 23, 2024

Looking at the work I have been doing over the course of the day, you can see that when I replace an OSD the total and available capacity move together, but when I resize one they diverge. Or maybe I should say the use and raw use move together?

image

@travisn
Member

travisn commented Apr 23, 2024

I spun up an AWS test cluster and confirmed that I see the same behavior...

The initial cluster has three OSDs:

sh-4.4$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP  META    AVAIL   %USE  VAR   PGS  STATUS
 1    ssd  0.00980   1.00000  10 GiB   27 MiB  724 KiB   0 B  26 MiB  10 GiB  0.26  1.00    1      up
 2    ssd  0.00980   1.00000  10 GiB   27 MiB  724 KiB   0 B  26 MiB  10 GiB  0.26  1.00    1      up
 0    ssd  0.00980   1.00000  10 GiB   27 MiB  720 KiB   0 B  26 MiB  10 GiB  0.26  1.00    1      up
                       TOTAL  30 GiB   81 MiB  2.1 MiB   0 B  79 MiB  30 GiB  0.26                  

After resizing, the OSD config is:

sh-4.4$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP     META    AVAIL   %USE   VAR   PGS  STATUS
 1    ssd  0.00980   1.00000  15 GiB  5.0 GiB  976 KiB    1 KiB  27 MiB  10 GiB  33.51  1.00    1      up
 2    ssd  0.00980   1.00000  15 GiB  5.0 GiB  976 KiB    1 KiB  27 MiB  10 GiB  33.51  1.00    1      up
 0    ssd  0.00980   1.00000  15 GiB  5.0 GiB  972 KiB    1 KiB  27 MiB  10 GiB  33.51  1.00    1      up
                       TOTAL  45 GiB   15 GiB  2.9 MiB  3.5 KiB  80 MiB  30 GiB  33.51                   

This would be a Ceph issue. Rook doesn't have any influence on the sizes that the OSDs report, and I don't see how the OSD resize could produce that incorrect raw size. Would you mind opening a Ceph tracker for this?

@rkachach
Contributor

rkachach commented Apr 24, 2024

@nizamial09 is this a known issue?

@jameshearttech
Author

jameshearttech commented Apr 26, 2024

I got a response on Ceph issue 65659 stating it is probably the same issue as Ceph issue 63858. The suggested workaround is in Ceph issue 63858, note 7.

I'm not sure how to apply this workaround in Rook. I drained the node, marked the OSD out, rebooted the node, marked the OSD in, uncordoned the node, and waited for the rebalance to complete. The used space did not change (i.e., go down by 400 GB). @travisn I'm happy to test this and confirm the workaround, but I'm not sure how. Any ideas?
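
For completeness, marking the OSD out and back in was done from the toolbox with the standard commands, roughly (osd.0 shown):

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd out 0
# ...reboot the node...
$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd in 0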

@jameshearttech
Author

[From Igor Fedotov](https://tracker.ceph.com/issues/65659#note-11)

Generally, what you need is to shut down the OSD process in a non-graceful manner and let it rebuild the allocmap during the following restart. It has nothing to do with OSD draining or a node restart (unless you power it off, which I'd prefer not to do).

In a bare metal setup this implies running kill -9 against the ceph-osd process. You need to achieve the same in a Rook environment. Sorry, I'm not an expert in it, hence I'm unable to provide a more detailed guideline...

@jameshearttech
Author

jameshearttech commented Apr 27, 2024

Following Igor's suggestion worked as expected.

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/osd-3/ {print $1}') -n rook-ceph -c osd -it -- sh
sh-4.4# ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
65535         1  0.0  0.0    996     4 ?        Ss   Apr26   0:00 /pause
ceph        465  1.2  6.6 2665832 1644016 ?     Ssl  Apr26   6:53 ceph-osd --foreground --id 3 --fsid e1ebb901-75ad-4b7c-90d9-69edf914c04e --setuser ceph --setgroup ceph --crush-location=root=default host=mgmt-worker3 --default-log-to-stderr=true --default-err-
root        471  0.0  0.0  14096  2984 pts/0    Ss   Apr26   0:00 /bin/bash -x -e -m -c  CEPH_CLIENT_ID=ceph-osd.3 PERIODICITY=daily LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph LOG_MAX_SIZE=500M ROTATE=7  # edit the logrotate file to only rotate a specific daemo
root      29350  0.0  0.0  23144  1524 pts/0    S+   01:36   0:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 15m
root      29695  0.0  0.0  14228  3328 pts/0    Ss   01:42   0:00 sh
root      29757  0.0  0.0  49828  3708 pts/0    R+   01:43   0:00 ps aux
sh-4.4# kill -9 465
sh-4.4# command terminated with exit code 137
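
The same non-graceful kill can probably be done in a single command, assuming pkill is available in the OSD container (ps from procps-ng clearly is):

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/osd-3/ {print $1}') -n rook-ceph -c osd -- pkill -9 ceph-osd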

RAW USE is now the same size as DATA for OSD3.

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd df tree
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME            
-1         2.92978         -  3.3 TiB  1.0 TiB  1.0 TiB  391 MiB  7.6 GiB  2.3 TiB  31.30  1.00    -          root default         
-5         0.83009         -  850 GiB  290 GiB  289 GiB   63 MiB  1.9 GiB  560 GiB  34.17  1.09    -              host mgmt-worker0
 0    ssd  0.83009   1.00000  850 GiB  290 GiB  289 GiB   63 MiB  1.9 GiB  560 GiB  34.17  1.09  128      up          osd.0        
-9         0.83009         -  850 GiB  289 GiB  286 GiB  132 MiB  2.7 GiB  561 GiB  34.03  1.09    -              host mgmt-worker1
 1    ssd  0.83009   1.00000  850 GiB  289 GiB  286 GiB  132 MiB  2.7 GiB  561 GiB  34.03  1.09  146      up          osd.1        
-3         0.83009         -  850 GiB  294 GiB  292 GiB   83 MiB  2.2 GiB  556 GiB  34.59  1.11    -              host mgmt-worker2
 2    ssd  0.83009   1.00000  850 GiB  294 GiB  292 GiB   83 MiB  2.2 GiB  556 GiB  34.59  1.11  136      up          osd.2        
-7         0.43950         -  850 GiB  190 GiB  189 GiB  113 MiB  880 MiB  660 GiB  22.41  0.72    -              host mgmt-worker3
 3    ssd  0.43950   1.00000  850 GiB  190 GiB  189 GiB  113 MiB  880 MiB  660 GiB  22.41  0.72   97      up          osd.3        
                       TOTAL  3.3 TiB  1.0 TiB  1.0 TiB  391 MiB  7.6 GiB  2.3 TiB  31.30                                          
MIN/MAX VAR: 0.72/1.11  STDDEV: 5.14

The numbers still look off to me. Why is osd.3 smaller, and why does it have fewer PGs than the other three?

@travisn
Member

travisn commented Apr 29, 2024

From the linked Ceph tracker, since ceph/ceph#55777 was merged to reef, this is expected to be fixed in v18.2.3.

Good to see that the workaround with kill -9 in the pod fixed it.

The weight of osd.3 looks about half that of the other OSDs, which would explain why the PGs are not balanced. But since the raw size is the same for all the OSDs, I'm not sure why the weight would be smaller. It seems the weight is the same as before the resize? Try ceph osd crush reweight to adjust it.

@jameshearttech
Author

jameshearttech commented Apr 29, 2024

It seems the weight is the same as before the resize?

Yeah, that seems like a reasonable guess.

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd crush reweight
Invalid command: missing required parameter name(<string(goodchars [A-Za-z0-9-_.])>)
osd crush reweight <name> <weight:float> :  change <name>'s weight to <weight> in crush map
Error EINVAL: invalid command
command terminated with exit code 22

Not sure how this is supposed to work. Do I manually specify the new weight? For example, if I want it to be the same as the other OSDs, do I specify 0.83009? However, if I do that, the OSDs do not sum to 2.92978. Should I reweight all the OSDs to 2.92978/4 = 0.732445? Why is this not done automatically, as it is when a new OSD is created?

@jameshearttech
Author

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd crush reweight osd.3 0.83009
reweighted item id 3 name 'osd.3' to 0.83009 in crush map
$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd df tree
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME            
-1         3.32036         -  3.3 TiB  1.1 TiB  1.1 TiB  411 MiB   10 GiB  2.3 TiB  32.17  1.00    -          root default         
-5         0.83009         -  850 GiB  299 GiB  296 GiB   70 MiB  2.3 GiB  551 GiB  35.12  1.09    -              host mgmt-worker0
 0    ssd  0.83009   1.00000  850 GiB  299 GiB  296 GiB   70 MiB  2.3 GiB  551 GiB  35.12  1.09  127      up          osd.0        
-9         0.83009         -  850 GiB  297 GiB  294 GiB  130 MiB  3.0 GiB  553 GiB  34.95  1.09    -              host mgmt-worker1
 1    ssd  0.83009   1.00000  850 GiB  297 GiB  294 GiB  130 MiB  3.0 GiB  553 GiB  34.95  1.09  144      up          osd.1        
-3         0.83009         -  850 GiB  302 GiB  299 GiB   88 MiB  2.6 GiB  548 GiB  35.52  1.10    -              host mgmt-worker2
 2    ssd  0.83009   1.00000  850 GiB  302 GiB  299 GiB   88 MiB  2.6 GiB  548 GiB  35.52  1.10  133      up          osd.2        
-7         0.83008         -  850 GiB  196 GiB  194 GiB  123 MiB  2.1 GiB  654 GiB  23.10  0.72    -              host mgmt-worker3
 3    ssd  0.83008   1.00000  850 GiB  196 GiB  194 GiB  123 MiB  2.1 GiB  654 GiB  23.10  0.72  103      up          osd.3        
                       TOTAL  3.3 TiB  1.1 TiB  1.1 TiB  411 MiB   10 GiB  2.3 TiB  32.17

@jameshearttech
Author

jameshearttech commented Apr 29, 2024

I just went for it. Looks like it worked? Is it a coincidence that the SIZE and WEIGHT are approximately the same? I noticed that osd.3 is ever so slightly smaller than the others at a weight of 0.83008 vs 0.83009. 3.3/4 = 0.825, so maybe I should set them all to 0.82500?

$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd crush reweight osd.3 0.83009
reweighted item id 3 name 'osd.3' to 0.83009 in crush map
$ kubectl exec $(kubectl get pod -n rook-ceph | awk '/tool/ {print $1}') -n rook-ceph -- ceph osd df tree
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME            
-1         3.32036         -  3.3 TiB  1.1 TiB  1.1 TiB  415 MiB  9.0 GiB  2.3 TiB  32.16  1.00    -          root default         
-5         0.83009         -  850 GiB  276 GiB  274 GiB   70 MiB  2.6 GiB  574 GiB  32.51  1.01    -              host mgmt-worker0
 0    ssd  0.83009   1.00000  850 GiB  276 GiB  274 GiB   70 MiB  2.6 GiB  574 GiB  32.51  1.01  122      up          osd.0        
-9         0.83009         -  850 GiB  279 GiB  277 GiB  130 MiB  1.9 GiB  571 GiB  32.86  1.02    -              host mgmt-worker1
 1    ssd  0.83009   1.00000  850 GiB  279 GiB  277 GiB  130 MiB  1.9 GiB  571 GiB  32.86  1.02  134      up          osd.1        
-3         0.83009         -  850 GiB  282 GiB  279 GiB   88 MiB  2.9 GiB  568 GiB  33.19  1.03    -              host mgmt-worker2
 2    ssd  0.83009   1.00000  850 GiB  282 GiB  279 GiB   88 MiB  2.9 GiB  568 GiB  33.19  1.03  125      up          osd.2        
-7         0.83008         -  850 GiB  256 GiB  254 GiB  127 MiB  1.7 GiB  594 GiB  30.08  0.94    -              host mgmt-worker3
 3    ssd  0.83008   1.00000  850 GiB  256 GiB  254 GiB  127 MiB  1.7 GiB  594 GiB  30.08  0.94  126      up          osd.3        
                       TOTAL  3.3 TiB  1.1 TiB  1.1 TiB  415 MiB  9.0 GiB  2.3 TiB  32.16                                          
MIN/MAX VAR: 0.94/1.03  STDDEV: 1.22

@travisn
Member

travisn commented Apr 29, 2024

By default, the weight of the OSD is based on its size, so this sounds expected. If it's off by such a small amount, it shouldn't impact PG placement enough to worry about. PGs will never be perfectly distributed across OSDs anyway, since placement is based on hashing.
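
To put numbers on that: by default the CRUSH weight is simply the device capacity expressed in TiB, which is why SIZE and WEIGHT track each other here, and the 0.83008 vs 0.83009 difference is just rounding in how the weight is stored and displayed.

850 GiB / 1024 = 0.830078125 TiB  ->  shown as ~0.83008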

@jameshearttech
Author

@travisn really appreciate your help. I'm closing this one out.
