If you for some reason have an OSD failing to start, a solution can be to re-create it. This is a destructive process, but if all placement groups are active+clean after the disk has failed, the data it held has already been re-created elsewhere, and it is thus safe to format the disk. Re-creating an OSD is a multi-step process, and this article guides you through it.
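Before doing anything destructive, it is worth confirming that the cluster has fully recovered from the disk failure. One quick check is the placement group summary, which should report all PGs as active+clean:

eigil@cephmon1:~$ sudo ceph pg stat
eigil@cephmon1:~$ sudo ceph -s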
Stop the OSD process and destroy the OSD
First, identify the OSD which does not work. This is typically done with the command ceph osd tree. In this example it is osd.32 on storage302 that has failed:
eigil@cephmon1:~$ sudo ceph osd tree
ID   CLASS WEIGHT    TYPE NAME            STATUS REWEIGHT PRI-AFF
  -1       289.21906 root default
...
-129        12.72873     host storage302
...
  30 evo     0.90919         osd.30           up  1.00000 1.00000
  31 evo     0.90919         osd.31           up  1.00000 1.00000
  32 evo     0.90919         osd.32         down        0 1.00000
  33 evo     0.90919         osd.33           up  1.00000 1.00000
  34 evo     0.90919         osd.34           up  1.00000 1.00000
...
Head over to the storage machine and determine which disk/partition/LV is serving this OSD:
root@storage302:~# ceph-volume lvm list
...
====== osd.32 ======

  [block]       /dev/ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f

      block device              /dev/ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f
      block uuid                xjLRr5-8bY2-UHlc-k7Av-SKKR-bRfg-shWPh4
      cephx lockbox secret
      cluster fsid              859f7b25-cb7a-4043-be85-58c10edf9195
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  7c92f533-b3ac-4d33-a235-4c826448e920
      osd id                    32
      type                      block
      vdo                       0
      devices                   /dev/sdg
...
From the output of the commands above we can note down a few crucial pieces of information:
- OSD ID: 32
- OSD LVM VG: ceph-1c5ab093-08fa-4304-840e-7a2257156966
- OSD LVM LV: osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f
- physical device: /dev/sdg
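If you want to cross-check that this LV really sits on the failed drive, the LVM tools can confirm the mapping independently of ceph-volume; for example:

root@storage302:~# lvs -o lv_name,vg_name,devices

This is just a sanity check; ceph-volume lvm list already reports the backing device.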
Now we can make sure the OSD process is stopped, and then zap and destroy the logical volume:
root@storage302:~# systemctl stop ceph-osd@32.service
root@storage302:~# ceph-volume lvm zap ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f --destroy
Remove the OSD from the cluster
Head over to a ceph-mon and remove the OSD from the cluster. This will trigger some rebalancing.
root@cephmon1:~# ceph osd out osd.32
root@cephmon1:~# ceph osd crush remove osd.32
root@cephmon1:~# ceph auth del osd.32
root@cephmon1:~# ceph osd rm osd.32
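The rebalancing can be followed from the same node; ceph -s gives a point-in-time summary, while ceph -w streams cluster events as recovery progresses:

root@cephmon1:~# ceph -s
root@cephmon1:~# ceph -w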
Prepare a new OSD using the same disk
Now we can re-create the OSD on the same disk, re-using the same OSD ID. As the old OSD was backed by an LV, we need to re-create that as well:
root@storage302:~# lvcreate -n osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f -l 100%FREE ceph-1c5ab093-08fa-4304-840e-7a2257156966
root@storage302:~# ceph-volume lvm prepare --bluestore --data /dev/ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f --osd-id 32
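At this point the prepared OSD should show up again in the listing from earlier, now with a new osd fsid:

root@storage302:~# ceph-volume lvm list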
It might be smart to also set the OSD CRUSH device class when preparing it, using the option "--crush-device-class foo", as Ceph automatically picks hdd or ssd if that is omitted. The class can, however, be changed after the OSD has been commissioned.
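In this example the other OSDs on the host use the class evo. If the new OSD came up with the wrong class, it can be corrected from a ceph-mon; an existing class must be removed before a new one can be set:

root@cephmon1:~# ceph osd crush rm-device-class osd.32
root@cephmon1:~# ceph osd crush set-device-class evo osd.32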
Activate the new OSD
The final step is to activate the OSD. To do that you need the new OSD's fsid (also shown as "osd fsid" in the output of ceph-volume lvm list):
root@storage302:~# cat /var/lib/ceph/osd/ceph-32/fsid
4b90555c-8dd7-46e1-a46a-355557211058
root@storage302:~# ceph-volume lvm activate --bluestore 32 4b90555c-8dd7-46e1-a46a-355557211058
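After activation the OSD process starts, and the cluster will backfill data onto the new OSD. It should reappear as up in the tree:

eigil@cephmon1:~$ sudo ceph osd tree

If it does not come up, the OSD log on the storage machine (for instance via journalctl -u ceph-osd@32.service) is the place to look.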