If an OSD fails to start for some reason, one solution can be to re-create it. This is a destructive process, but if all placement groups are active+clean after the disk has failed, the data that was on the disk has already been re-created elsewhere in the cluster, and it is therefore safe to format the disk. Re-creating an OSD is a multi-step process, and this article guides you through it.
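Before destroying anything, Ceph can be asked to confirm that the OSD holds no data that exists only on it. This is a minimal sketch, assuming a Luminous or later release (where ceph osd safe-to-destroy is available) and a node with the admin keyring. The function name check_osd_safe is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that an OSD can be destroyed without data loss.
check_osd_safe() {
    local id="$1"
    # Print the PG summary; all PGs should be active+clean.
    ceph pg stat
    # safe-to-destroy exits non-zero if destroying the OSD could lose data.
    if ceph osd safe-to-destroy "$id"; then
        echo "osd.$id is safe to destroy"
    else
        echo "osd.$id is NOT safe to destroy, wait for recovery" >&2
        return 1
    fi
}
```

Run it as check_osd_safe 32. If it reports that the OSD is not safe to destroy, wait for recovery to finish before continuing.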

Stop the OSD process, and destroy the OSD

First, identify the OSD that does not work. This is typically done with the command ceph osd tree. In this example, osd.32 on storage302 has failed (it is marked down, with a reweight of 0):

Determine the failed OSD
eigil@cephmon1:~$ sudo ceph osd tree
ID    CLASS  WEIGHT     TYPE NAME              STATUS  REWEIGHT  PRI-AFF
  -1         289.21906  root default                                    
 ...
-129          12.72873      host storage302                             
 ...
  30    evo    0.90919          osd.30             up   1.00000  1.00000
  31    evo    0.90919          osd.31             up   1.00000  1.00000
  32    evo    0.90919          osd.32           down         0  1.00000
  33    evo    0.90919          osd.33             up   1.00000  1.00000
  34    evo    0.90919          osd.34             up   1.00000  1.00000
 ...
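On a large tree the failed OSD can be hard to spot. The sketch below filters ceph osd tree output for OSDs whose status is down. It assumes the column layout shown above, and down_osds is a hypothetical helper name. Recent releases also appear to accept a state filter directly (ceph osd tree down), which may be the simpler option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of OSDs whose STATUS is "down",
# reading "ceph osd tree" output on stdin.
# Parsing the human-readable table is a convenience only; the column
# layout may differ between Ceph releases.
down_osds() {
    awk '{
        for (i = 1; i < NF; i++) {
            # An OSD row has a field like "osd.32" followed by its status.
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") {
                print $i
            }
        }
    }'
}
```

For the tree above, sudo ceph osd tree | down_osds prints osd.32.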

Head over to the storage machine and determine which disk/partition/LV is serving this OSD:

Determine the block device of the failed OSD
root@storage302:~# ceph-volume lvm list 
 ...
====== osd.32 ======

  [block]       /dev/ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f

      block device              /dev/ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f
      block uuid                xjLRr5-8bY2-UHlc-k7Av-SKKR-bRfg-shWPh4
      cephx lockbox secret      
      cluster fsid              859f7b25-cb7a-4043-be85-58c10edf9195
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  7c92f533-b3ac-4d33-a235-4c826448e920
      osd id                    32
      type                      block
      vdo                       0
      devices                   /dev/sdg
 ...

From the output of the commands above, note down these crucial bits of information:

  • OSD ID: 32
  • OSD LVM VG: ceph-1c5ab093-08fa-4304-840e-7a2257156966
  • OSD LVM LV: osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f
  • physical device: /dev/sdg
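Collecting these values by hand is error-prone when a host carries many OSDs. This is a minimal sketch that extracts them from the text output of ceph-volume lvm list, assuming the layout shown above. The name osd_block_info is hypothetical, and ceph-volume lvm list --format json is likely a sturdier source if a JSON tool such as jq is available:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given an OSD id, print the VG, LV and physical
# device(s) behind it, reading "ceph-volume lvm list" output on stdin.
osd_block_info() {
    local id="$1"
    awk -v want="osd.${id}" '
        # A "====== osd.N ======" header starts a new OSD section.
        /^====== osd\./ { in_osd = ($2 == want); next }
        # "block device /dev/VG/LV": split the path into VG and LV names.
        in_osd && $1 == "block" && $2 == "device" {
            n = split($3, p, "/")
            print "vg " p[n - 1]
            print "lv " p[n]
        }
        # "devices /dev/sdX": the physical device(s) behind the LV.
        in_osd && $1 == "devices" { print "devices " $2 }
    '
}
```

Usage on the storage node: ceph-volume lvm list | osd_block_info 32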

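Now make sure the OSD process is stopped, and then zap the OSD's logical volume. The --destroy flag also removes the LV itself, which is why it has to be re-created later: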

Delete the failed OSD
root@storage302:~# systemctl stop ceph-osd@32.service
root@storage302:~# ceph-volume lvm zap ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f --destroy 
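The two commands can be wrapped so that the zap only runs once systemd reports the daemon as stopped. This is a sketch, assuming systemd-managed, non-containerized OSDs as in this example. The name stop_and_zap_osd is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop an OSD daemon, refuse to continue if it is
# still running, then zap (and with --destroy, remove) its logical volume.
stop_and_zap_osd() {
    local id="$1" vg_lv="$2"   # vg_lv is "VG/LV", as in the zap command above
    systemctl stop "ceph-osd@${id}.service"
    # is-active exits 0 while the unit is still active.
    if systemctl is-active --quiet "ceph-osd@${id}.service"; then
        echo "ceph-osd@${id} is still running, not zapping" >&2
        return 1
    fi
    ceph-volume lvm zap "$vg_lv" --destroy
}
```

Usage: stop_and_zap_osd 32 ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f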

Remove the OSD from the cluster

Head over to a node with admin access to the cluster, such as a ceph-mon, and remove the OSD from the cluster. This will trigger some data rebalancing.

Remove the failed OSD from the cluster
root@storage302:~# ceph osd out osd.32
root@storage302:~# ceph osd crush remove osd.32
root@storage302:~# ceph auth del osd.32
root@storage302:~# ceph osd rm osd.32
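The same removal can be written as a small function that stops at the first failing command. This is a sketch with the hypothetical name remove_osd_from_cluster. On Luminous and later, ceph osd purge 32 --yes-i-really-mean-it should perform the crush remove, auth del and osd rm steps in one go.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove an OSD from the cluster map, CRUSH and auth,
# in the same order as the commands above. Stops at the first failure.
remove_osd_from_cluster() {
    local id="$1"
    ceph osd out "osd.${id}" &&
    ceph osd crush remove "osd.${id}" &&
    ceph auth del "osd.${id}" &&
    ceph osd rm "osd.${id}"
}
```

Usage: remove_osd_from_cluster 32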

Prepare a new OSD using the same disk

Now we can re-create the OSD on the same disk with the same OSD ID. As the OSD used an LV, we also need to re-create that:

Prepare an OSD on the same block device
root@storage302:~# lvcreate -n osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f -l 100%FREE ceph-1c5ab093-08fa-4304-840e-7a2257156966
root@storage302:~# ceph-volume lvm prepare --bluestore --data /dev/ceph-1c5ab093-08fa-4304-840e-7a2257156966/osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f --osd-id 32
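The LV re-creation and the prepare step can be combined as below. This is a sketch that assumes the volume group survived the zap, as it did in this example. The name recreate_osd is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-create the LV in the existing VG and prepare a
# BlueStore OSD on it, re-using the old OSD id.
recreate_osd() {
    local id="$1" vg="$2" lv="$3"
    # Use all remaining free extents in the VG for the new LV.
    lvcreate -n "$lv" -l 100%FREE "$vg" &&
    ceph-volume lvm prepare --bluestore --data "/dev/${vg}/${lv}" --osd-id "$id"
}
```

Usage: recreate_osd 32 ceph-1c5ab093-08fa-4304-840e-7a2257156966 osd-data-24708c3e-9caf-4fc2-8b1e-de54a0ca527f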

It might be smart to also set the OSD's CRUSH device class when preparing it, using the option "--crush-device-class foo" (the other OSDs on this host use the class evo), as Ceph automatically picks hdd or ssd if the option is omitted. The class can, however, be changed after the OSD has been commissioned.
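Changing the class afterwards takes two commands, because Ceph refuses to overwrite a device class that is already set. This is a sketch, with set_osd_class as a hypothetical name and evo taken from the osd tree output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the CRUSH device class of an existing OSD.
# An existing class has to be removed before a new one can be set.
set_osd_class() {
    local id="$1" class="$2"
    ceph osd crush rm-device-class "osd.${id}" &&
    ceph osd crush set-device-class "$class" "osd.${id}"
}
```

Usage: set_osd_class 32 evo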

Activate the new OSD

The final step is to activate the OSD. To do that, you need the new OSD's fsid (UUID), which can be read from the OSD directory:

Activate the new OSD
root@storage302:~# cat /var/lib/ceph/osd/ceph-32/fsid 
root@storage302:~# ceph-volume lvm activate --bluestore 32 4b90555c-8dd7-46e1-a46a-355557211058
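Reading the fsid and activating can be done in one step. This is a sketch, assuming the default cluster name ceph (hence the ceph-32 directory). The function activate_osd and the OSD_DIR_BASE override are hypothetical. Alternatively, ceph-volume lvm activate --all should activate every prepared OSD on the host, after which ceph osd tree should list osd.32 as up again.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the new OSD's fsid from its OSD directory and
# activate the OSD with it. OSD_DIR_BASE defaults to the standard path.
activate_osd() {
    local id="$1"
    local base="${OSD_DIR_BASE:-/var/lib/ceph/osd}"
    local fsid
    fsid="$(cat "${base}/ceph-${id}/fsid")" || return 1
    ceph-volume lvm activate --bluestore "$id" "$fsid"
}
```

Usage: activate_osd 32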