Installation/decommissioning

Installing a new ceph monitor

To install a new ceph monitor, execute the following steps:

  1. Install the role role::ceph::mon on the new ceph-mon.
  2. Add the new ceph-mon to the profile::ceph::monitors in hiera
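
Once puppet has run on the new node, it should join the monitor quorum. As a quick sanity check (not part of the original steps), the quorum can be inspected from any existing ceph-mon:

# Show the monitors and the current quorum
ceph mon stat

# Overall cluster health should also report the expected number of mons
ceph -s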

Removing a ceph monitor

  1. Remove the node from profile::ceph::monitors in hiera
  2. Stop puppet on the node you want to remove
  3. Stop ceph-mon and ceph-mgr
  4. On another ceph-mon, run ceph mon remove <mon-name>
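
After the hiera change, steps 2-4 as a command sketch; the systemd instance names assume the standard ceph-mon@<hostname>/ceph-mgr@<hostname> units, so adjust them to the local setup:

# On the ceph-mon being removed: stop puppet runs (or stop the puppet daemon) and the ceph daemons
puppet agent --disable "removing ceph-mon"
systemctl stop ceph-mon@$(hostname -s) ceph-mgr@$(hostname -s)

# On another ceph-mon
ceph mon remove <mon-name>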

Reinstalling a storage node

If a storage node is reinstalled, either because it needs a newer OS or because the node moves from old to new infrastructure, there is no need to start over with fresh OSDs. As long as the OSD disks have not been reformatted, the existing OSDs can be brought back into the cluster with the following steps:

  1. Run "ceph-volume lvm list" to verify thall all OSDs are recognized by ceph
  2. Restart all the osd by running "ceph-volume lvm activate --all"
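
The same steps as a copy-pasteable sketch, with a verification command at the end:

# List the OSDs that ceph-volume recognizes on this host
ceph-volume lvm list

# Activate them all; this recreates the systemd units and starts the ceph-osd services
ceph-volume lvm activate --all

# Verify that the OSDs rejoined the cluster
ceph osd tree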

Add a storage node

  1. Ensure that all SSDs are listed in profile::disk::ssds in the node-specific hiera (the lsblk helper at the end of this section can help identify them)
  2. Install the role role::storage on the new node
  3. Create OSDs, typically 2 per device on a 2 TB drive. Details below
# List available disks
ceph-volume inventory

# Dell tends to install EFI stuff on the first disk. Check if there are any partitions on /dev/sdb. If there are, run
ceph-volume lvm zap /dev/sdb

# Create 2 OSDs on each disk you intend to add
ceph-volume lvm batch --osds-per-device 2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

# Restart the services
systemctl restart ceph.target
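
For step 1, the SSDs present in the node can be identified with lsblk before filling in the node-specific hiera (a helper, not part of the original procedure); ROTA=0 means the device is non-rotational, i.e. an SSD:

# List whole disks with rotational flag, size and model
lsblk -d -o NAME,ROTA,SIZE,MODEL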


Storage management

HDDs vs SSDs in hybrid clusters

One of our clusters (ceph@stack.it) is a hybrid cluster where some of the OSDs are HDDs and some are SSDs. In this cluster we tune CRUSH to ensure that some pools are placed on SSDs while others are placed on HDDs. The tuning is done in three distinct steps:

  1. Ensure that OSDs are classified correctly
  2. Create a CRUSH rule for each device class
  3. Set the desired CRUSH rule on the relevant pools

Classify OSDs

Ceph tries to classify OSDs as SSD or HDD. The classes can be seen with the following command:

root@cephmon1:~# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
   ....
  3   hdd 0.90959         osd.3          up  1.00000 1.00000 
   ....

Unfortunately it is not always able to classify them correctly, and in that case a manual change is needed. To change the class of an OSD we need to remove the old class and set a new one:

root@cephmon1:~# ceph osd crush rm-device-class osd.3
root@cephmon1:~# ceph osd crush set-device-class ssd osd.3
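
The change can be verified by checking the CLASS column for the OSD in question (the grep below just narrows the tree output to osd.3 from the example above):

root@cephmon1:~# ceph osd tree | grep -w 'osd\.3'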

Create new CRUSH rules

Ceph provides sensible shortcuts for creating CRUSH rules that restrict a pool to a certain device class. Two rules, one for HDD and one for SSD, can be created like so:

root@cephmon1:~# ceph osd crush rule create-replicated hdd-only default host hdd
root@cephmon1:~# ceph osd crush rule create-replicated ssd-only default host ssd

Assign CRUSH rules to pools

To see which CRUSH rules are available, use the following command:

root@cephmon1:~# ceph osd crush rule ls
replicated_ruleset
hdd-only
ssd-only

To display which pools are assigned to which CRUSH rules, the following command can be helpful:

root@cephmon1:~# for p in $(ceph osd pool ls); do echo -n ${p}-; ceph osd pool get $p crush_rule; done
rbd-crush_rule: replicated_ruleset
images-crush_rule: replicated_ruleset
volumes-crush_rule: replicated_ruleset
.rgw.root-crush_rule: replicated_ruleset
default.rgw.control-crush_rule: replicated_ruleset
default.rgw.meta-crush_rule: replicated_ruleset
default.rgw.log-crush_rule: replicated_ruleset
default.rgw.buckets.index-crush_rule: replicated_ruleset
default.rgw.buckets.data-crush_rule: hdd-only
default.rgw.buckets.non-ec-crush_rule: replicated_ruleset

To assign a new CRUSH rule to a pool, use the following command:

root@cephmon1:~# ceph osd pool set <POOL> crush_rule <CRUSH-RULE>
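
For example, to pin the volumes pool from the listing above to the SSD-backed rule (whether a given pool belongs on SSDs or HDDs is a local policy decision, so treat this purely as an illustration):

root@cephmon1:~# ceph osd pool set volumes crush_rule ssd-only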

Map OSD to physical disk

On a cephmon

  • ceph osd tree
    • To limit the output to the relevant lines:
    • ceph osd tree | grep down
  • ceph osd find osd.XXX
    • To limit the output to the relevant lines:
    • ceph osd find osd.XXX | grep host

On the storage node

Find the device

  • ceph-volume lvm list
    • To limit the output to the relevant lines:
    • ceph-volume lvm list | grep 'osd id\|devices'

Find serial number

  • smartctl -a <device from above> | grep -i "Serial Number"
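
Putting the pieces together, a typical hunt for a disk looks something like this; osd.42 and /dev/sdd are placeholders:

# On a cephmon: find out which host osd.42 lives on
ceph osd find osd.42 | grep host

# On that storage node: find the backing device for osd.42
ceph-volume lvm list | grep 'osd id\|devices'

# Read the serial number off the device
smartctl -a /dev/sdd | grep -i "Serial Number"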

Find physical drive bay

iDRAC

Use iDRAC to match the serial number of the disk to the corresponding drive bay.

Use the OS to trigger the disk activity light if the disk is working

  • dd if=<device from above> of=/dev/null
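
The same trick with a couple of optional dd flags that make it easier to follow; the device name is a placeholder:

# Reading the whole device keeps its activity LED blinking; interrupt with Ctrl-C when done
dd if=/dev/sdd of=/dev/null bs=1M status=progress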