Overview
The servers run Linux and receive security patches continuously. If a reboot is needed, it is done during the normal patching of the Linux servers.
Reboot order
The OpenStack nodes are rebooted in the following order:
Patch day before 16:00
- Storage nodes. These are rebooted one at a time, which does not affect OpenStack availability. Ceph storage stays available with up to two nodes down, and because Ceph is verified to be fully up and running before the next node is rebooted, this is safe.
Patch day after 16:00
- Compute nodes. The instances running on a compute node are shut down before the node reboots and are started again when the node is back up. The instances are unavailable for about 10 to 15 minutes while this happens.
- Infrastructure nodes. There are three infrastructure nodes, each running all the services, behind a load balancer. There may be a small delay in network access to the instances when the load balancer changes its target or when the active load balancer is taken down.
Patching should be finished before 23:00, but experience shows that it is usually finished at around 20:00.
Patching procedures
Storage nodes
Log in to a Ceph monitor (cephmon0, cephmon1 or cephmon2) and run the command "watch -n 1 ceph -s". Verify the following:
# health: should be ok
    health: HEALTH_OK
# mon: should be 3 daemons and have quorum
# osd: all should be up, as of this example 50 of 50 are up.
  services:
    mon: 3 daemons, quorum cephmon0,cephmon1,cephmon2
    mgr: cephmon0(active), standbys: cephmon1, cephmon2
    osd: 50 osds: 50 up, 50 in
    rgw: 1 daemon active
  data:
    pools:   10 pools, 880 pgs
    objects: 1.39M objects, 5.59TiB
    usage:   16.8TiB used, 74.2TiB / 91.0TiB avail
    pgs:     878 active+clean
             2   active+clean+scrubbing+deep
  io:
    client:   8.16KiB/s rd, 2.01MiB/s wr, 105op/s rd, 189op/s wr
When everything is OK, reboot the first node and wait for Ceph to be OK again before rebooting the next one.
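The wait between storage node reboots could be scripted along these lines. This is a minimal sketch and not the procedure in use here: it assumes it runs on a Ceph monitor where "ceph health" is available, and the function names ceph_is_healthy and wait_for_ceph and the 10 second polling interval are made up for illustration. It only looks at the overall health status, so the mon and osd lines above should still be checked by eye.

```shell
#!/bin/sh
# Sketch only: function names and the polling interval are assumptions.

# Read the output of `ceph health` on stdin and succeed only if the
# first word of the first line is HEALTH_OK.
ceph_is_healthy() {
    read -r health_status _rest || true
    [ "$health_status" = "HEALTH_OK" ]
}

# Poll `ceph health` until the cluster reports HEALTH_OK.
# Optional first argument: seconds to sleep between checks (default 10).
wait_for_ceph() {
    poll_interval="${1:-10}"
    until ceph health | ceph_is_healthy; do
        echo "Ceph is not healthy yet, checking again in ${poll_interval}s" >&2
        sleep "$poll_interval"
    done
}
```

With these functions loaded, running wait_for_ceph after each storage node has come back returns only once Ceph reports HEALTH_OK, at which point the next node can be rebooted.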
Compute nodes
Verify which instances are running on the compute node:
openstack server list --all --host compute01
+--------------------------------------+--------------------+--------+-----------------------------------+-----------------+-----------+
| ID                                   | Name               | Status | Networks                          | Image           | Flavor    |
+--------------------------------------+--------------------+--------+-----------------------------------+-----------------+-----------+
| 5c32f1d1-2f12-1234-beffe112345ceffe1 | kubertest-master-2 | ACTIVE | kubertest=10.2.0.7, 129.241.152.9 | CoreOS 20190501 | m1.xlarge |
+--------------------------------------+--------------------+--------+-----------------------------------+-----------------+-----------+
- Check that at least one of the instances has working network access.
- Check that there is no more than one kube master of the same cluster on the compute node. The masters require quorum, so if two masters of the same cluster run on one compute node, one of them must be moved before the reboot.
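The kube master check can be sketched as a small filter over the instance names on the node. This is an illustration under assumptions, not an established tool: the function name clusters_with_colocated_masters is made up, and it assumes masters are named <cluster>-master-<number>, as kubertest-master-2 is in the example above. Clusters that follow another naming scheme will not be detected.

```shell
#!/bin/sh
# Sketch only: the function name and the <cluster>-master-<number>
# naming convention are assumptions.

# Read instance names on stdin, one per line, and print every cluster
# name that has more than one master in the list.
clusters_with_colocated_masters() {
    # Keep only master names and strip "-master-<number>" to get the
    # cluster name, then print each cluster name that occurs twice or more.
    sed -n 's/^\(.*\)-master-[0-9][0-9]*$/\1/p' | sort | uniq -d
}

# Example, using the CLI's value formatter to list only the names:
#   openstack server list --all --host compute01 -f value -c Name \
#       | clusters_with_colocated_masters
```

Empty output means no cluster has two masters on that compute node. Any cluster name printed needs one of its masters moved before the node is rebooted.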