...
- This document expects that your cloud is deployed with a recent zed tag of the ntnuopenstack repository.
- Make sure you have a recent MySQL backup in case things go south.
- If you want to do a rolling upgrade, the following key should be set in hiera long enough in advance that all hosts have had a puppet run to apply it:
nova::upgrade_level_compute: '6.1'
- When the upgrade is finished, the key should be updated to '6.2'
- These version numbers can be correlated to release names in the file /usr/lib/python3/dist-packages/nova/compute/rpcapi.py
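If you need to check which release a given RPC version corresponds to, the alias table in that file can be inspected directly; a minimal sketch, assuming the path above (it may differ between distributions):
# Print the release-name to RPC-version mapping shipped with nova
grep -A 15 'VERSION_ALIASES' /usr/lib/python3/dist-packages/nova/compute/rpcapi.py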
...
- Run puppet with the 2023.1 modules/tags
- Run
apt dist-upgrade
- Rerun puppet and restart the services (or simply reboot the host):
systemctl restart ovsdb-server
systemctl restart neutron-dhcp-agent.service neutron-l3-agent.service neutron-metadata-agent.service neutron-openvswitch-agent.service neutron-ovs-cleanup.service
- Verify that routers on the node actually work.
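A quick way to smoke test the routers, assuming admin credentials are loaded; the node name, agent UUID and floating IP are placeholders:
# Find the L3 agent on the upgraded node and the routers it hosts
openstack network agent list --host <node> --agent-type l3
openstack router list --agent <l3-agent-uuid>
# Ping a floating IP behind one of those routers to verify that traffic flows
ping -c 3 <floating-ip>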
Placement
- Install one node at a time, either by reinstalling it using the 2023.1 modules/tags or by following this list:
- Run puppet with 2023.1 modules/tags
- Run
systemctl stop puppet apache2
- Run
apt-get purge placement-api placement-common python3-placement && apt-get autoremove && apt-get dist-upgrade
- Run puppet again
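Placement ships its own readiness check which can be run on the upgraded node to verify that the database is fully migrated:
placement-status upgrade check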
Nova
To upgrade nova without any downtime, follow this procedure:
Preparations
Before the upgrades can be started it is important that all data from previous nova releases is migrated to the zed release. This is done like so:
- Run
nova-manage db online_data_migrations
on an API node. Ensure that it reports that nothing more needs to be done.
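The migrations can also be run in batches until everything is processed; a minimal sketch of such a loop (the --max-count value is arbitrary; nova-manage exits non-zero as long as more batches remain, so abort manually if it keeps reporting errors):
until nova-manage db online_data_migrations --max-count 1000; do
  sleep 1
done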
Nova API
- In the node-specific hiera, disable the services on the first node you would like to upgrade with the key
apache::service_ensure: 'stopped'
- Do one of:
- Run puppet with the 2023.1 modules/tags, then run
apt dist-upgrade && apt-get autoremove
- Reinstall the node with 2023.1 modules/tags
- Run
nova-manage api_db sync
- Run
nova-manage db sync
- Re-enable nova API on the upgraded node:
- Remove
apache::service_ensure: 'stopped'
from the upgraded node's hiera file
- Upgrade the rest of the nodes (basically run step 2)
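Once the API nodes are upgraded it is worth running nova's own readiness check, which ships with the nova packages:
# Reports whether the deployment is consistent with the new release
nova-status upgrade check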
Nova-services
Either reinstall the node using the 2023.1 modules/tags, or follow this list:
- Run puppet with the 2023.1 modules/tags
- Run
apt dist-upgrade && apt-get autoremove
- Run puppet and restart services
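To verify that the services came back up after the restart, something like the following should list them with state 'up' (admin credentials assumed):
openstack compute service list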
Heat
The rolling upgrade procedure for heat includes a step where you are supposed to create a new rabbit vhost. I don't want that. Therefore, these are the cold upgrade steps.
- Set
apache::service_ensure: 'stopped'
heat::api::enabled: false
heat::engine::enabled: false
heat::api_cfn::enabled: false
in hiera to stop all services.
- Do one of:
- Run puppet with 2023.1 modules/tags, then run
apt-get update && apt-get dist-upgrade && apt-get autoremove
- Reinstall the nodes with 2023.1 modules/tags
- Run
heat-manage db_sync
on one of the api-nodes.
- Remove the hiera keys that disabled the services and re-run puppet
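To check that the heat engines re-registered after the upgrade, the orchestration service listing should show them as up (admin credentials assumed):
openstack orchestration service list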
Barbican
Barbican must be stopped for upgrades; the upgrade can thus be performed on all barbican hosts at the same time. It might be an idea to keep one set of hosts stopped at the old code in case a sudden roll-back is needed.
- Stop all barbican-services by adding the following keys to node-specific hiera, and then make sure to run puppet on the barbican hosts:
barbican::worker::enabled: false
apache::service_ensure: 'stopped'
- Run puppet with the 2023.1 modules/tags
- Run
apt dist-upgrade && apt-get autoremove
- Run
barbican-db-manage upgrade
- Re-start barbican services by removing the keys added in step 1 and re-run puppet.
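A quick smoke test once the services are back, assuming the barbican OSC plugin is installed and your user has a role that is allowed to read secrets:
# Listing secrets exercises the barbican API end-to-end
openstack secret list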
Magnum
Magnum must be stopped for upgrades; the upgrade can thus be performed on all magnum-hosts at the same time. It might be an idea to keep one set of hosts stopped at the old code in case a sudden roll-back is needed.
- Stop all magnum-services by adding the following keys to node-specific hiera, and then make sure to run puppet on the magnum hosts:
magnum::conductor::enabled: false
apache::service_ensure: 'stopped'
- Run puppet with the 2023.1 modules/tags
- Run
apt dist-upgrade && apt autoremove
- Run
su -s /bin/sh -c "magnum-db-manage upgrade" magnum
- Re-start magnum services by removing the keys added in step 1 and re-run puppet.
- Check if a new Fedora CoreOS image is required, and if new public cluster templates should be deployed, i.e. to support a newer k8s version.
- The official documentation provides a nice bit of help with this.
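A minimal sketch of uploading a new Fedora CoreOS image and creating a public cluster template that uses it; the file name, template name, flavors, network and kube_tag label below are placeholders, not tested values:
# Upload the new Fedora CoreOS image (magnum expects os_distro=fedora-coreos)
openstack image create fedora-coreos-38 \
  --disk-format qcow2 --container-format bare \
  --property os_distro=fedora-coreos \
  --file fedora-coreos-38.qcow2

# Create a public cluster template pointing at the new image
openstack coe cluster template create k8s-v1.26 \
  --coe kubernetes \
  --image fedora-coreos-38 \
  --external-network public \
  --master-flavor m1.small --flavor m1.small \
  --public \
  --labels kube_tag=v1.26.8-rancher1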
Octavia
Octavia must be stopped for upgrades; the upgrade can thus be performed on all octavia-hosts at the same time. It might be an idea to keep one set of hosts stopped at the old code in case a sudden roll-back is needed.
- Stop all octavia-services by adding the following keys to hiera, and then make sure to run puppet on the octavia hosts:
octavia::housekeeping::enabled: false
octavia::health_manager::enabled: false
octavia::api::enabled: false
octavia::worker::enabled: false
- Do one of:
- Reinstall the node with 2023.1 modules/tags
- Run puppet with the 2023.1 modules/tags, then run
apt-get dist-upgrade && apt-get autoremove
and run puppet again
- Run
octavia-db-manage upgrade head
- Re-start octavia services by removing the keys added in step 1 and re-run puppet.
- Build a 2023.1-based amphora image and upload it to glance. Tag it, and make octavia start replacing the amphorae.
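A minimal sketch, assuming octavia is configured to select its amphora image by the 'amphora' glance tag; image names and IDs are placeholders:
# Upload the new amphora image and tag it so octavia uses it for new amphorae
openstack image create amphora-x64-haproxy-2023.1 \
  --disk-format qcow2 --container-format bare \
  --file amphora-x64-haproxy-2023.1.qcow2 \
  --tag amphora --private

# Remove the tag from the old image so it is no longer selected
openstack image unset --tag amphora <old-image-id>

# Rotate existing load balancers onto the new image one at a time
openstack loadbalancer failover <loadbalancer-id>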
Horizon
- Run puppet with the 2023.1 modules/tags
- Add the following to the node-specific hiera file for horizon nodes:
- apache::mod::wsgi::package_name: 'libapache2-mod-wsgi-py3'
- apache::mod::wsgi::mod_path: '/usr/lib/apache2/modules/mod_wsgi.so'
- Run
apt dist-upgrade && apt autoremove
- Run puppet again
- Restart apache2
Compute-nodes and GPU-nodes
When all APIs etc. are upgraded, it is time to do the same on the compute-nodes.
Preliminary tasks
From 2023.1 and onwards the compute-nodes need to have their hypervisor UUID on disk, and we must thus list them in hiera. Use the following one-liner to populate the initial list in hiera:
openstack hypervisor list -f value -c ID -c 'Hypervisor Hostname' --sort-column 'Hypervisor Hostname' | awk '{ print " " $2 ": " $1}'
Paste the output from the above command into a suitable hiera-file (for instance create one called computeIDs.yaml) under the key 'ntnuopenstack::nova::compute::ids'.
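The resulting hiera entry would look something like this (hostnames and UUIDs are made up for illustration):
ntnuopenstack::nova::compute::ids:
 compute01.example.com: 78a1f463-13f8-4b1e-a1a4-000000000001
 compute02.example.com: 78a1f463-13f8-4b1e-a1a4-000000000002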
Installing antelope (2023.1) on the compute-nodes:
Compute nodes are simple to upgrade:
- Reinstall the node with 2023.1 modules/tags
- Run "
apt update; apt dist-upgrade -y
" to get the correct openvswith packages.