This article summarizes the steps required to upgrade from the xena release to the yoga release of openstack.
Prerequisites:
- This document expects that your cloud is deployed with a recent xena tag of the ntnuopenstack repository.
- You have a recent mysql backup in case things go south.
- If you want to do a rolling upgrade, the following key should be set in hiera long enough in advance that all hosts have had a puppet-run to apply it:
nova::upgrade_level_compute: '6.0'
- When the upgrade is finished - the key should still be set to '6.0'
- (Yoga is 6.0; zed is 6.1, so the next release needs a change here...)
- These version-numbers can be correlated to release-name in the file /usr/lib/python3/dist-packages/nova/compute/rpcapi.py
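To look up the version number for a given release name, something like the following could be used. It is a minimal sketch which assumes that the file lists the aliases as lines of the form 'yoga': '6.0', and the function name rpc_version_for_release is hypothetical.

```shell
# Hypothetical helper: print the compute RPC version listed for a release name.
# Assumes rpcapi.py contains alias lines such as:    'yoga': '6.0',
rpc_version_for_release() {
    release="$1"
    file="${2:-/usr/lib/python3/dist-packages/nova/compute/rpcapi.py}"
    # Keep only the quoted version number from the first matching line.
    sed -n "s/^[[:space:]]*'${release}':[[:space:]]*'\([0-9.]*\)'.*/\1/p" "$file" | head -n 1
}
```

Running rpc_version_for_release yoga on an upgraded node prints the value found in the installed nova package, which can then be compared with the value set in hiera.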
The recommended order to upgrade the services is listed below:
Keystone
This is the zero downtime approach
Before you begin
- Log in to a mysql node, start the mysql CLI, and run
set global log_bin_trust_function_creators=1;
Upgrade-steps (start with a single node):
- Set
apache::service_ensure: 'stopped'
in hiera for the node that you are upgrading
- Run puppet with the yoga modules/tags, run apt-get dist-upgrade, and run puppet again
- The first puppet-run complains a lot: because it changes its logic for openstack auth, all openstack-changes fail. Run puppet once more if this bugs you
- Run
keystone-manage doctor
and ensure nothing is wrong
- Run
keystone-manage db_sync --expand
- Returns nothing
- Run
keystone-manage db_sync --migrate
- Returns nothing
- At this point, you may restart apache2 on this node
- Remove the
apache::service_ensure: 'stopped'
previously set in hiera.
- Upgrade keystone on the other nodes, one at a time
- Basically run step 1, 2 and 6 on the other nodes
- When all nodes are upgraded, perform the final DB sync
keystone-manage db_sync --contract
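The database steps above can be sketched as a small script. This is only an illustration: the run wrapper, the DRY_RUN variable and the two function names are hypothetical, and the hiera changes, puppet-runs and apache2 restarts still have to be done by hand.

```shell
# Print commands instead of running them when DRY_RUN=1 (hypothetical wrapper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the first upgraded node only: check the installation, then expand and migrate.
keystone_first_node_db_steps() {
    run keystone-manage doctor &&
    run keystone-manage db_sync --expand &&
    run keystone-manage db_sync --migrate
}

# Once every keystone node runs the yoga code: contract the schema.
keystone_final_db_step() {
    run keystone-manage db_sync --contract
}
```

Setting DRY_RUN=1 before calling the functions only prints the commands, which is a cheap way to double-check the order before touching the database.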
Glance
To upgrade glance without any downtime, follow this procedure:
- Select which glance-server to upgrade first.
- In the node-specific hiera for this host you should set:
glance::api::enabled: false
- Run puppet with the yoga modules/tags, run apt-get dist-upgrade, and run puppet again
- Run
glance-manage db expand
- Run
glance-manage db migrate
- Remove the
glance::api::enabled: false
from the node-specific hiera, and run puppet again. This restarts the glance api-server on this host.
- Test that this api-server works.
- Upgrade the rest of the glance hosts (i.e. step 2 for each of the remaining glance hosts)
- Run
glance-manage db contract
on one of the glance-nodes.
Enable glance quotas through keystone unified limits
If you want to add quotas that limit how much storage tenants can use for their images, you need to register default-quotas in keystone. Substitute "SkyLow" with the relevant region-name:
# Default-quota of 10 images and 50GB
openstack registered limit create --service glance --region SkyLow --default-limit 50000 image_size_total
openstack registered limit create --service glance --region SkyLow --default-limit 10 image_count_total
# Default-quota of 5 images and 50GB which is currently being uploaded.
openstack registered limit create --service glance --region SkyLow --default-limit 50000 image_stage_total
openstack registered limit create --service glance --region SkyLow --default-limit 5 image_count_uploading
Enable the unified limit integration for glance by adding the following lines in hiera:
ntnuopenstack::glance::endpoint::internal::id: '<GLANCE INTERNAL ENDPOINT ID>'
ntnuopenstack::glance::keystone::limits: true
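The internal endpoint ID used in the hiera key can be looked up with the openstack CLI. A minimal sketch, assuming admin credentials are loaded in the shell; the function name internal_endpoint_id is hypothetical.

```shell
# Hypothetical helper: print the ID of the internal endpoint for a service in a region.
internal_endpoint_id() {
    service="$1"
    region="$2"
    openstack endpoint list --service "$service" --interface internal \
        --region "$region" -f value -c ID
}

# Example use (not executed here):
#   internal_endpoint_id glance SkyLow
```

The service name is a parameter, so the same helper can be reused for other services that need their internal endpoint ID in hiera.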
Cinder
To upgrade cinder without any downtime, follow this procedure:
- Add the following three lines to the node-file of the first node you would like to upgrade:
apache::service_ensure: 'stopped'
cinder::scheduler::enabled: false
cinder::volume::enabled: false
- Run puppet with the yoga modules/tags, run apt-get dist-upgrade, and run puppet again
- Run
cinder-manage db sync && cinder-manage db online_data_migrations
- Remove the lines added at step 1, re-run puppet, and test that the upgraded cinder version works.
- Perform step 2 for the rest of the cinder nodes
Neutron
API-nodes
- Pick the first node and run puppet with the yoga modules/tags, then run
apt-get autoremove && apt-get dist-upgrade
- Run
neutron-db-manage upgrade --expand
- Restart neutron-server.service and rerun puppet
- Upgrade the rest of the API-nodes (repeating step 1, and 3)
- Stop all neutron-server processes for a moment, and run:
neutron-db-manage upgrade --contract
- Re-start the neutron-server processes
BGP-agents
Either simply reinstall the node with yoga modules/tags, or follow this list:
- Run puppet with the yoga modules/tags
- Run
apt dist-upgrade
- Rerun puppet and restart the service
systemctl restart neutron-bgp-dragent.service
or simply reboot
Network-nodes
Either simply reinstall the node with yoga modules/tags, or follow this list:
- Run puppet with the yoga modules/tags
- Run
apt dist-upgrade
- Rerun puppet and restart the service (or simply reboot the host).
systemctl restart ovsdb-server
systemctl restart neutron-dhcp-agent.service neutron-l3-agent.service neutron-metadata-agent.service neutron-openvswitch-agent.service neutron-ovs-cleanup.service
- Verify that routers on the node actually work.
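One way to approach the verification step is to list the router namespaces on the node and inspect one of them from the inside. A minimal sketch, assuming the L3 agent uses the usual qrouter-<router id> namespace naming; the function name router_namespaces is hypothetical.

```shell
# Hypothetical helper: filter `ip netns list` output (read from stdin)
# down to the names of the router namespaces.
router_namespaces() {
    awk '$1 ~ /^qrouter-/ { print $1 }'
}

# Example use on a network node (not executed here):
#   ip netns list | router_namespaces
#   ip netns exec qrouter-<router id> ip -brief address
```

If the list is empty after the upgrade, or the namespaces lack their interfaces, the l3-agent has probably not resynchronised its routers yet.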
Placement
- Install the first node; either by reinstalling it with the yoga modules/tags, or by following this list:
- Run puppet with yoga modules/tags
- Run systemctl stop puppet apache2
- Run
apt-get purge placement-api placement-common python3-placement && apt-get autoremove && apt-get dist-upgrade
- Run puppet again
- Run
placement-manage db sync; placement-manage db online_data_migrations
on the new node.
- Upgrade the rest of the nodes (Step 1)
Nova
To upgrade nova without any downtime, follow this procedure:
Preparations
Before the upgrades can be started, it is important that all data from previous nova-releases is migrated to the xena release. This is done like so:
- Run
nova-manage db online_data_migrations
on an API node. Ensure that it reports that nothing more needs to be done.
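If the migrations are run in batches, the command has to be repeated until it reports that nothing is left. The loop below is a sketch of that, assuming the exit codes described in the nova-manage documentation (0 when nothing more can be done, 1 when a batch was applied and more may remain, anything else an error). The batch size of 50, the NOVA_MANAGE override and the function name are arbitrary choices made for this example.

```shell
# Hypothetical helper: repeat the online data migrations in batches until done.
# NOVA_MANAGE can be overridden, so the loop can be tried without nova installed.
run_online_migrations() {
    cmd="${NOVA_MANAGE:-nova-manage}"
    while :; do
        rc=0
        $cmd db online_data_migrations --max-count 50 || rc=$?
        case "$rc" in
            0) echo "no more migrations to run"; return 0 ;;
            1) echo "batch applied, running another one" ;;
            *) echo "migrations failed (exit code $rc)" >&2; return "$rc" ;;
        esac
    done
}
```

Without --max-count the command keeps going until it is finished by itself, so the loop is only useful when smaller batches are preferred.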
Nova API
- In the node-specific hiera, disable the services at the first node you would like to upgrade with the key
apache::service_ensure: 'stopped'
- Do one of:
- Run puppet with the yoga modules/tags, then run
apt dist-upgrade && apt-get autoremove
- Reinstall the node with yoga modules/tags
- Run
nova-manage api_db sync
- Run
nova-manage db sync
- Re-enable nova API on the upgraded node:
- Remove
apache::service_ensure: 'stopped'
from the upgraded node's hiera file
- Upgrade the rest of the nodes (basically run step 2)
Nova-services
- Run puppet with the yoga modules/tags
- Run
apt dist-upgrade && apt-get autoremove
- Run puppet and restart services
Enable nova quotas through keystone unified limits
Currently only for testing:
The nova-project is currently testing the unified quota system, but does not yet recommend it for production use!
If you want to test the new unified quota system you first need to register some relevant limits. Substitute "SkyLow" with the relevant region-name:
# Default-quota of 20 instances, 20 VCPUs, 40GB RAM and no VGPUs.
openstack registered limit create --service nova --region SkyLow --default-limit 20 class:VCPU
openstack registered limit create --service nova --region SkyLow --default-limit 0 class:VGPU
openstack registered limit create --service nova --region SkyLow --default-limit 40960 class:MEMORY_MB
openstack registered limit create --service nova --region SkyLow --default-limit 20 servers
Enable the unified limit integration for nova by adding the following lines in hiera:
ntnuopenstack::nova::endpoint::internal::id: '<NOVA INTERNAL ENDPOINT ID>'
ntnuopenstack::nova::keystone::limits: true
Heat
The rolling upgrade procedure for heat includes a step where you are supposed to create a new rabbit vhost. I don't want that. Therefore, these are the cold upgrade steps.
- Set
heat::api::enabled: false
heat::engine::enabled: false
heat::api_cfn::enabled: false
in hiera to stop all services
- Do one of:
- Run puppet with yoga modules/tags, then run
apt-get update && apt-get dist-upgrade && apt-get autoremove
- Reinstall the nodes with yoga modules/tags
- Run
heat-manage db_sync
on one of the api-nodes.
- Remove the hiera keys that disabled the services and re-run puppet
Barbican
Barbican must be stopped for upgrades, so the upgrade can be performed on all barbican hosts at the same time. It might be an idea to keep one set of hosts stopped at old code in case of the need for a sudden roll-back.
- Stop all barbican-services by adding the following keys to node-specific hiera, and then make sure to run puppet on the barbican hosts:
barbican::worker::enabled: false
apache::service_ensure: 'stopped'
- Run puppet with the yoga modules/tags
- Run
apt dist-upgrade && apt-get autoremove
- Run
barbican-db-manage upgrade
- Re-start barbican services by removing the keys added in step 1 and re-run puppet.
Magnum
Magnum must be stopped for upgrades, so the upgrade can be performed on all magnum-hosts at the same time. It might be an idea to keep one set of hosts stopped at old code in case of the need for a sudden roll-back.
We can go back to Ubuntu for magnum-servers now. So, before you begin - reinstall VMs to Ubuntu 20.04.
In Ubuntu, this is needed in the node-specific hiera:
apache::mod::wsgi::package_name: 'libapache2-mod-wsgi-py3'
apache::mod::wsgi::mod_path: '/usr/lib/apache2/modules/mod_wsgi.so'
- Stop all magnum-services by adding the following keys to node-specific hiera, and then make sure to run puppet on the magnum hosts:
magnum::conductor::enabled: false
apache::service_ensure: 'stopped'
- Run puppet with the yoga modules/tags
- Run
apt dist-upgrade && apt autoremove
- Run
su -s /bin/sh -c "magnum-db-manage upgrade" magnum
- Re-start magnum services by removing the keys added in step 1 and re-run puppet.
- Check if a new Fedora CoreOS image is required, and if new public cluster templates should be deployed, e.g. to support a newer k8s version
- Hint: You need Fedora CoreOS 35 now =) And you need this specific build!!!
Octavia
Octavia must be stopped for upgrades, so the upgrade can be performed on all octavia-hosts at the same time. It might be an idea to keep one set of hosts stopped at old code in case of the need for a sudden roll-back.
- Stop all octavia-services by adding the following keys to hiera, and then make sure to run puppet on the octavia hosts:
octavia::housekeeping::enabled: false
octavia::health_manager::enabled: false
octavia::api::enabled: false
octavia::worker::enabled: false
- Do one of:
- Reinstall the node with yoga modules/tags
- Run puppet with the yoga modules/tags, run
apt-get dist-upgrade && apt-get autoremove
and then run puppet again
- Run
octavia-db-manage upgrade head
- Re-start octavia services by removing the keys added in step 1 and re-run puppet.
- Build a yoga-based octavia-image and upload to glance. Tag it and make octavia start to replace the amphora.
Horizon
- Run puppet with the yoga modules/tags
- Run
dnf upgrade --allowerasing
- Yes, this is weird: log in to all memcached servers, and run
systemctl restart memcached
- Run puppet again
- Restart httpd
Compute-nodes
When all APIs etc. are upgraded, it is time to do the same on the compute-nodes. Compute nodes are simple to upgrade:
- Do one of:
- Reinstall the node with yoga modules/tags
- Run puppet with the yoga modules/tags, then run
apt dist-upgrade && apt-get autoremove
- Reboot the compute-node
- When it comes up, see that the storage-interface is up. If it isn't, run a manual puppet-run to fix it.
GPU-nodes
- The mdev-mappings need yet another change in hiera. This time you should:
- Change the nova::compute::mdev::mdev_types_device_addresses_mapping parameter to something like this:
nova::compute::mdev::mdev_types:
  nvidia-45:
    device_addresses: [ '0000:3d:00.0', '0000:3e:00.0', '0000:3f:00.0', '0000:40:00.0' ]
- Remove the old keys:
nova::compute::mdev::mdev_types_device_addresses_mapping
nova::compute::vgpu::vgpu_types_device_addresses_mapping
- Run puppet with the yoga modules/tags
- Run
apt dist-upgrade && apt autoremove
- Run puppet again
- Restart openstack services and openvswitch-services
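After the restart it can be useful to check that the configured mdev type is actually offered by the devices listed in hiera. The sketch below assumes that the vGPU driver exposes the standard mdev sysfs tree under /sys/class/mdev_bus; the function name mdev_available is hypothetical, and nvidia-45 is just the type used in the example above.

```shell
# Hypothetical helper: for each mdev-capable device, print its PCI address and
# how many more instances of the given mdev type it can still create.
mdev_available() {
    mdev_type="$1"
    bus_dir="${2:-/sys/class/mdev_bus}"
    for dev in "$bus_dir"/*; do
        f="$dev/mdev_supported_types/$mdev_type/available_instances"
        # Devices that do not offer this type are skipped silently.
        if [ -r "$f" ]; then
            echo "$(basename "$dev") $(cat "$f")"
        fi
    done
    return 0
}

# Example use on a GPU node (not executed here):
#   mdev_available nvidia-45
```

A device address that is listed in hiera but missing from the output is a hint that the type name or the PCI address is wrong for that host.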
Finalizing
- Run
nova-manage db online_data_migrations
on a nova API node. Ensure that it reports that nothing more needs to be done.
- Rotate octavia images.
- Remove old authtoken-related keys from hiera:
- barbican::keystone::authtoken::*
- cinder::keystone::authtoken::*
- heat::keystone::authtoken::*
- magnum::keystone::authtoken::*
- magnum::keystone::keystone_auth::*
- octavia::keystone::authtoken::*
- Remove old database-connection keys from hiera:
- barbican::db::database_connection
- magnum::db::database_connection
- nova::db::api_database_connection
- nova::db::database_connection
- octavia::db::database_connection
- Remove other keys which now have sane defaults that we do not need to override:
- barbican::api::max_allowed_secret_in_bytes
- barbican::api::max_allowed_request_size_in_bytes