There are multiple moving parts in the SkyHiGh architecture, which can make an upgrade a complex task. The upgrade process is currently not as streamlined as it could be, and although it is possible to do a live upgrade of everything, we currently bring down some services while the upgrade is running, to be on the safe side.

Update the r10k repo to pull in all the correct modules

As all our settings are controlled by puppet, and all our puppet modules are downloaded by r10k, the r10k repo is an excellent place to control the upgrade from.

If all of the infrastructure is going to be upgraded to a tested configuration, it is simply a matter of merging the new configuration into the current branch of the r10k repo. For instance, to upgrade the whole SkyHiGh platform to the Liberty release, the master branch at tag "v0.3.0" is merged into the skyhigh branch with the following commands:

Code Block
titleMerge "v0.3.0" into skyhigh
$ git checkout skyhigh
$ git merge "v0.3.0"

After the merge, it can be smart to edit the Puppetfile so that it also downloads the same versions of the role and profile repos:

Code Block
titleTrack the correct repos
  mod 'role',
   :git => 'https://github.com/ntnusky/role.git',
-  :tag => 'master'
+  :tag => 'v0.3.0'
 
 mod 'profile',
   :git => 'https://github.com/ntnusky/profile.git',
-  :tag => 'master'
+  :tag => 'v0.3.0'
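
If the Puppetfile was changed, remember that r10k deploys from the git remote, so the change needs to be committed and pushed before the deploy. A minimal sketch, assuming the remote "origin" points at the r10k repo and using an example commit message:

Code Block
titleCommit and push the Puppetfile change (example)
$ git add Puppetfile
$ git commit -m "Track role and profile at v0.3.0"
$ git push origin skyhigh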

When one is ready to start the upgrade, the modules of the skyhigh environment can be deployed on the manager like so:

Code Block
titleDeploy puppet modules using r10k
root@manager:~# r10k deploy environment skyhigh -vp

At this point puppet will start to pull in the changes, so if you want to control the upgrade node by node, it is probably smart to stop the puppet agent on the nodes before this step.
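
A minimal sketch of controlling the agent on a node (compute01 is only an example host name; the agent can also simply be stopped as a regular service):

Code Block
titleControlling the puppet agent on a node (example)
# Before the r10k deploy: prevent the node from applying the new configuration.
root@compute01:~# puppet agent --disable "SkyHiGh upgrade in progress"
# When this node is ready to be upgraded: re-enable the agent and trigger a run.
root@compute01:~# puppet agent --enable
root@compute01:~# puppet agent --test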

Upgrade the Storage cluster

The storage cluster in SkyHiGh runs Ceph. Ceph can typically be upgraded while running, but ALWAYS consult the release notes of the new version to make sure that a live upgrade is possible. The general upgrade procedure is described below:

General procedure - Ceph upgrades

To upgrade ceph, we typically follow these steps while monitoring the output of "ceph -s" to ensure that nothing unexpected happens (a condensed command sequence is sketched after the list):

  1. Pull in a newer version of puppet-ceph through r10k, and update our puppet configuration if necessary. The puppet module configures apt to track the new version's repository for us, so the upgraded packages become available through apt-get.
  2. Upgrade the monitors FIRST! Upgrade one monitor at a time, so that the remaining monitors keep quorum at all times. The upgrade is started with an "apt-get dist-upgrade". After the upgrade, the monitor needs to be restarted with "service ceph-mon restart".
    • There might be caveats, so READ the release notes first. For instance, from hammer to jewel the daemon changes from running as the "root" user to the "ceph" user, and one has to change the file permissions manually before the monitor starts again. This is why it is smart to restart one monitor at a time!
  3. Next up are the storage nodes. If the storage-node upgrade needs to convert data (read the release notes), and thus is expected to take some time, it can be smart to stop the automatic cluster rebalancing first: "root@controller01:~# ceph osd set noout". After the upgrade, the ceph services need to be restarted on the storage nodes as well, using "service ceph-all restart".
  4. When the storage nodes are upgraded, it is time to re-enable automatic rebalancing if we disabled it earlier: "root@controller01:~# ceph osd unset noout".
  5. The final step is to upgrade metadata servers and object gateways. Currently we do not have these in SkyHiGh, but the object gateway might be set up in the future when we implement OpenStack Swift.
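
The commands from the steps above can be condensed into the following sketch. It assumes the release notes allow this ordering, and the storage host name is only an example:

Code Block
titleCondensed ceph upgrade commands (example)
# One monitor at a time; check "ceph -s" between each:
root@controller01:~# apt-get dist-upgrade
root@controller01:~# service ceph-mon restart
root@controller01:~# ceph -s

# Optionally disable rebalancing before the storage nodes are upgraded:
root@controller01:~# ceph osd set noout

# On each storage node (storage01 is an example host name):
root@storage01:~# apt-get dist-upgrade
root@storage01:~# service ceph-all restart

# Re-enable rebalancing when all storage nodes are done:
root@controller01:~# ceph osd unset noout
root@controller01:~# ceph -s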

Upgrade the OpenStack Installation

The OpenStack installation is currently not in a state that allows live upgrades. The virtual machines can keep running, but all the OpenStack services on the controllers need to be stopped before any of the controllers are updated.
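
Which services need to be stopped depends on what runs on each controller. As a hypothetical sketch (the service names below are examples only, and must be adjusted to the actual installation), the services can be stopped one by one:

Code Block
titleStopping OpenStack services on a controller (hypothetical example)
# Adjust the service list to what actually runs on the controller.
root@controller01:~# service nova-api stop
root@controller01:~# service glance-api stop
root@controller01:~# service keystone stop
root@controller01:~# service neutron-server stop
# ...and so on for the remaining OpenStack services on the node.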

Upgrade to a previously tested configuration.

This is what we do in production environments. The general upgrade procedure is as follows (a condensed command example is sketched after the list):

  1. Stop puppet on all controller nodes, and merge in the correct configuration using r10k.
  2. Stop ALL OpenStack services on all controllers.
  3. Do a puppet run on the "primary" controller (the one doing the db-sync etc.; in SkyHiGh this is controller01). This run will fail.
  4. Perform an installation of the newer packages using 'apt-get -y dist-upgrade -o Dpkg::Options::="--force-confnew"'
  5. Test all APIs, and fix db-sync errors where they appear. (These are typically visible as errors in the APIs, and as log messages complaining about missing fields.)
  6. Perform steps 3 and 4 (the puppet run and the package upgrade) on the other controllers.
  7. Test
  8. Start puppet on the compute nodes, and let each of them run once before 'apt-get -y dist-upgrade -o Dpkg::Options::="--force-confnew"' is issued to install the newer packages.
  9. Test and verify a successful upgrade.
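
As a minimal sketch of steps 3-5 on the primary controller (the API checks at the end are only examples, assuming the python-openstackclient CLI is installed and admin credentials are sourced):

Code Block
titleUpgrading the primary controller (example)
# Step 3: run puppet once; this run is expected to fail.
root@controller01:~# puppet agent --test
# Step 4: install the new packages, letting dpkg install the packaged config files.
root@controller01:~# apt-get -y dist-upgrade -o Dpkg::Options::="--force-confnew"
# Step 5: verify that the APIs respond, for instance by listing resources.
root@controller01:~# openstack server list
root@controller01:~# openstack network list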

Upgrade to a new version, and test it in skylow.

Unfinished notes:

...

The articles in this part of the wiki are intended to guide you through the upgrade procedures for parts of the infrastructure:

Page Tree
root@self

...