
This page takes you through the journey from the old controller-based architecture to the new VM-based architecture.

The original architecture

Originally the "ntnusky" architecture was based around machines in three distinct roles:

  1. Compute-nodes - The physical machines responsible for running the virtual machines for openstack.
  2. Storage-nodes - The physical machines providing storage for the ceph-cluster, which in turn provides storage to the virtual machines.
  3. Controller-nodes - The physical machines doing all the management of the openstack cluster:
    1. Ceph monitors
    2. Openstack services
      1. API's
      2. Dashboard
      3. Compute schedulers/proxies/etc
      4. Neutron routers/dhcp/other networking
    3. MySQL databases
    4. Caching (memcached)

The architecture has been realized with 3 controllers and as many storage/compute nodes as needed to get enough resources, and it has served the skyhigh installation (at Gjøvik) well for several years. The architecture also relies on two other servers for its operation:

  • Manager: Acts as puppet-master, machine-dashboard, DHCP/DNS
  • Monitor: Munin and sensu-master.

Difficulties

There have been a number of downsides to this architecture:

  • Upgradability: The controllers run a large number of services, so when a controller is upgraded, the upgrade affects every service. Upgrades include new openstack versions, new ceph versions, new OS versions etc. It would be nice to be able to upgrade one service at a time, without affecting every other service.
  • Redundancy: It would be nice to have an architecture with as few SPOF's (Single Points Of Failure) as possible. This architecture has some SPOF's, as the manager machine is the only DHCP and DNS server, the only puppet-master etc.
  • Upgradability/Redundancy: Failover between the controllers has been a little troublesome. The redundancy is based on keepalived's implementation of the VRRP protocol, and this implementation has some errors which make failover slower. In addition, when a service behind keepalived should be taken down for maintenance, all existing connections are broken. There is no way to finish existing connections to one server while directing all new connections to another server.
  • Separation: Traffic to/from our users' VM's and traffic to/from our own services should ideally not pass through the same interfaces. This is to reduce the chance of traffic congestion on our service endpoints.

All in all it was decided to create a new infrastructure to cope with these problems.

The new VM-based infrastructure

The goals of the new infrastructure are:

  • Reduce the number of services hosted by each OS to simplify upgrades
  • Improve the redundancy by having load-balancers in front of all services. This way traffic to the servers can be steered as needed.
  • Remove as many SPOF's as possible
  • Separate user-traffic from service-traffic as well as possible.

These goals are fulfilled by having the following roles for our physical machines:

  • Compute - As before
  • Storage - As before
  • Network - Dedicated nodes for handling the user-traffic. All virtual routers and related network functions are running here.
  • Infra/Api - KVM hosts which host VM's for all the tasks the old manager/monitor/controllers were doing, except for networking. They host VM's we use for:
    • Openstack - All the services we actually want to run (smile)
      • cinder: Runs the cinder api, and the scheduler and volume services.
      • glance: Runs the glance-api and the glance-registry services
      • heatapi: Runs the heat API
      • heatengine: Runs the workers performing tasks needed by openstack heat
      • horizon: Hosts the openstack web-based dashboard (horizon)
      • keystone: Hosts the openstack keystone service, which is used for authentication
      • neutronapi: Hosts the neutron api's
      • novaapi: Hosts the nova api
      • novaservices: Hosts the nova scheduler, vncproxies etc.
    • Machine management - All the services we need to install and manage our machines (both physical and virtual).
      • dhcp: DHCP and TFTP servers; handing out addresses and boot-images to our physical and virtual machines.
      • shiftleader: Our dashboard used for administering the machines.
    • Configuration management - All the services used for our configuration management.
      • postgres: Hosts the postgres database used by the puppetdb
      • puppet: Our puppetmasters
      • puppetca: The puppet certificate authority (We only have one of these machines)
      • puppetdb: The puppet database service. Stores node information and statistics
    • General infrastructure - General infrastructure services used by several of our machines.
      • adminlb and servicelb: HAProxy load-balancers load-balancing traffic to our services from our own infrastructure and from our users respectively.
      • cache: Memcached servers which are used by several of our openstack services.
      • cephmon: Ceph monitors. The hosts responsible for controlling our ceph cluster.
      • rabbit: Hosts the message-queues which openstack and sensu use to communicate internally.
      • mysql: Provides the galera-cluster which acts as openstack's database backend.
    • Monitoring - Services which help us understand what's going on.
      • munin: Our munin-masters. Responsible for graphing monitor-data from all our hosts.
      • redis: The redis servers, which act as a key/value store for our sensu monitoring system.
      • sensu: Our monitoring system which is responsible for notifying us when something breaks.

One must consider which services a certain installation should have, depending on how much the infrastructure should integrate with other infrastructure. At the Gjøvik installations (skyhigh and skylow) we only use DNS from NTNU central IT, while the Trondheim-installations might use their own tools for machine management. In that case, one would not install the dhcp and shiftleader roles, but rather make sure that those services are provided externally.

Installation of the new infrastructure

The installation of the new infrastructure is performed through multiple steps, which are explained in this section.

Establishing the base environment

It is strongly recommended not to manually install any of the servers which should end up in the final infrastructure; but as the chicken-and-egg problem of automatically installing the first host is very real, one is likely to need a manually installed host to begin with. We have two recommended approaches:

  1. Install a node with the bootstrap role. This is described in this article.
  2. Install and configure machines manually to provide at least the following services with their dependencies:
    1. DHCP and a PXE-boot environment
    2. Puppet-master and puppetdb
    3. puppetca (can be a part of the puppet-master)

All the services provided by these manually installed machines can be installed with our puppet-infrastructure later on, if they are not already placed under some other satisfactory configuration management and monitoring (i.e. NTNU-IT's cfengine and monitoring).

Install infra/api-nodes

The first nodes to be installed should be the KVM-hosts for our infrastructure, as all our services will run as VM's on these. The configuration of these nodes is:

  • Physical machines with at least two NIC's:
    • One for installation and management of the machine itself. Should be an access-port (untagged) in the infrastructure VLAN.
    • One NIC which is a trunk (has multiple tagged VLANs), carrying the infrastructure VLAN, the storage VLAN and the services VLAN.
  • The nodes should be installed with the role "role::kvm", which sets up KVM and networking. Afterwards, VM's on these nodes can be administered through "Virtual Machine Manager" or any other KVM administration tool over ssh. Remember to add relevant network config for these hosts in hiera. Particularly important are these keys.

Please see this article for a more lengthy explanation.

Install an OS on the VM's, and let them get their base-configuration.

A good starting-point now would be to create all the VM's which should make up your infrastructure, and then provision them with an OS and the "role::base" class. This way you can see that all installations go fine, and that the VM's receive appropriate networking configuration for all their interfaces.

There are some hiera keys that you should populate to ease the transition a little (a sketch of these keys follows after the list):

  • ntnuopenstack::oldcontrollers
    • A hash with 'controller-name':'controller-ip' pairs for each of the old controllers.
  • profile::networks::management::ipv4::prefix::extra
    • The IPv4 prefix for the old management network.
  • profile::interfaces::IFNAME::routes:
    • In the node-specific hiera-file for all hosts having a leg in the storage-network, you should add an extra route to the old storage-network if this network is separate from the new one. (Probably only applies to skylow and skyhigh.)
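
A hedged sketch of what these hiera entries could look like is shown below. The controller names, IP addresses and prefixes are made-up examples, and the exact value format expected for the routes key should be verified against the profile::interfaces code; this is an illustration, not a verified configuration.

    # Hiera sketch (made-up names and addresses):
    ntnuopenstack::oldcontrollers:
      'controller01': '192.0.2.11'
      'controller02': '192.0.2.12'
      'controller03': '192.0.2.13'

    # The IPv4 prefix of the old management network (example value):
    profile::networks::management::ipv4::prefix::extra: '192.0.2.0/24'

    # In the node-specific file for hosts with a leg in the storage network.
    # The key/value format here is an assumption; check profile::interfaces.
    profile::interfaces::eth1::routes:
      '198.51.100.0/24': '203.0.113.1'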

Start installing services in the VM's

The first services to install should be:

  • General infrastructure
    • When installing new ceph-mons it is important that the old monitors know about the new ones first. Pay attention to the installation article.
  • Machine management (if these services are not provided externally)
  • Configuration management (if these services are not provided externally)
  • Monitoring

At this point you should have the base infrastructure in place. If you register the old controllers in hiera (the key ntnuopenstack::oldcontrollers, which should be a hash where the key is the controller-name and the value is the controller's management IP), they will be configured as haproxy-backends in the new infrastructure.

If there is a migration from old to new networks at the same time, the old machines (controllers, storage and compute) need to get routes to the new networks. This way they will be able to reach the new services.

Migrate traffic through the new infrastructure

When the loadbalancers are installed, and have at least one backend for every service (the old controllers can for now act as backends for the openstack services), you can start to route traffic through the loadbalancers.

  • By setting the keys "profile::openstack::endpoint::*" in hiera to 'https://*whatevernameyouplantouse*' on the old controllers, the endpoint-catalog will be updated to point to these names. A sketch of these keys is shown after this list.
    • If you do not want TLS on the API's you can use http in the link; but in this case the keys profile::haproxy::management::apicert and profile::haproxy::services::apicert have to be unset. The haproxy responds with an empty response if it is configured for TLS but gets a cleartext request.
  • TODO: What about rabbit?
  • TODO: something else?
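
A hedged sketch of these keys is shown below. The endpoint sub-key names (keystone, glance), the hostnames and the certificate path are illustrative assumptions only; check the actual puppet profiles for the exact key names and expected values.

    # Endpoint names served by the new load-balancers (made-up hostnames):
    profile::openstack::endpoint::keystone: 'https://keystone.example.ntnu.no'
    profile::openstack::endpoint::glance: 'https://glance.example.ntnu.no'

    # Certificates haproxy uses when terminating TLS for the API's.
    # Leave these keys unset if the endpoints above use plain http.
    profile::haproxy::management::apicert: '/etc/ssl/private/api.pem'
    profile::haproxy::services::apicert: '/etc/ssl/private/api.pem'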

Install openstack services

When traffic passes through the loadbalancers it is straightforward to install openstack services on the VM's. When installing a new API node, puppet will make sure to configure the loadbalancers to include it in their configuration. You should thus now install the following roles on the VM's:

  • openstack::cinder
  • openstack::glance
  • openstack::heat::api
  • openstack::heat::engine
  • openstack::horizon
  • openstack::keystone
  • openstack::neutron::api
  • openstack::nova::api
  • openstack::nova::services

You should now also install one or more physical nodes with the role "openstack::neutron::net", which will handle our users' traffic.

Migrate away from the old networks

To migrate away from the old networks you would need to reinstall the compute and storage machines when they move to the new networks.

Reinstall compute

  1. Migrate all VM's to another compute-host
  2. Move the machine to new VLAN's
  3. Reinstall the compute-node.

Reinstall storage

  1. Disable ceph rebalancing.
  2. Move the machine to new VLAN's, and reinstall it without having OSD's listed in the hierafile.
  3. Add the old OSD's as described in this article.
  4. Enable ceph rebalancing.

Decommission old controllers

Decommissioning of the old controllers can be done when all the services are delivered from other machines. Follow these steps:

  • Stop the ceph manager and monitors on a controller.
  • Remove this controller from the ceph cluster.
  • Migrate routers and DHCP agents to new neutron network nodes.
  • Gracefully shut down the machine to allow it to leave the galera cluster nicely.
  • Deregister all agents and services this controller had in openstack.