We use MySQL (or MariaDB) to store various kinds of data. We want this service to be highly available (HA), since our services stop working when the database is down.

Architecture

We run three database servers that form a quorum-based cluster: if a server disappears, the remaining majority decides which partition stays active. This prevents a split-brain scenario in which two separate instances of the database accept writes independently. The cluster is realized using Galera.
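The majority rule can be sketched as simple arithmetic (this illustrates the principle only, not Galera's actual implementation):

```shell
# A partition of the cluster may keep serving only if it holds a
# strict majority of the configured nodes.
cluster_size=3
majority=$((cluster_size / 2 + 1))   # = 2 for a three-node cluster

live_nodes=2
if [ "$live_nodes" -ge "$majority" ]; then
  echo "partition of $live_nodes nodes keeps quorum"
else
  echo "partition of $live_nodes nodes loses quorum and stops serving"
fi
```

This is why an odd number of nodes (here three) is used: with two nodes, losing either one leaves the survivor without a majority.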

In front of the cluster, to provide failover and load balancing, we have placed an HAProxy instance that balances TCP connections across the database servers.
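A minimal sketch of what such an HAProxy section could look like; the listener name, port, balancing strategy, server names and addresses below are placeholders, not the values from our Puppet-managed configuration:

```
# Hypothetical HAProxy TCP load balancing for the Galera cluster.
listen mysql
    bind *:3306
    mode tcp
    balance leastconn
    # mysql-check requires a matching user in the database;
    # "haproxy_check" is an assumed name.
    option mysql-check user haproxy_check
    server db1 192.0.2.1:3306 check
    server db2 192.0.2.2:3306 check
    server db3 192.0.2.3:3306 check
```

With `check` enabled, HAProxy stops routing connections to a node that goes down, which is what gives us the failover described above.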

Installing new nodes

To install a new database node, simply install a machine with the role "role::mysql". Once installed, the node contacts the existing database servers to receive a full copy of the active database before it joins the cluster. The role also registers the node in the HAProxy load balancers, so traffic will be routed to the new node after the next Puppet run on the load balancers.

After the node is successfully installed, update hiera (the key "profile::mysqlcluster::servers") to include the new node, so that the other database servers can bootstrap from it if they are ever turned off.
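The hiera entry could look like the following; the hostnames are placeholders, not our actual server names:

```
# Hypothetical hiera snippet listing the cluster members.
profile::mysqlcluster::servers:
  - db1.example.com
  - db2.example.com
  - db3.example.com
```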

Starting the cluster

When a mysql/galera node starts, it does not begin operating as a database server until it is certain that it is part of the only MySQL cluster serving this database. Under normal circumstances this means the starting node contacts the other nodes in the cluster to find the active cluster, joins it, and only then starts serving clients.

This scheme presents a problem if all database servers are down: no node can find an active cluster to join. In this case one server (preferably the last one to go down, since it has the most recent data) must be forced to start first; the rest will then load their databases from this node as they start. To force a node to start, edit "/etc/mysql/my.cnf" so that the line listing the other cluster nodes lists no nodes, i.e. change "wsrep_cluster_address = gcomm://192.168.254.2,192.168.254.8,192.168.254.9,192.168.254.10" to "wsrep_cluster_address = gcomm://". After this edit, a normal "systemctl start mysql" will work on this node. Then start the rest of the servers before you run puppet on the first node, which reverts your config change.
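The procedure above, sketched end to end (addresses are the ones from the config line quoted above):

```
# /etc/mysql/my.cnf on the node chosen to bootstrap -- before:
wsrep_cluster_address = gcomm://192.168.254.2,192.168.254.8,192.168.254.9,192.168.254.10
# after (an empty address list tells Galera to form a new cluster):
wsrep_cluster_address = gcomm://

# 1. On the bootstrap node:
systemctl start mysql

# 2. On each of the remaining nodes (they sync from the bootstrap node):
systemctl start mysql

# 3. Only after all nodes are up, run puppet on the bootstrap node so
#    it reverts the my.cnf edit:
puppet agent --test
```

The ordering matters: if puppet reverts the edit before the other nodes are up, a restart of the first node would again search for a cluster that does not exist.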

Backups

The MySQL servers regularly dump the contents of their databases to "/var/backups/MysqlDump*". Old backups are automatically deleted to prevent full disks.
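The retention cleanup amounts to something like the following; the 7-day retention period is an assumption here, the real value is set by the Puppet-managed backup job:

```shell
# Hypothetical sketch of the backup retention cleanup.
BACKUP_DIR="${BACKUP_DIR:-/var/backups}"
RETENTION_DAYS=7

# Delete dump files older than the retention period so old backups
# cannot fill the disk. The guard keeps this sketch safe to run on a
# machine where the directory does not exist.
find "$BACKUP_DIR" -name 'MysqlDump*' -mtime "+$RETENTION_DAYS" -delete 2>/dev/null || true
```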
