Our openstack-clouds have multiple compute-nodes, and when a VM is created the scheduler is responsible for selecting the node that will run it. Initially we did not really care where VMs were placed, but as the clouds have grown there are special considerations we need to account for when scheduling. These considerations are:

Keep room for large instances

An issue we have when distributing VMs evenly across all our nodes is that the hypervisors all fill up at the same rate, and we end up in a situation where large VMs cannot be scheduled because no single hypervisor has room for them, even though the hypervisors combined have enough resources available. One way to fix this is to change the scheduling logic from "Select the compute-node with the most RAM/CPU available" to "Select the compute-node with the least amount of RAM available, while still having room for the VM being scheduled". We tune our schedulers to this logic by setting the following keys in hiera:

Code Block
nova::scheduler::filter::cpu_weight_multiplier: '0.0'
nova::scheduler::filter::disk_weight_multiplier: '0.0'
nova::scheduler::filter::ram_weight_multiplier: '-2.0'
nova::scheduler::filter::io_ops_weight_multiplier: '-0.3'

These keys make sure that we rather strongly insist on placing new VMs on hosts with high RAM utilization, unless that node already has multiple IO-operations on-going (i.e. there are multiple VMs migrating/booting). We ignore CPU-count (as we in general see more RAM-heavy than CPU-heavy VMs) and disk-space (as all VMs live on the same ceph-cluster anyway).
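
As a sanity check, the resulting option values can be inspected on a scheduler node once puppet has applied the configuration. This is only a minimal sketch; it assumes the default nova.conf location.

Code Block
# Minimal sketch, assuming the default nova.conf path on the scheduler node.
# The values should match the hiera keys above.
$ grep weight_multiplier /etc/nova/nova.conf
cpu_weight_multiplier=0.0
disk_weight_multiplier=0.0
ram_weight_multiplier=-2.0
io_ops_weight_multiplier=-0.3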

Schedule windows to licensed hosts

We have a dedicated host-aggregate containing the compute-nodes that are licensed to run Windows, and we set the following property on that aggregate:

  • os_type='windows'

When we upload windows-images we set the same property on the image. Openstack will then make sure that VMs based on that image are booted on one of the hosts in the host-aggregate. If none of the hosts in the aggregate are able to fulfill the request, the VM will fail to be scheduled.
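
As an illustration, the aggregate and image properties could be set with the openstack CLI roughly like this. The aggregate name, host name and image name below are made-up examples, not the actual names we use:

Code Block
# Hypothetical names, shown only to illustrate the pattern
$ openstack aggregate create windows-licensed
$ openstack aggregate add host windows-licensed compute18
$ openstack aggregate set --property os_type='windows' windows-licensed
$ openstack image set --property os_type='windows' "Windows Server 2019"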

Do not fill windows-hosts unless all other hosts are full

Limiting some images to a set of hosts introduces the risk that these images cannot be booted when those hosts are filled up by other machines. This could lead to a situation where we cannot schedule new Windows-VMs even though we have plenty of space left on other nodes. To avoid this we can add metadata to the host-aggregate for windows-compute that gives these hosts a very low weight. This makes sure that we only use the windows-hosts if a VM cannot be placed elsewhere (because all other hypervisors are full, or because we are scheduling a windows VM).

The following property should be set on the aggregate:

  • ram_weight_multiplier='-2000'
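
A sketch of how this property could be set with the CLI, assuming the aggregate is called windows-compute as in the text above (adjust the name to the real aggregate):

Code Block
# Assumes the windows aggregate is called "windows-compute"
$ openstack aggregate set --property ram_weight_multiplier='-2000' windows-compute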

Schedule GPU-instances to GPU-equipped compute-nodes

To make sure that we schedule GPU-based flavors to GPU-equipped compute-nodes (and general-purpose VMs to general-purpose compute-nodes) we employ the AggregateInstanceExtraSpecsFilter. We create host-aggregates with metadata under the key "node_type" describing what kind of compute-nodes they contain. For instance, we have the following host-aggregate in SkyHiGh:

Code Block
$ openstack aggregate show general-purpose
+-------------------+------------------------------------------------------------------------------------------------------+
| Field             | Value                                                                                                |
+-------------------+------------------------------------------------------------------------------------------------------+
| availability_zone | nova                                                                                                 |
| created_at        | 2019-05-06T11:59:45.000000                                                                           |
| deleted           | False                                                                                                |
| deleted_at        | None                                                                                                 |
| hosts             | compute01, compute02, compute03, compute04, compute05, compute06, compute07, compute08, compute09,   |
|                   | compute10, compute11, compute12, compute13, compute14, compute15, compute16, compute17               |
| id                | 5                                                                                                    |
| name              | general-purpose                                                                                      |
| properties        | node_type='general', os_type='any'                                                                   |
| updated_at        | 2020-06-25T08:05:31.000000                                                                           |
+-------------------+------------------------------------------------------------------------------------------------------+

We then tag the flavors which should be able to run on these compute-nodes with the same value. In SkyHiGh the m1.medium flavor is for instance considered to be a general-purpose flavor, and should thus be placed on a node with the type 'general':

Code Block
$ openstack flavor show m1.medium
+----------------------------+---------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                       |
+----------------------------+---------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                       |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                           |
| access_project_ids         | None                                                                                        |
| disk                       | 40                                                                                          |
| id                         | 1ff86526-c425-4b48-87ac-83826e1b7136                                                        |
| name                       | m1.medium                                                                                   |
| os-flavor-access:is_public | True                                                                                        |
| properties                 | aggregate_instance_extra_specs:node_type='general', hw:cpu_cores='1', hw:cpu_sockets='2',   |
|                            | hw:cpu_threads='1', hw_rng:allowed='true', hw_rng:rate_bytes='24',                          |
|                            | hw_rng:rate_period='5000', quota:disk_read_iops_sec='300', quota:disk_write_iops_sec='300'  |
| ram                        | 8192                                                                                        |
| rxtx_factor                | 1.0                                                                                         |
| swap                       |                                                                                             |
| vcpus                      | 2                                                                                           |
+----------------------------+---------------------------------------------------------------------------------------------+
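
For this mapping to have any effect, the AggregateInstanceExtraSpecsFilter must of course be among the enabled scheduler filters. One way to check, assuming the default nova.conf location (the option was called scheduler_default_filters in older releases), is:

Code Block
# The filter list should contain AggregateInstanceExtraSpecsFilter.
# Assumes the default nova.conf path on the scheduler node.
$ grep -E '(enabled_filters|scheduler_default_filters)' /etc/nova/nova.conf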

Our strategy for making sure all placement is correct is to tag all general-purpose flavors with 'general', and then create dedicated host-aggregates for the more specialized flavors. For instance, GPU-enabled flavors are tagged with a different value, which is also used to tag the host-aggregate containing the GPU-equipped compute-nodes.
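
As a purely illustrative sketch of that pattern (the aggregate name, host name, flavor name and the value 'gpu' are examples, not our actual names), such a mapping could be created like this:

Code Block
# Hypothetical names, shown only to illustrate the pattern
$ openstack aggregate create gpu-nodes
$ openstack aggregate set --property node_type='gpu' gpu-nodes
$ openstack aggregate add host gpu-nodes gpu-compute01
$ openstack flavor set --property aggregate_instance_extra_specs:node_type='gpu' gpu.m1.large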