...

  • Create a host aggregate with name gpu-<gpu-model>-<gpu-memory>.
    • For our V100, this will be gpu-v100-8g
  • Add the custom metadata: node_type = <same name as the host aggregate>
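
The two steps above could be scripted with the standard `openstack aggregate` commands. This is only a minimal sketch: the function name `create_gpu_aggregate` and its arguments are hypothetical, and adding the GPU hosts to the aggregate is not shown here.

```shell
# Hypothetical helper: create the host aggregate and tag it with node_type.
# Arguments: <gpu-model> <gpu-memory>, for example: v100 8g
create_gpu_aggregate() {
    agg_name="gpu-$1-$2"
    openstack aggregate create "$agg_name"
    # node_type is set to the same name as the host aggregate
    openstack aggregate set --property "node_type=$agg_name" "$agg_name"
}
```

For our V100, the call would be `create_gpu_aggregate v100 8g`.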

Trait

To support multiple VGPU types, each GPU resource provider needs to be tagged with a custom trait that identifies which VGPU type it provides. We do this on all servers for consistency, even on those that are not meant to support multiple types.

Code Block
export OS_PLACEMENT_API_VERSION=1.6

# Create a new trait
openstack trait create CUSTOM_<GPU-MODEL>_<NN>G
# example name: CUSTOM_A100_20G

# Add the trait to the corresponding resource provider
# Note: "trait set" replaces any traits already set on that provider
openstack resource provider trait set --trait CUSTOM_A100_20G <resource provider uuid>

# To get the uuid for the above command, list the resource providers
openstack resource provider list
# and find the resource provider for the given PCI device. They are typically named something like this:
# gpu02.infra.skyhigh.iik.ntnu.no_pci_0000_e2_00_4
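
After tagging, it may be worth checking that the trait actually ended up on the provider. The sketch below looks up a resource provider by name and lists its traits. The function name `show_provider_traits` is hypothetical; it assumes the name matches exactly one provider and that `OS_PLACEMENT_API_VERSION` is exported as in the block above.

```shell
# Hypothetical helper: print the traits of the resource provider with the given name.
# Argument: <resource provider name>
show_provider_traits() {
    # -f value -c uuid prints only the uuid column of the matching provider
    rp_uuid="$(openstack resource provider list --name "$1" -f value -c uuid)"
    openstack resource provider trait list "$rp_uuid"
}
```

Example: `show_provider_traits gpu02.infra.skyhigh.iik.ntnu.no_pci_0000_e2_00_4`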


Flavor

Finally, we need a flavor for each VGPU type. The flavor requests a node in the correct host aggregate that also has the correct trait.

  • Create a flavor with the name gpu.<gpu-model>.<gpu-memory>
    • For our V100, this will be gpu.v100.8G
  • Add the custom metadata: resources:VGPU=1
  • Add the custom metadata: aggregate_instance_extra_specs:node_type = gpu-<gpu-model>-<gpu-memory>
    • For our V100, the metadata is: aggregate_instance_extra_specs:node_type = gpu-v100-8g
  • Add the custom metadata: trait:TRAITNAME=required
    • For example: trait:CUSTOM_A100_20G=required
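
The flavor steps above could be scripted roughly as follows. This is a sketch under assumptions: the function name `create_gpu_flavor` and its argument order are hypothetical, and the vCPU, RAM and disk sizes are placeholders you have to choose yourself, since this page does not specify them.

```shell
# Hypothetical helper: create a VGPU flavor and attach the three properties.
# Arguments: <flavor-name> <aggregate-name> <trait> <vcpus> <ram-MB> <disk-GB>
create_gpu_flavor() {
    openstack flavor create --vcpus "$4" --ram "$5" --disk "$6" "$1"
    openstack flavor set \
        --property "resources:VGPU=1" \
        --property "aggregate_instance_extra_specs:node_type=$2" \
        --property "trait:$3=required" \
        "$1"
}
```

An A100 flavor could then be created with, for example, `create_gpu_flavor gpu.a100.20G gpu-a100-20g CUSTOM_A100_20G 8 32768 40`, where the last three values are illustrative only.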

Rebooting/downtime on the GPU node

...