Ongoing incident

Incident description

The Nvidia GRID license server (nvidiadls02.it.ntnu.no), which serves vGPU licenses to GPU-enabled VMs on all of NTNU's OpenStack platforms, was reinstalled without anyone notifying us. The cause was missing documentation on NTNU IT's side. Because the server's use was not documented, the engineer assumed it was no longer in use and could be reinstalled without affecting any users.

Impact

New GPU VMs will not be able to retrieve a license, so their vGPU will not work. Running VMs will lose their license over time, and will lose it immediately on reboot.

...

Time               Event
15.03.24           The server was reinstalled by NTNU IT.
19.03.24 13:49     We discovered that new GPU VMs were no longer able to acquire a license. A few minutes later it became clear that the server had been reinstalled.
19.03.24 14:06     We contacted the engineer who set up the server in June last year. He confirmed that he had reinstalled it.
19.03.24 15:36     The license server was reconfigured and is working again. All running VMs must download a new client configuration token to acquire or renew their license.
19.03.24 16:07     All affected users have been informed by email.

Implemented fix

The license server has been reconfigured from scratch. As a result, all existing users and running VMs must download a new client configuration token to acquire or renew a license. To do this, run the following commands as root inside the VM:

...
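After the new token is in place, a VM owner may want to confirm that the guest actually obtained a license. The sketch below is one way to do that and is not part of NTNU IT's documented procedure. It assumes that `nvidia-smi -q` inside the guest prints a line of the form `License Status : Licensed (Expiry: ...)`, as recent NVIDIA vGPU guest drivers do. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Assumed nvidia-smi -q format, e.g.:
#   "        License Status                    : Licensed (Expiry: 2024-3-20 14:0:0 GMT)"
# [ \t]* is used instead of \s* so a match never spills onto the next line.
_STATUS_RE = re.compile(r"^[ \t]*License Status[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def license_status(smi_output: str) -> Optional[str]:
    """Return the value of the first 'License Status' field, or None if absent."""
    match = _STATUS_RE.search(smi_output)
    return match.group(1) if match else None


def is_licensed(smi_output: str) -> bool:
    """True if the reported status starts with 'Licensed' ('Unlicensed ...' does not)."""
    status = license_status(smi_output)
    return status is not None and status.startswith("Licensed")


def check_local_vm() -> bool:
    """Run `nvidia-smi -q` in the guest (requires the NVIDIA driver) and check the license."""
    result = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True, check=True)
    return is_licensed(result.stdout)


if __name__ == "__main__":
    # Illustrative sample output only; run check_local_vm() on a real GPU VM instead.
    sample = (
        "    vGPU Software Licensed Product\n"
        "        License Status                    : Licensed (Expiry: 2024-3-20 14:0:0 GMT)\n"
    )
    print(is_licensed(sample))
```

On a running VM, `check_local_vm()` returning False after the token update suggests the guest has not yet contacted the reconfigured server. Restarting the licensing client or rebooting the VM may help.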