Ongoing incident
Incident description
The Nvidia GRID license server (nvidiadls02.it.ntnu.no), which we use to serve vGPU licenses for GPU-enabled VMs in all of NTNU's OpenStack platforms, has been reinstalled without us being notified. The cause is missing documentation on NTNU IT's side: because the server's role was not documented, the engineer assumed that it was not in use.
Impact
New GPU VMs will not be able to retrieve a license, so their vGPU will not work. Running VMs will lose their license over time, and will also lose it upon a reboot.
Event log
Time | Event |
---|---|
15.03.24 | The server was reinstalled by NTNU IT |
19.03.24 - 13:49 | We discovered that new GPU VMs were no longer able to acquire a license. A few minutes later it became obvious that the server had been reinstalled |
19.03.24 - 14:06 | We contacted the engineer who was involved in setting this up in June last year. He confirmed that he had reinstalled the server |