...
Openvswitch has been upgraded on the three GPU nodes in question, and the nodes have been rebooted. As a natural side effect, all VMs running on these servers were rebooted as well.
Event log
Time | Event |
---|---|
28.02.23 - 06:11 | gpu304 lost network connectivity |
28.02.23 - 06:28 | gpu302 lost network connectivity |
28.02.23 - 06:35 | gpu301 lost network connectivity |
28.02.23 - 08:00 | SkyHiGh operators arrived at work, and started working on the issue |
28.02.23 - 09:24 | All three affected nodes were fixed and returned to production |