Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

TimestampEvent
17.05.23 07:23compute108 rebooted on its own with "CPU machine check error"
22.05.23 19:03compute308 rebooted on its own with "CPU machine check error"
23.05.23 ~08:00The unexpected reboots were discovered by the SkyHiGh team. The VMs of both compute nodes were migrated to other hosts, and taken out of production
23.05.23 ~09:00Contacted Dell pro-support to get assistance
24.05.23 15:58Dell suggests a new BIOS for compute108
25.05.23 09:43compute108 now has the recommended BIOS, and all VMs has been migrated back. We are now waiting to see if this actually fixed the problem. Be aware that we now may experience a new uncontrolled reboot..
02.06.23compute308 got its motherboard and a CPU replaced. Dell's recommended BIOS was also installed.
05.06.23 08:05compute308 was put back into production, and all VMs were migrated back.
26.01.24 06:07compute108 "finally" failed again with "CPU machine check error", and performed an uncontrolled reboot.