Timestamp	Event
17.05.23 07:23	compute108 rebooted on its own with "CPU machine check error"
22.05.23 19:03	compute308 rebooted on its own with "CPU machine check error"
23.05.23 ~08:00	The SkyHiGh team discovered the unexpected reboots. The VMs on both compute nodes were migrated to other hosts, and the nodes were taken out of production.
23.05.23 ~09:00	Contacted Dell ProSupport for assistance.
24.05.23 15:58	Dell recommended a new BIOS version for compute108.
25.05.23 09:43	compute108 now runs the recommended BIOS, and all VMs have been migrated back. We are now waiting to see whether this actually fixes the problem. Be aware that another uncontrolled reboot may still occur.
02.06.23	compute308 got its motherboard and a CPU replaced. Dell's recommended BIOS was also installed.
05.06.23 08:05	compute308 was put back into production, and all VMs were migrated back.
26.01.24 06:07	compute108 "finally" failed again with "CPU machine check error", and performed an uncontrolled reboot.
28.01.24 11:01	compute308 failed again with the same error, despite having had its motherboard and CPU replaced in June.
30.01.24 ~09:00	Migrated all VMs off compute108 to prepare it for another motherboard replacement, scheduled for February 2nd.
30.01.24 13:47	Contacted Dell about compute308 and are awaiting a response. Meanwhile, the node is back in production.
31.01.24 15:25	compute108 got its motherboard replaced and was put back into production. All VMs were migrated back to it.
12.02.24 08:51	Upgraded the BIOS and sent new logs from compute308 to Dell, as requested.
16.02.24 04:05	compute108 failed yet again with "CPU 2 machine check error". Dell has been contacted, and the server will be taken out of production.
19.02.24	Dell decided to replace the CPU in compute108. This will be done on February 22nd.
22.02.24 14:20	compute108 got a new CPU. The server is back in production, and all VMs will be migrated back to it.
05.03.24 06:43	compute308 hit another "CPU 2 machine check error", followed by a sudden reboot. The node will be taken out of production, and Dell ProSupport has been contacted once again.