...
Timestamp | Event |
---|---|
12.01.23 ~08:00 | compute310 crashed |
12.01.23 10:21 | A user reported a broken disk on a VM affected by the crash |
12.01.23 11:35 | Disk errors fixed by removing the locks in Ceph |
30.01.23 07:13 | compute310 crashed again |
30.01.23 08:42 | All VMs were moved to other hosts, and all dangling Ceph locks were removed |
13.02.23 | Root cause for the dangling file locks was found, and we corrected our configuration accordingly. A change in Ceph's Pacific release stopped the compute node from releasing the old file lock when it rebooted. |
07.04.23 19:37 | compute310 crashed again (during Easter, of course), and now one of the CPUs seems to be completely dead. That dead CPU is the likely cause of all further crashes |
08.04.23 02:42 | compute310 crashed again (during Easter, of course) |
08.04.23 09:37 | compute310 crashed again (during Easter, of course) |
MANY MORE TIMES | compute310 crashed again (during Easter, of course) |
11.04.23 | Contacted Dell to replace the CPU and migrated all VMs to a working node |
...