Timestamp         Event
12.01.23 ~08:00   compute310 crashed
12.01.23 10:21    A user reported a broken disk on a VM affected by the crash
12.01.23 11:35    Disk errors fixed by removing locks in Ceph
30.01.23 07:13    compute310 crashed again
30.01.23 08:42    All VMs moved to other hosts and all dangling Ceph locks removed
13.02.23          Root cause for the dangling file locks was found, and we corrected our configuration accordingly.

In the Pacific release, Ceph renamed the "osd blacklist" command to "osd blocklist", but we failed to update our permission scheme to allow the new command.

This prevented the compute node from releasing its old file locks when it rebooted.
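
The kind of configuration correction described above can be sketched as a small helper that extends a monitor capability string so it permits the renamed command. This is only an illustration: the exact capabilities we use are not shown on this page, so the example string 'allow r, allow command "osd blacklist"' is an assumption based on the commonly documented form, and the function name add_blocklist_cap is hypothetical.

```python
OLD_CMD = "osd blacklist"   # command name before the Pacific release
NEW_CMD = "osd blocklist"   # command name from Pacific onwards


def add_blocklist_cap(mon_caps: str) -> str:
    """Return mon caps that also allow the renamed blocklist command.

    Hypothetical helper: it only rewrites a capability string and does
    not talk to a cluster. Caps without the old grant are left unchanged.
    """
    old_grant = f'allow command "{OLD_CMD}"'
    new_grant = f'allow command "{NEW_CMD}"'

    # Split the comma-separated grants and drop empty fragments.
    grants = [g.strip() for g in mon_caps.split(",") if g.strip()]

    # Add the new grant right after the old one, once.
    if old_grant in grants and new_grant not in grants:
        grants.insert(grants.index(old_grant) + 1, new_grant)

    return ", ".join(grants)


if __name__ == "__main__":
    # Assumed example caps; the real ones are not stated in this report.
    print(add_blocklist_cap('allow r, allow command "osd blacklist"'))
```

The sketch keeps the old grant next to the new one, which is one possible choice for clusters that are upgraded in stages; whether that matches the correction we actually applied is not stated here.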
