...
Time | Event |
---|---|
16.12.23 - 20:22 | The broken floor tile was discovered |
16.12.23 - ca. 23:00 | Agreed that we should remove some weight from R3, and contact help on Monday |
18.12.23 - ca. 09:00 | Contacted a company that will assess the damage, and come up with a plan to fix the floor |
18.12.23 - ca. 12:00 | Placed a steel beam under R3, to support it. Migrated all VMs from five of the compute nodes in R3, and removed them from the rack - meaning we are currently running on reduced capacity. |
20.12.23 | Visit from the carpenter company. Made an initial plan for what needed to be done. Decided that the steel beam would suffice for support. Little to no further sinking was measured. |
17.01.24 - 10:00 | Meeting with the carpenter. A plan was made for repairing the floor: they will build support framing between all the floor tile legs and replace all necessary tiles. The carpenters need two days; we scheduled two days for removing all servers and one day for putting everything back once the floor is fixed, for a total downtime of five days. |
18.01.24 - 13:00 | Received confirmation from the carpenter that they can start the repair work on 7 February. We accepted the offer. |
23.01.24 - 15:23 | Messaged all users about the planned downtime in week 6 |
05.02.24 - ca. 10:00 | Shut down SkyHiGh, SkyLow and everything else. All servers have been removed from the racks. |
08.02.24 - ca. 11:00 | Carpenter work finished. |
08.02.24 - ca. 12:00 | Started moving the racks back into place and rewiring fibre cables and environmental sensors. |
08.02.24 - 16:00 | All network infrastructure (core and rack switches) is reinstalled and confirmed working as normal. Replaced a broken PDU, and all environmental sensors are confirmed working. |
09.02.24 - 08:30 | Started to place all SkyHiGh, NBL, DSE and Hansken servers back in their racks. |
Email sent to all users
Code Block |
---|
Hi all SkyHiGh users, and others using servers in K001 at Gjøvik! Unfortunately, it has been discovered that there is inadequate support for the SkyHiGh racks in K001. As a result, a few of the floor tiles beneath these racks have broken, causing some of the racks to tilt. This is an unfortunate situation, and we need to address it promptly. Therefore, we find ourselves compelled to fix the floor and reinforce it to prevent a recurrence. We have an agreement with carpenters to carry out the necessary repairs and improvements. However, to enable them to perform their work, we regret to inform you that we need to completely empty the server room of all servers. In practical terms, this means that all server services provided from K001 will be halted throughout week 6.
We will power off and remove servers on February 5th and 6th, allow the carpenters to work on February 7th and 8th, and then restore operations on February 9th (for SkyHiGh) and February 12th (for the rest). Those of you utilizing these servers to deliver services to others are responsible for notifying them that the services will be temporarily disrupted. We apologize for any inconvenience this may cause, but we must request your understanding as this is an extraordinary situation that truly must be addressed. |
...