Maintenance extension

Resolved

The vast majority of compute nodes have been restored to service, which marks the official end of yesterday’s maintenance.
Thanks again for your patience and understanding while we were working on restoring service.

Monitoring

We’re still making progress in restoring all compute nodes to service, and we hope to have most of them up and running later today.

Monitoring

We’re re-opening the cluster to user logins and we’ve lifted the scheduler reservation, so pending jobs have started running again.

Please note that Sherlock is not yet in a full production state, as many nodes are still down and need more work. We’ll continue to work on the remaining nodes tomorrow, but in the meantime, users can connect, access their files and submit jobs to the scheduler.

Again, we’re very sorry about the inconvenience and appreciate your patience and understanding while we’re working through these problems.

Problem Identified

We’re making some progress towards a resolution, but putting nodes back into a workable state takes a considerable amount of time, and progress is slow. Currently, about half of the cluster is still unavailable, and we may have to open the cluster to users while nodes are still being worked on.

We’re continuing to work on the situation and will keep this issue updated.

Investigating

We’re continuing to work on issues that arose during the scheduled maintenance, to bring Sherlock back into production as quickly as possible. We’ll post updates as they become available.
