Core networking issue

Post-mortem

One of the Ethernet switches that provide the core backbone network on Sherlock crashed last night, which affected the cluster's internal connectivity:

  • the scheduler may have been unresponsive at times,
  • access to the $HOME and $GROUP_HOME file systems may have been disrupted,
  • network connectivity to the login nodes, the DTNs, and the outside may have been disrupted.

Physical intervention was required, and the issue was fixed at 8:40pm last night. All systems have now returned to normal.

Resolved
Assessed

Network connectivity issues that prevent the scheduler from operating properly have been reported.

4 services affected: