Slurm controller

Scheduler issues

Postmortem

The scheduler issue has been identified as a deadlock in the backfill scheduler, and we’re working with the Slurm developers to diagnose and fix it.

Resolved
Assessed

This issue was opened retrospectively.

The Slurm controller suffered from an issue last night, which prevented client commands (such as sbatch, squeue, sinfo, sh_part) from operating normally. Scheduling of new jobs was also paused during that time, but jobs that were already running continued to run normally.

Things are back to normal now. If you still encounter issues, please report them to srcc-support@stanford.edu.