Scheduler issues

Analysis September 21, 2022, 07:21 PDT

The scheduler issue has been identified as a deadlock in the backfill scheduler, and we're working with the Slurm developers to diagnose and fix it.

Resolved September 21, 2022, 06:30 PDT

Verified September 20, 2022, 22:30 PDT

This issue was opened retrospectively.

The Slurm controller experienced an issue last night that prevented client commands (such as sbatch, squeue, sinfo, sh_part) from operating normally. Scheduling of new jobs was paused during that time as well, but jobs that were already running continued to run normally.

Things are back to normal now. If you still encounter issues, please report them to srcc-support@stanford.edu.