Slurm


Slurm is an "open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters." The CS research cluster uses Slurm to manage parallel and distributed CPU and GPU compute jobs.


Quick Reference

On the cluster, use the following commands to interact with Slurm:

Description                                                   Command
-----------                                                   -------
View partition info                                           sinfo
View node info for all partitions                             sinfo --Node
View node info for a specific partition                       sinfo --partition=PARTITION --long --Node
View job queue                                                squeue
View job queue for a specific partition                       squeue --partition=PARTITION
View job queue for a specific user                            squeue --user=USER
View job queue for the current user                           squeue --me
Schedule a job                                                srun COMMAND
Schedule a job on a specific partition                        srun --partition=PARTITION COMMAND
Schedule a job to run on a specific number of nodes           srun --nodes=N COMMAND
Schedule a job to run on a specific number of cores           srun --ntasks=T COMMAND
Schedule a job to run on a specific number of cores per node  srun --nodes=N --ntasks-per-node=T COMMAND
Schedule a job to run at a specified time                     srun --begin=TIME COMMAND
Schedule a job with a specified name                          srun --job-name=NAME COMMAND
Cancel a queued job                                           scancel ID
View job accounting info                                      sacct --format=Start,JobName,ReqCPUS,ElapsedRaw,State
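
The srun options above can be combined on one command line. The sketch below shows one way to assemble such a line and print it for review before running it. The helper name build_srun and the partition name "debug" are assumptions made for illustration; neither is part of Slurm or of this cluster's configuration. The helper only prints text, so it can be tried on a machine without Slurm installed.

```shell
# build_srun: hypothetical helper (not a Slurm command) that prints an srun
# command line built from options listed in the quick reference.
# Usage: build_srun PARTITION NODES TASKS_PER_NODE COMMAND [ARGS...]
build_srun() {
    if [ "$#" -lt 4 ]; then
        echo "usage: build_srun PARTITION NODES TASKS_PER_NODE COMMAND [ARGS...]" >&2
        return 1
    fi
    # The first three arguments fill in the srun options.
    printf 'srun --partition=%s --nodes=%s --ntasks-per-node=%s' "$1" "$2" "$3"
    shift 3
    # The remaining arguments are the command to run. They are printed as-is,
    # so arguments containing spaces would need quoting by hand.
    printf ' %s' "$@"
    printf '\n'
}

# Dry run: print the command instead of executing it.
# "debug" is a placeholder; use a partition name reported by sinfo.
build_srun debug 2 4 hostname
```

The printed line can then be copied to the cluster's shell and run as-is. With the values shown, it requests 2 nodes with 4 tasks on each, so hostname would run 8 times in total.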

This page was last modified on 2020-10-03 at 22:00:36.

Copyright © 2015–2020 George Fox University. All rights reserved.