Slurm


Slurm is an "open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters." The CS research cluster uses Slurm to manage parallel and distributed CPU and GPU compute jobs.


Quick Reference

On the cluster, use the following commands to interact with Slurm:

| Description | Command |
|---|---|
| View partition info | sinfo |
| View node info for all partitions | sinfo --Node |
| View node info for a specific partition | sinfo --partition=PARTITION --long --Node |
| View job queue | squeue |
| View job queue for a specific partition | squeue --partition=PARTITION |
| View job queue for a specific user | squeue --user=USER |
| View job queue for the current user | squeue --me |
| Schedule a job | srun COMMAND |
| Schedule a job on a specific partition | srun --partition=PARTITION COMMAND |
| Schedule a job to run on a specific number of nodes | srun --nodes=N COMMAND |
| Schedule a job to run a specific number of tasks (one core per task by default) | srun --ntasks=T COMMAND |
| Schedule a job to run a specific number of tasks per node | srun --nodes=N --ntasks-per-node=T COMMAND |
| Schedule a multi-threaded program to run once, but with a specific number of CPU cores reserved | srun --ntasks=1 --cpus-per-task=K COMMAND |
| Schedule a job to run at a specified time | srun --begin=TIME COMMAND |
| Schedule a job with a specified name | srun --job-name=NAME COMMAND |
| Schedule a job to run with a specific type of generic resource (e.g., a GPU) | srun --gres=gpu COMMAND |
| Schedule a job to run with a specific number of generic resources (e.g., two GPUs) | srun --gres=gpu:2 COMMAND |
| Schedule a job to run on a specific generic resource (e.g., the GPU in slot 0) | srun --gres=gpu:slot0 COMMAND |
| Cancel a queued or running job | scancel ID |
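
The srun options listed above can be combined in a single call. The sketch below is a minimal, hypothetical helper that only prints such a command line (a dry run) instead of submitting anything, so it can be tried without Slurm installed. The partition name gpu, the job name demo, the helper name build_srun, and the hostname payload are illustrative assumptions, not values taken from the cluster; run sinfo to see the real partition names.

```shell
#!/bin/sh
# Dry-run sketch: build one srun command line from several options.
# PARTITION and JOB_NAME are hypothetical placeholders, not real cluster values.
PARTITION="gpu"     # assumed partition name; check `sinfo` for the real ones
JOB_NAME="demo"     # arbitrary label that would appear in `squeue`

# build_srun CORES GPUS COMMAND...
# Prints an srun command for a single multi-threaded task that reserves
# CORES CPU cores and GPUS GPUs. Nothing is submitted to Slurm.
build_srun() {
    cores="$1"
    gpus="$2"
    shift 2
    printf 'srun --partition=%s --job-name=%s --ntasks=1 --cpus-per-task=%s --gres=gpu:%s %s\n' \
        "$PARTITION" "$JOB_NAME" "$cores" "$gpus" "$*"
}

# Example: one task, 4 cores, 1 GPU, running `hostname` as a stand-in payload.
build_srun 4 1 hostname
# prints: srun --partition=gpu --job-name=demo --ntasks=1 --cpus-per-task=4 --gres=gpu:1 hostname
```

On the cluster, running the printed line submits the job. While it is queued or running, squeue --me lists it along with its job ID, and scancel ID cancels it.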

This page was last modified on 2024-08-09 at 16:40:03.

Copyright © 2015–2025 George Fox University. All rights reserved.