Slurm is an "open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters." The CS research cluster uses Slurm to manage parallel and distributed CPU and GPU compute jobs.
On the cluster, use the following commands to interact with Slurm:
Description | Command |
---|---|
View partition info | sinfo |
View node info for all partitions | sinfo --Node |
View node info for a specific partition | sinfo --partition=PARTITION --long --Node |
View job queue | squeue |
View job queue for a specific partition | squeue --partition=PARTITION |
View job queue for a specific user | squeue --user=USER |
View job queue for the current user | squeue --me |
Schedule a job | srun COMMAND |
Schedule a job on a specific partition | srun --partition=PARTITION COMMAND |
Schedule a job to run on a specific number of nodes | srun --nodes=N COMMAND |
Schedule a job to run a specific number of tasks (each task gets one core by default) | srun --ntasks=T COMMAND |
Schedule a job to run a specific number of tasks per node | srun --nodes=N --ntasks-per-node=T COMMAND |
Schedule a multi-threaded program to run once, but with a specific number of CPU cores reserved | srun --ntasks=1 --cpus-per-task=K COMMAND |
Schedule a job to run at a specified time | srun --begin=TIME COMMAND |
Schedule a job with a specified name | srun --job-name=NAME COMMAND |
Schedule a job to run with a specific type of generic resource (e.g., a GPU) | srun --gres=gpu COMMAND |
Schedule a job to run with a specific number of generic resources (e.g., two GPUs) | srun --gres=gpu:2 COMMAND |
Schedule a job to run on a specific named type of generic resource (e.g., a GPU type named slot0) | srun --gres=gpu:slot0 COMMAND |
Cancel a queued or running job | scancel JOBID |
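The options above combine freely on a single `srun` invocation. As a hedged sketch, a GPU training run might reserve one task, eight cores, and two GPUs at once (the partition name `gpu` and the script `train.py` are assumptions for illustration; run `sinfo` to see this cluster's actual partitions):

```shell
# Hypothetical combined invocation -- partition name and script are examples only.
# One task, 8 CPU cores for its threads, 2 GPUs, with a readable job name.
srun --partition=gpu --job-name=train --ntasks=1 --cpus-per-task=8 --gres=gpu:2 python train.py
```

While the job runs, `squeue --me` shows its JOBID, which `scancel JOBID` accepts if you need to stop it early.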
This page was last modified on 2024-08-09 at 16:40:03.