sbatch Use Cases

Job Arrays

Use Cases

Running many instances of the same job script, each with different arguments, when the instances do not need to communicate with each other.
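
Each instance in a job array receives its own index, and the --array directive decides which indices exist. The Python sketch below is a simplified illustration of how the four --array forms in the example expand. It is not Slurm's own parser, and the function name expand_array_spec is hypothetical.

```python
from typing import Optional


def expand_array_spec(spec: str) -> tuple[list[int], Optional[int]]:
    """Expand a Slurm --array value into its task indices and concurrency limit.

    Handles the forms used in the example: ranges (1-5), lists (1,3,5,7),
    steps (1-7:2) and a "%N" suffix that caps how many tasks run at once.
    Simplified illustration only; Slurm's real parser accepts more.
    """
    limit = None
    if "%" in spec:
        spec, limit_text = spec.split("%", 1)
        limit = int(limit_text)
    indices: list[int] = []
    for part in spec.split(","):
        step = 1
        if ":" in part:
            part, step_text = part.split(":", 1)
            step = int(step_text)
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            indices.extend(range(first, last + 1, step))  # upper bound is inclusive
        else:
            indices.append(int(part))
    return indices, limit


if __name__ == "__main__":
    for spec in ("1-5", "1,3,5,7", "1-7:2", "1-5%2"):
        print(spec, "->", expand_array_spec(spec))
```

For instance, 1,3,5,7 and 1-7:2 expand to the same four indices, while 1-5%2 creates five tasks of which at most two run at the same time.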

Example

#!/bin/bash
#SBATCH --job-name=array-example
# Only the line starting with exactly #SBATCH is active; Slurm ignores the ##SBATCH lines, which show alternative forms.
##SBATCH --array=1-5 # Submits a job array with indices 1 through 5 (inclusive)
#SBATCH --array=1,3,5,7 # Submits a job array with indices 1, 3, 5 and 7
##SBATCH --array=1-7:2 # Submits a job array with indices 1 through 7 in steps of 2 (1, 3, 5, 7)
##SBATCH --array=1-5%2 # Submits a job array with indices 1 through 5, but runs at most 2 of its tasks at the same time
#SBATCH --partition=short-cpu
#SBATCH --output=%A/out_%a.out # Standard output goes to <jobId>/out_<arrayIndex>.out (%A = array job ID, %a = array index)
#SBATCH --error=%A/error_%a.err # Standard error goes to <jobId>/error_<arrayIndex>.err
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

module load python39

srun python3 ~/examples/array-example/array-example.py $SLURM_ARRAY_JOB_ID $SLURM_ARRAY_TASK_ID # Runs the Python script with two arguments: the array job ID and this task's array index
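
The contents of array-example.py are not shown here. The sketch below is a hypothetical version that shows how such a script might use its two arguments: it converts the array index to an integer and uses it to pick a per-task setting from a lookup table. The PARAMS table and the describe function are illustrative assumptions, not part of the original example.

```python
import sys

# Hypothetical per-task settings, keyed by the array indices requested with
# --array=1,3,5,7. Replace with whatever each task should actually vary.
PARAMS = {1: 0.1, 3: 0.3, 5: 0.5, 7: 0.7}


def describe(job_id: str, task_id: str) -> str:
    """Build a status line for one array task and look up its parameter."""
    index = int(task_id)
    if index not in PARAMS:
        raise ValueError(f"no parameter defined for array index {index}")
    return f"job {job_id}, task {index}: param={PARAMS[index]}"


if __name__ == "__main__":
    # The job script passes $SLURM_ARRAY_JOB_ID and $SLURM_ARRAY_TASK_ID.
    if len(sys.argv) == 3:
        print(describe(sys.argv[1], sys.argv[2]))
    else:
        print("usage: array-example.py JOB_ID TASK_ID", file=sys.stderr)
```

With the #SBATCH lines above, task 3 of array job 4242 would run the script as array-example.py 4242 3 and write "job 4242, task 3: param=0.3" to 4242/out_3.out.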

MPI

Use Cases

Process-based parallelization in which separate processes communicate with each other by passing messages. It works on both distributed-memory (multi-node) and shared-memory (single-node) systems.

Example

#!/bin/bash

#SBATCH --job-name=MpiBinarySearch
#SBATCH --output=%A/mpi-bin.out
#SBATCH --error=%A/mpi-bin.err
#SBATCH --ntasks=10
#SBATCH --time=00:05:00

module load openmpi4/gcc
module load gcc

mpicc mpi-binary-search.c -o mpi-binary-search # Compile once; "srun mpicc" would start the compiler in all 10 tasks at the same time

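export OMPI_MCA_btl_openib_allow_ib=true
export OMPI_MCA_btl=openib,self,sm # Set with export because --export splits its list on commas, which makes this value ambiguous
srun --mpi=pmix ./mpi-binary-search # srun passes the full environment, including these variables, to every task by default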
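
The source of mpi-binary-search.c is not included here. As a hedged illustration of the domain decomposition such a program typically uses, the Python sketch below simulates, in a single process, what each of the 10 ranks might do: take its own contiguous slice of a sorted array, binary-search only that slice, and combine the per-rank answers the way a max-reduction (for example MPI_Reduce with MPI_MAX) would. The function names are hypothetical and no real message passing happens here; an actual MPI program runs each rank as a separate process.

```python
from bisect import bisect_left


def block_range(n: int, size: int, rank: int) -> tuple[int, int]:
    """Return the half-open slice [start, end) of n items owned by `rank`.

    The first n % size ranks get one extra item, so slice sizes differ by at most one.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def local_search(data: list[int], target: int, size: int, rank: int) -> int:
    """What one rank would do: binary-search only its own slice.

    Returns the global index of `target`, or -1 if it is not in this slice.
    """
    start, end = block_range(len(data), size, rank)
    i = bisect_left(data, target, start, end)
    return i if i < end and data[i] == target else -1


def simulated_search(data: list[int], target: int, size: int) -> int:
    """Run every rank's search in one process and combine the answers.

    max() stands in for a max-reduction across ranks: every rank that
    misses reports -1, so the rank that finds the target wins.
    """
    return max(local_search(data, target, size, rank) for rank in range(size))


if __name__ == "__main__":
    sorted_data = list(range(0, 200, 2))  # 100 even numbers: 0, 2, ..., 198
    print(simulated_search(sorted_data, 58, size=10))  # 10 ranks, as with --ntasks=10; prints 29
```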

OpenMP

Use Cases

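Thread-based parallelization in which the threads of a single process share the same memory. Because all threads belong to one process, an OpenMP job runs on a single node, and the CPUs for its threads are requested with --cpus-per-task.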

Example

#!/bin/bash
#SBATCH --job-name=omp-bin-search
#SBATCH --output=%A/omp-bin.out
#SBATCH --error=%A/omp-bin.err
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=4

module load openmpi/gcc/64

srun gcc -fopenmp openmp-binary-search.c -o openmp-binary-search

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

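# OpenMP is not MPI, so no --mpi plugin is needed. Passing --cpus-per-task explicitly gives the
# threads all allocated CPUs, even on Slurm versions where srun does not inherit it from the job.
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./openmp-binary-search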
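
Python cannot reproduce OpenMP directly, but the sketch below illustrates the shared-memory model with threads from concurrent.futures: every thread writes its results into the same list, each to different positions, much like an OpenMP parallel for loop filling a shared array, and the thread count is read from OMP_NUM_THREADS as set in the job script. It is a hypothetical illustration, not the contents of openmp-binary-search.c. In CPython the global interpreter lock keeps CPU-bound threads from running truly in parallel, so this shows the memory model rather than OpenMP's speedup.

```python
import os
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def thread_count(default: int = 1) -> int:
    """Mirror the job script: prefer OMP_NUM_THREADS, then SLURM_CPUS_PER_TASK."""
    value = os.environ.get("OMP_NUM_THREADS") or os.environ.get("SLURM_CPUS_PER_TASK")
    return int(value) if value else default


def parallel_search(data: list[int], targets: list[int], num_threads: int) -> list[int]:
    """Binary-search a sorted list for many targets, splitting the work across threads.

    All threads write into the same `results` list (shared memory), each to
    different positions, so no locking is needed.
    """
    results = [-1] * len(targets)  # shared by every thread; -1 means "not found"

    def worker(thread_id: int) -> None:
        # Round-robin assignment: thread t handles targets t, t + n, t + 2n, ...
        for i in range(thread_id, len(targets), num_threads):
            j = bisect_left(data, targets[i])
            if j < len(data) and data[j] == targets[i]:
                results[i] = j

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, range(num_threads)))  # list() re-raises any worker exception
    return results


if __name__ == "__main__":
    data = list(range(0, 100, 5))  # 0, 5, ..., 95
    print(parallel_search(data, [10, 11, 95], thread_count()))  # prints [2, -1, 19]
```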

CUDA

Use Cases

CUDA (Compute Unified Device Architecture) is NVIDIA’s parallel computing platform and programming model for general-purpose computing on GPUs. It gives programs direct access to GPU hardware for high-performance workloads such as scientific simulations, machine learning, image processing, and numerical computation. CUDA is most effective for problems that can be split across thousands of GPU cores. It can be used from many languages, including C, C++, Fortran, Python, and Julia; on Riviera, C++ and Python are the most common. From Python, CUDA is usually accessed through PyTorch; from C++, it is programmed directly in CUDA C++.

Overview

CUDA is already installed on Riviera and is available through the module system. CUDA is typically programmed in CUDA C++, a minimal set of extensions to the C++ language and runtime library, which is compiled with NVCC. CUDA C++ lets you define functions called kernels that, when launched, are executed N times in parallel by N different CUDA threads. GPU memory is usually managed explicitly: cudaMalloc allocates device memory, cudaMemcpy copies data between host and device, and cudaFree releases device memory. In general a kernel cannot read ordinary host memory, so input data must be copied to the GPU (or allocated as unified memory with cudaMallocManaged) before the kernel can use it.
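
To make the kernel-launch arithmetic concrete, the Python sketch below serially emulates a one-dimensional launch of a vector-add kernel: the launch uses ceil(n / threads_per_block) blocks, each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x, and threads whose index falls past the end of the data do nothing. It runs on the CPU and is only an illustration; the function name and the default of 256 threads per block are assumptions, and it is not the contents of vector_add.cu. In real CUDA C++ the inputs would first be copied into device buffers allocated with cudaMalloc (cudaMemcpy with cudaMemcpyHostToDevice), the result copied back with cudaMemcpyDeviceToHost, and the buffers released with cudaFree.

```python
def emulate_vector_add(a: list[float], b: list[float],
                       threads_per_block: int = 256) -> tuple[list[float], int]:
    """Serially emulate a 1-D CUDA launch of a vector-add kernel.

    Returns the result and the number of blocks the launch would use.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    # Ceiling division, as in the common launch idiom (kernel name hypothetical):
    #   vectorAdd<<<(n + threads - 1) / threads, threads>>>(...)
    blocks = (n + threads_per_block - 1) // threads_per_block
    c = [0.0] * n  # stands in for the device output buffer from cudaMalloc
    for block_idx in range(blocks):                  # blockIdx.x
        for thread_idx in range(threads_per_block):  # threadIdx.x
            i = block_idx * threads_per_block + thread_idx  # global thread index
            if i < n:  # the last block may contain threads past the end of the data
                c[i] = a[i] + b[i]
    return c, blocks


if __name__ == "__main__":
    print(emulate_vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], threads_per_block=2))
    # prints ([11.0, 22.0, 33.0], 2)
```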

Example

#!/bin/bash -l
#SBATCH --job-name=cuda-example
#SBATCH --partition=short-gpu
#SBATCH --output=out.log
#SBATCH --error=error.log
#SBATCH --time=00:01:00
#SBATCH --ntasks=1

module load cuda12.2/blas/12.2.2
module load cuda12.2/fft/12.2.2
module load cuda12.2/toolkit/12.2.2

srun nvcc vector_add.cu -o vector_add # Compiles the CUDA C++ program.
srun time ./vector_add # Runs and times the CUDA program

PyTorch

Use Cases

PyTorch is a Python library for GPU computing. It has built-in CUDA support and can be used both for general GPU computation with CUDA tensor operations and for training machine learning models with its training utilities. Computation is expressed with tensor objects; more information is available on the PyTorch website and in its documentation.

Installing

Because PyTorch is large, pip can fill the default temporary directory before the installation finishes, which makes the installation stop partway through. To avoid this, create a temporary directory in your home directory before installing and point TMPDIR at it:

[username@login001]$ mkdir ~/temp_directory
[username@login001]$ export TMPDIR=~/temp_directory
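
Since pip, like most Python tools, creates its temporary directories through Python's tempfile module, one way to confirm that the new TMPDIR is picked up is to ask tempfile directly. The sketch below reports the directory Python will use and its free space. Note that tempfile silently falls back to a default such as /tmp when TMPDIR points to a directory that does not exist, so a mistyped path would otherwise go unnoticed. The function name and the 5 GB threshold are assumptions, not a measured requirement for PyTorch.

```python
import os
import shutil
import tempfile


def temp_dir_report(min_free_gb: float = 5.0) -> tuple[str, float, bool]:
    """Report which temporary directory Python will use and whether it looks big enough.

    tempfile checks TMPDIR, TEMP and TMP in order, but skips a directory that
    does not exist or is not writable and falls back to a system default.
    """
    tempfile.tempdir = None  # clear the cached choice so TMPDIR is read again
    path = tempfile.gettempdir()
    free_gb = shutil.disk_usage(path).free / 1e9
    return path, free_gb, free_gb >= min_free_gb


if __name__ == "__main__":
    path, free_gb, ok = temp_dir_report()
    print(f"temp dir: {path} (TMPDIR={os.environ.get('TMPDIR')!r})")
    print(f"free space: {free_gb:.1f} GB, {'probably enough' if ok else 'probably too little'}")
```

Run it with python3 in the same shell session after the export; if the reported directory is not ~/temp_directory, check the mkdir and export steps.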

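Once that is done, install PyTorch with pip inside a Python virtual environment, following the installation instructions on the PyTorch website and making sure CUDA 11.8 is selected. Run the installation from the same shell session, because the exported TMPDIR applies only to that session. The job scripts below activate the environment with source pytorch/bin/activate, so either create it as pytorch in the directory you submit jobs from or adjust that line.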

Examples

#!/bin/bash
#SBATCH --job-name=pytorch-cuda
#SBATCH --partition=short-gpu
#SBATCH --output=%A/out.out
#SBATCH --error=%A/err.err
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

module load python39
module load cuda12.2/blas
module load cuda12.2/fft
module load cuda12.2/toolkit

source pytorch/bin/activate

srun python3 pytorch-cuda.py

deactivate

#!/bin/bash
#SBATCH --job-name=pytorch-stream
#SBATCH --partition=short-gpu
#SBATCH --output=%A/out.out
#SBATCH --error=%A/err.err
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

module load python39
module load cuda12.2/blas
module load cuda12.2/fft
module load cuda12.2/toolkit

source pytorch/bin/activate

srun python3 pytorch-stream.py

deactivate