# Example Job Scripts
This page provides structured job script examples adapted for REPACSS. For definitions and scheduler behavior, refer to the Job Basics page. These examples are designed for CPU and GPU partitions such as `zen4`, `h100`, and `standard`.
> **Tip:** Interactive jobs in GPU partitions are granted scheduling priority on REPACSS.
> **Warning:** Each Slurm CPU is a hyperthread. To bind OpenMP threads to physical cores, use `--cpu-bind=cores`.
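As a minimal sketch, a job script that pins one OpenMP thread per physical core might look like the following; `my_openmp_app` is an illustrative binary name, not a program provided on the system:

```bash
#!/bin/bash
#SBATCH --job-name=omp_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00

# One OpenMP thread per allocated Slurm CPU
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Bind threads to physical cores rather than hyperthreads
srun --cpu-bind=cores ./my_openmp_app
```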
> **Note:** To run job steps in parallel within a single script, launch each step with `&` and use `wait` to synchronize them.
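For example, a script can run two independent steps concurrently inside one allocation; `program_a` and `program_b` are placeholder names:

```bash
# Launch two steps in the background, each as its own job step
srun --ntasks=1 ./program_a &
srun --ntasks=1 ./program_b &

# wait blocks until both background steps have finished
wait
echo "All parallel steps complete."
```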
## Job Types
Jobs on REPACSS can be submitted in two main forms:
- **Interactive Jobs**: Real-time sessions for testing and debugging (see the sample session after this list).

    ```bash
    interactive -c 8 -p h100
    ```

- **Batch Jobs**: Scheduled jobs submitted via script.

    ```bash
    sbatch job.sh
    sbatch -p zen4 job.sh
    sbatch -p h100 job.sh
    ```
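A typical interactive session might proceed as follows; the build and test commands are placeholders for your own workflow:

```bash
# Request 8 CPUs on the h100 partition and wait for a shell on a compute node
interactive -c 8 -p h100

# Once the session starts, work as in a normal shell, e.g.:
module load gcc/14.2.0
gcc my_program.c -o my_program
./my_program

# Leave the session (and release the allocation) when done
exit
```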
## Script Templates
### Basic Job Script with C
1. Create a file named `my_program.c` with the following content:

    ```c
    #include <stdio.h>
    #include <unistd.h>

    int main() {
        printf("SLURM job started.\n");
        printf("Sleeping for 60 seconds to simulate work...\n");
        sleep(60);
        printf("Job complete. Goodbye!\n");
        return 0;
    }
    ```
2. Load the GCC module and compile the program:

    ```bash
    module load gcc/14.2.0
    gcc my_program.c -o my_program
    ```
3. Create a file named `submit_job.sh` with the following contents. To determine your resource needs, refer to the Determine Resource Needs documentation.

    ```bash
    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=test.out
    #SBATCH --error=test.err
    #SBATCH --time=01:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G

    # Load modules
    module load gcc/14.2.0

    # Run program
    ./my_program
    ```
4. Make the script executable and submit it using `sbatch`:

    ```bash
    chmod +x submit_job.sh
    sbatch submit_job.sh
    ```
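After submitting, you can watch the job's state and then read its output file, which is named by the `--output` directive above:

```bash
squeue -u $USER   # Check job state (PD = pending, R = running)
cat test.out      # Inspect the program's output once the job finishes
```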
### Python Job Script
1. Create a Python file named `script.py` with the following example content:

    ```python
    import time
    import platform

    print("SLURM Python job started.")
    print("Running on:", platform.node())
    time.sleep(10)  # Simulate workload
    print("Job complete. Goodbye!")
    ```
2. Create a file named `submit_python_job.sh` with the following content. To determine your resource needs, refer to the Determine Resource Needs documentation.

    ```bash
    #!/bin/bash
    #SBATCH --job-name=python_job
    #SBATCH --output=python_job.out
    #SBATCH --error=python_job.err
    #SBATCH --time=01:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=32G

    # Load required modules
    module load gcc

    # Activate conda environment
    source ~/miniforge3/etc/profile.d/conda.sh
    conda activate myenv

    # Run Python script
    python script.py
    ```
3. Make the script executable and submit it to SLURM:

    ```bash
    chmod +x submit_python_job.sh
    sbatch submit_python_job.sh
    ```
### GPU Job Script
> **Warning:** We are currently working to make the CUDA module available system-wide for all users. In the meantime, please use CUDA via a Conda environment as described below.
1. Create and activate a new Conda environment:

    ```bash
    conda create --name cuda-env python=3.10 -y
    conda activate cuda-env
    ```
2. Install the CUDA Toolkit with `nvcc` support:

    ```bash
    conda install -c nvidia cuda-toolkit=12.9
    ```
3. Install a compatible GCC toolchain (GCC 11):

    ```bash
    conda install -c conda-forge gxx_linux-64=11
    ```
4. Create a sample CUDA program named `gpu_program.cu`:

    ```cuda
    #include <stdio.h>
    #include <cuda_runtime.h>

    __global__ void hello_from_gpu() {
        printf("Hello from GPU thread %d!\n", threadIdx.x);
    }

    int main() {
        printf("Starting GPU job...\n");
        hello_from_gpu<<<1, 64>>>();

        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "CUDA error after kernel: %s\n", cudaGetErrorString(err));
            return 1;
        }

        printf("GPU job finished.\n");
        return 0;
    }
    ```
5. Compile the CUDA program for NVIDIA H100 GPUs (`sm_90`):

    ```bash
    nvcc -arch=sm_90 \
        -ccbin "$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++" \
        -I"$CONDA_PREFIX/targets/x86_64-linux/include" \
        -L"$CONDA_PREFIX/targets/x86_64-linux/lib" \
        -o gpu_program gpu_program.cu
    ```
6. Create the SLURM job script `gpu_job.slurm`. To determine your resource needs, refer to the Determine Resource Needs documentation.

    ```bash
    #!/bin/bash
    #SBATCH --job-name=gpu_hello
    #SBATCH --output=gpu_hello.out
    #SBATCH --error=gpu_hello.err
    #SBATCH --partition=h100
    #SBATCH --gres=gpu:nvidia_h100_nvl:1
    #SBATCH --cpus-per-task=2
    #SBATCH --mem=4G
    #SBATCH --time=00:05:00

    source ~/miniforge3/etc/profile.d/conda.sh
    conda activate cuda-env

    ./gpu_program
    ```
7. Submit the job to SLURM:

    ```bash
    sbatch gpu_job.slurm
    ```
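To double-check that the allocation actually includes a GPU, you can add a quick diagnostic to `gpu_job.slurm` before the program runs; `nvidia-smi` is generally present on GPU compute nodes, though that is an assumption about the node image:

```bash
# Optional diagnostics, placed before ./gpu_program in gpu_job.slurm
echo "Visible GPUs: $CUDA_VISIBLE_DEVICES"   # Set by Slurm for GPU allocations
nvidia-smi                                   # Report GPU model, driver, and utilization
```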
## Job Management
### Submission
```bash
sbatch job.sh                              # Submit job
sbatch --array=1-10 job.sh                 # Submit job array (see sketch below)
sbatch --dependency=afterok:12345 job.sh   # Submit with dependency
```
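Each task in a job array receives its own `SLURM_ARRAY_TASK_ID`, which the script can use to select its input. A minimal sketch, assuming input files named `input_1.txt` through `input_10.txt` (illustrative names):

```bash
#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH --output=array_%A_%a.out   # %A = array job ID, %a = task ID
#SBATCH --array=1-10
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Each array task processes the input file matching its index
echo "Task $SLURM_ARRAY_TASK_ID processing input_${SLURM_ARRAY_TASK_ID}.txt"
```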
### Monitoring
```bash
squeue -u $USER   # View user's jobs
squeue -p zen4    # View jobs on zen4 partition
squeue -p h100    # View jobs on h100 partition
```
### Control
```bash
scancel 12345      # Cancel a specific job
scancel -u $USER   # Cancel all user's jobs
scancel -p zen4    # Cancel all jobs in zen4 partition
```
## Resource Requests
- **CPU Jobs**: Specify `--nodes`, `--ntasks`, `--cpus-per-task`, and `--mem`.
- **GPU Jobs**: Include `--gres=gpu:1`, or more as needed (see the combined sketch after this list).
- **Python Jobs**:
    - See Python environment setup.
    - Use `--cpus-per-task` for multi-threading.
    - Set `--mem` appropriately for data requirements.
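As a combined sketch, the header of a job requesting two GPUs with matching CPU and memory resources might look like the following; the right CPU-to-GPU ratio and memory size depend on your workload and partition, so treat these values as placeholders:

```bash
#!/bin/bash
#SBATCH --partition=h100
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2         # Two GPUs for this job
#SBATCH --cpus-per-task=8    # CPU threads to feed both GPUs
#SBATCH --mem=64G            # Host memory sized to the data set
#SBATCH --time=02:00:00
```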
For additional guidance, consult Slurm Documentation and REPACSS-specific resources.