# Commands Reference

Use this page when you remember the task but not the exact command. Commands are grouped by workflow and link back to the module where they first appear.

## Getting Around

| I want to… | Command |
| --- | --- |
| Show the current machine name | `hostname` |
| Show my username | `whoami` |
| Show my current directory | `pwd` |
| Change into a module exercise directory | `cd module-01-hpc-foundations/exercises` |
| List files | `ls` |
| List newest output files first | `ls -lt *.out \| head` |
| Print a file | `cat <file>` |
| Watch a job output file update | `tail -f <file>.out` |

## Cluster And Filesystem

| I want to… | Command |
| --- | --- |
| Show home directory path | `echo $HOME` |
| Show work directory path | `echo $WORK` |
| Check home directory disk space | `df -h $HOME` |
| Check work directory disk space | `df -h $WORK` |
| Show memory usage | `free -h` |
| Show CPU details | `lscpu` |
| Show just the CPU model | `lscpu \| grep "Model name"` |

See Module 1 for the first cluster walkthrough.

## Software Modules

| I want to… | Command |
| --- | --- |
| List loaded modules | `module list` |
| List available modules | `module avail` |
| Inspect the tutorial base module | `module show hpcfund` |

> **Note:** The tutorial environment is preconfigured. Do not run `module purge` unless an instructor tells you to.

## Slurm Jobs

| I want to… | Command |
| --- | --- |
| Submit a batch job | `sbatch script.sh` |
| See my jobs | `squeue -u $USER` |
| See all jobs on the tutorial partition | `squeue -p mi2101x` |
| See partition and node status | `sinfo -p mi2101x` |
| See detailed node status | `sinfo -p mi2101x -N -l` |
| Get details for one job | `scontrol show job <JOBID>` |
| See accounting info after a job finishes | `sacct -j <JOBID>` |
| Cancel one job | `scancel <JOBID>` |
| Cancel all my jobs | `scancel -u $USER` |
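
These commands chain into a simple submit-and-watch loop. A minimal sketch, assuming a script named `first_job.sh` whose output follows the `first-job_<JOBID>.out` pattern from the table at the end of this page (the job ID 12345 is illustrative):

```bash
# Submit the job and note the ID that sbatch prints.
sbatch first_job.sh        # => "Submitted batch job 12345"

# Watch your queue until the job leaves it.
squeue -u $USER

# Follow the output file as the job writes it (Ctrl-C to stop).
tail -f first-job_12345.out

# After the job finishes, check its accounting record.
sacct -j 12345
```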

See Module 2 for the Slurm walkthrough.

## Quick Compute-Node Commands

| I want to… | Command |
| --- | --- |
| Run hostname on a compute node | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 hostname` |
| Check CPU model on a compute node | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 lscpu \| grep "Model name"` |
| Preview ROCm GPU info on a compute node | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 rocminfo \| head -30` |
| Run a program on one compute node | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 ./program` |
| Run with 4 MPI ranks | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=4 ./program` |
| Run with 16 CPU cores for one task | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 --cpus-per-task=16 ./program` |

## Slurm Script Directives

| Directive | Meaning |
| --- | --- |
| `#SBATCH --job-name=name` | Name shown in the Slurm queue |
| `#SBATCH --partition=mi2101x` | Use the tutorial partition |
| `#SBATCH --nodes=1` | Request one node |
| `#SBATCH --ntasks=1` | Request one task (MPI rank) |
| `#SBATCH --cpus-per-task=16` | Request CPU cores for one task |
| `#SBATCH --time=10:00` | Set the job time limit (here 10 minutes) |
| `#SBATCH --output=name_%j.out` | Write stdout to a file; `%j` becomes the job ID |
| `#SBATCH --error=name_%j.err` | Write stderr to a file |
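
Put together, a minimal batch script looks like the sketch below; the job name, output file names, and `./program` are illustrative placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=example        # name shown in the queue
#SBATCH --partition=mi2101x       # tutorial partition
#SBATCH --nodes=1                 # one node
#SBATCH --ntasks=1                # one task (one MPI rank)
#SBATCH --cpus-per-task=16        # CPU cores for that task
#SBATCH --time=10:00              # 10-minute time limit
#SBATCH --output=example_%j.out   # stdout; %j becomes the job ID
#SBATCH --error=example_%j.err    # stderr

./program                         # replace with the exercise binary
```

Save it as `example.sh`, submit with `sbatch example.sh`, and watch it with `squeue -u $USER`.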

## Python Environment

| I want to… | Command |
| --- | --- |
| Create the tutorial virtual environment from the repo root | `sbatch setup/setup_venv.sh` |
| Create the venv from inside a module exercise directory | `sbatch ../../setup/setup_venv.sh` |
| Activate the venv | `source "$WORK/sc26_venv/bin/activate"` |
| Verify PyTorch | `python3 -c "import torch; print(torch.__version__)"` |
| Verify ROCm is visible to PyTorch | `python3 -c "import torch; print(torch.cuda.is_available())"` |
| Leave the venv | `deactivate` |
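
The two verification one-liners can be combined into one short check. A minimal sketch; ROCm builds of PyTorch expose the GPU through the CUDA API, which is why `torch.cuda.is_available()` is the expected test:

```python
import torch

# Version string; ROCm builds typically carry a +rocm suffix.
print("PyTorch:", torch.__version__)

# True when PyTorch can see a GPU (the ROCm backend reuses the CUDA API).
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the first visible GPU device.
    print("Device:", torch.cuda.get_device_name(0))
```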

## C And OpenMP

| I want to… | Command |
| --- | --- |
| Compile a plain C program | `gcc -O2 -o program program.c` |
| Compile a C program that uses math functions | `gcc -O2 -o program program.c -lm` |
| Compile an OpenMP program | `gcc -fopenmp -O2 -o program program.c -lm` |
| Set the OpenMP thread count | `export OMP_NUM_THREADS=4` |
| Run OpenMP with 4 threads through Slurm | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 --cpus-per-task=16 bash -c 'export OMP_NUM_THREADS=4; ./program'` |

Useful OpenMP constructs from Module 3:

| OpenMP construct | Purpose |
| --- | --- |
| `#pragma omp parallel` | Start a parallel region |
| `#pragma omp for` | Split loop iterations across threads |
| `#pragma omp parallel for` | Start threads and split a loop in one directive |
| `reduction(+:sum)` | Safely combine per-thread values into `sum` |
| `omp_get_thread_num()` | Get the current thread ID |
| `omp_get_num_threads()` | Get the size of the current thread team |
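
A minimal sketch (not the Module 3 pi exercise itself) showing how these constructs combine to sum the integers 0 through n-1:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000000;
    long long sum = 0;

    // Start a parallel region and report the team size from thread 0.
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("running with %d threads\n", omp_get_num_threads());
    }

    // Start threads and split the loop in one directive; the reduction
    // gives each thread a private sum and combines them at the end.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += i;

    printf("sum = %lld (expected %lld)\n", sum, (long long)n * (n - 1) / 2);
    return 0;
}
```

Compile it with the `gcc -fopenmp` line above and set `OMP_NUM_THREADS` before running.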

## MPI

| I want to… | Command |
| --- | --- |
| Compile an MPI C program | `mpicc -O2 -o program program.c` |
| Compile an MPI C program that uses math functions | `mpicc -O2 -o program program.c -lm` |
| Run with 4 ranks | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=4 ./program` |
| Run with 16 ranks | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=16 ./program` |

Useful MPI calls from Module 4:

| MPI call | Purpose |
| --- | --- |
| `MPI_Init(&argc, &argv)` | Initialize MPI |
| `MPI_Finalize()` | Shut down MPI |
| `MPI_Comm_rank(comm, &rank)` | Get this rank's ID |
| `MPI_Comm_size(comm, &size)` | Get the number of ranks |
| `MPI_Send(...)` | Send data to another rank |
| `MPI_Recv(...)` | Receive data from another rank |
| `MPI_Reduce(...)` | Combine values across ranks |
| `MPI_Bcast(...)` | Broadcast a value from one rank to all ranks |
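
A minimal sketch (not the Module 4 sum exercise itself) showing these calls in their usual order; every rank contributes its ID to a reduction on rank 0:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);               // start MPI

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this rank's ID
    MPI_Comm_size(MPI_COMM_WORLD, &size); // total number of ranks

    // Combine each rank's value into a single sum on rank 0.
    int value = rank, total = 0;
    MPI_Reduce(&value, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, total);

    MPI_Finalize();                       // shut down MPI
    return 0;
}
```

Compile with `mpicc -O2 -o program program.c` and launch with one of the `srun --ntasks=...` lines above.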

## HIP And GPU Tools

| I want to… | Command |
| --- | --- |
| Check whether hipcc is available | `which hipcc` |
| Show the HIP compiler version | `hipcc --version` |
| Show ROCm system info | `rocminfo \| head -20` |
| Show GPU memory usage | `rocm-smi --showmeminfo vram` |
| Compile a HIP program | `hipcc -O2 -o program program.cpp` |
| Compile with a block-size macro | `hipcc -O2 -DBLOCK_SIZE=256 -o program program.cpp` |
| Run a HIP program on a compute node | `srun --partition=mi2101x --nodes=1 --time=2:00 --ntasks=1 ./program` |

Useful HIP calls and syntax from Module 5:

| HIP item | Purpose |
| --- | --- |
| `hipMalloc(&ptr, bytes)` | Allocate memory on the GPU |
| `hipMemcpy(dst, src, bytes, dir)` | Copy data between host and GPU |
| `hipFree(ptr)` | Free GPU memory |
| `hipDeviceSynchronize()` | Wait for GPU work to finish |
| `hipGetLastError()` | Check for kernel launch errors |
| `__global__ void kernel(...)` | Declare a GPU kernel |
| `kernel<<<blocks, threads>>>(...)` | Launch a kernel |
| `threadIdx.x` | Thread ID within a block |
| `blockIdx.x` | Block ID within the grid |
| `blockDim.x` | Threads per block |
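
A minimal sketch (not the Module 5 exercise itself) that uses the items above to double an array on the GPU:

```cpp
#include <cstdio>
#include <hip/hip_runtime.h>

// GPU kernel: each thread doubles one array element.
__global__ void double_all(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; i++) host[i] = 1.0f;

    float *dev = nullptr;
    hipMalloc((void **)&dev, n * sizeof(float));     // allocate GPU memory
    hipMemcpy(dev, host, n * sizeof(float), hipMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;        // round up to cover n
    double_all<<<blocks, threads>>>(dev, n);         // launch the kernel
    printf("launch: %s\n", hipGetErrorString(hipGetLastError()));
    hipDeviceSynchronize();                          // wait for the GPU

    hipMemcpy(host, dev, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(dev);                                    // free GPU memory
    printf("host[0] = %.1f (expected 2.0)\n", host[0]);
    return 0;
}
```

Compile with `hipcc -O2 -o program program.cpp` and run it through `srun` as shown above.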

## AI Exercises

| I want to… | Command |
| --- | --- |
| Run the inference batch script | `sbatch submit_inference.sh` |
| Read inference output | `cat inference_<JOBID>.out` |
| Run the fine-tuning batch script | `sbatch submit_finetune.sh` |
| Read fine-tuning output | `cat finetune_<JOBID>.out` |
| Load a text generation model in Python | `AutoModelForCausalLM.from_pretrained(name)` |
| Load a tokenizer in Python | `AutoTokenizer.from_pretrained(name)` |
| Move a PyTorch model to the GPU | `model.to("cuda")` |
| Generate text with a model | `model.generate(input_ids, ...)` |
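
The four Python calls combine as in the sketch below, assuming the Hugging Face transformers package from the tutorial venv; `"gpt2"` is an illustrative model name, not necessarily the one Module 6 uses:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative; substitute the Module 6 model name

tokenizer = AutoTokenizer.from_pretrained(name)      # load the tokenizer
model = AutoModelForCausalLM.from_pretrained(name)   # load the model

# On ROCm PyTorch the GPU device is still addressed as "cuda".
if torch.cuda.is_available():
    model = model.to("cuda")

inputs = tokenizer("HPC clusters are", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```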

See Module 6 for inference and fine-tuning details.

## AI Agent

| I want to… | Command |
| --- | --- |
| Show the shared agent server URL | `cat "$WORK/sc26_agent_server_url"` |
| Export the agent server URL for interactive testing | `export AGENT_API_URL=$(cat "$WORK/sc26_agent_server_url")` |
| Run the agent exercise | `sbatch submit_agent.sh` |
| Launch the tutorial-provided CLI agent | `bash ../../setup/launch_aider.sh` |
| Launch aider from an arbitrary working directory | `bash <repo-path>/setup/launch_aider.sh` |

See Module 7 for the agent exercises.

## Common Output Files

| Job or exercise | Output file pattern |
| --- | --- |
| Venv setup | `setup_venv_<JOBID>.out` |
| Slurm first job | `first-job_<JOBID>.out` |
| Compute-node hello | `hello-compute_<JOBID>.out` |
| OpenMP pi | `openmp-pi_<JOBID>.out` |
| MPI sum | `mpi-sum_<JOBID>.out` |
| HIP exercise | `hip-exercise_<JOBID>.out` |
| Inference | `inference_<JOBID>.out` |
| Fine-tuning | `finetune_<JOBID>.out` |
| Agent exercise | `agent_<JOBID>.out` |