Pongsb/CUDA-parallel-programming

CUDA Parallel Programming

Overview

This repository contains a curated collection of CUDA and GPU computing projects focused on parallel numerical computation, performance optimization, simulation, and multi-GPU programming. The projects demonstrate how computational workloads can be mapped to GPU architectures using CUDA kernels, thread/block organization, memory hierarchy awareness, reductions, atomic operations, stencil methods, Monte Carlo simulation, multi-GPU decomposition, and FFT-based numerical solvers.

Technical Focus

CUDA parallel programming uses GPU hardware to accelerate data-parallel and numerically intensive workloads. Instead of executing work sequentially on the CPU, a CUDA program divides the computation among many lightweight threads, which are grouped into blocks that together form the grid of a kernel launch.

The projects in this repository cover:

  • CUDA kernel design for data-parallel computation
  • grid, block, and thread mapping
  • GPU timing with CUDA events
  • block-size tuning and runtime comparison
  • shared-memory reductions
  • atomic operations for concurrent updates
  • stencil-based iterative numerical solvers
  • Jacobi iteration for Poisson and heat-diffusion problems
  • Monte Carlo simulation and random sampling
  • single-GPU and multi-GPU workload partitioning
  • domain decomposition and boundary exchange
  • cuFFT-based Fourier-space numerical solving
  • result validation, runtime scaling, and experiment documentation

Skills & Technologies

CUDA | C++ | Python | Bash | GPU Computing | Numerical Computing | Multi-GPU | Monte Carlo | Performance

Core skills: CUDA, C++, GPU Computing, Parallel Programming, Thread/Block Mapping, CUDA Events, Shared Memory, Atomic Operations, Parallel Reduction, Stencil Computation, Jacobi Iteration, Poisson Solver, Heat Diffusion, Multi-GPU Programming, Domain Decomposition, Monte Carlo Simulation, cuFFT, Numerical Computing, Performance Benchmarking, Python, Bash, Technical Documentation

Project Directory

| No. | Project | Topic | Purpose | Main Concepts | Skills / Tags |
|---|---|---|---|---|---|
| 01 | 01-modified-matrix-addition | Modified Matrix Addition | Implement a basic CUDA matrix operation and evaluate how block configuration affects runtime | 2D grid/block mapping, element-wise kernels, CUDA event timing, block-size tuning | CUDA, C++, 2D Kernel, GPU Timing, Performance Tuning |
| 02 | 02-parallel-reduction-trace | Parallel Reduction Trace | Implement a CUDA reduction workflow for large-array summation or trace-style computation | shared-memory reduction, strided access, block-level partial sums, final accumulation | CUDA, Reduction, Shared Memory, Parallel Sum, Benchmarking |
| 03 | 03-poisson-3d-jacobi | 3D Poisson Jacobi Solver | Solve a 3D Poisson problem using iterative Jacobi updates on a structured grid | stencil computation, zero boundary condition, convergence checking, numerical validation | CUDA, Jacobi, Poisson Equation, Stencil, Numerical Solver |
| 04 | 04-multi-gpu-dot-product | Multi-GPU Dot Product | Partition a vector dot product across multiple GPUs and combine partial results | multi-GPU partitioning, per-GPU reduction, host-side accumulation, CUDA timing | CUDA, Multi-GPU, Dot Product, Reduction, Performance Comparison |
| 05 | 05-multi-gpu-heat-diffusion | Multi-GPU Heat Diffusion | Simulate 2D heat diffusion using Jacobi iteration with domain decomposition | halo exchange, boundary conditions, multi-GPU subdomains, iterative stencil updates | CUDA, Multi-GPU, Heat Diffusion, Domain Decomposition, Jacobi |
| 06 | 06-exponential-histogram | Exponential Histogram | Compare CPU and GPU histogram implementations using atomic updates | exponential random samples, histogram bins, global atomics, shared-memory atomics | CUDA, Histogram, Atomic Operations, Shared Memory, Speedup |
| 07 | 07-monte-carlo-10d-integration | Monte Carlo 10D Integration | Estimate a high-dimensional integral using CPU and CUDA Monte Carlo methods | random sampling, importance sampling, Metropolis sampling, convergence analysis | CUDA, Monte Carlo, High-Dimensional Integration, Sampling, Numerical Methods |
| 08 | 08-ising-model-monte-carlo | 2D Ising Model Monte Carlo | Simulate a 2D Ising model using Metropolis updates on GPU | checkerboard update, toroidal lattice, spin simulation, Monte Carlo production runs | CUDA, Monte Carlo, Ising Model, Metropolis, GPU Simulation |
| 09 | 09-poisson-3d-cufft | 3D Poisson Solver with cuFFT | Solve a 3D Poisson equation in Fourier space using cuFFT | FFT transform, Fourier-space Green's function, zero-mode handling, runtime scaling | CUDA, cuFFT, Poisson Solver, FFT, Numerical Computing |

Skill Coverage Map

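| Skill / Concept | 01 Matrix | 02 Reduction | 03 Jacobi Poisson | 04 Multi-GPU Dot | 05 Heat Diffusion | 06 Histogram | 07 MC Integration | 08 Ising Model | 09 cuFFT Poisson |
|---|---|---|---|---|---|---|---|---|---|
| CUDA kernel programming | | | | | | | | | |
| C++ systems programming | | | | | | | | | |
| Grid/block/thread mapping | | | | | | | | | |
| CUDA event timing | | | | | | | | | |
| Block-size tuning | | | | | | | | | |
| Shared memory | | | | | | | | | |
| Atomic operations | | | | | | | | | |
| Parallel reduction | | | | | | | | | |
| Stencil computation | | | | | | | | | |
| Jacobi iteration | | | | | | | | | |
| Poisson equation solving | | | | | | | | | |
| Heat diffusion simulation | | | | | | | | | |
| Multi-GPU programming | | | | | | | | | |
| Domain decomposition | | | | | | | | | |
| Monte Carlo simulation | | | | | | | | | |
| Random sampling methods | | | | | | | | | |
| cuFFT / FFT-based solver | | | | | | | | | |
| Numerical validation | | | | | | | | | |
| Experiment automation | | | | | | | | | |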

Project Highlights

01. Modified Matrix Addition

This project implements an element-wise CUDA matrix operation where each output value is computed independently. It demonstrates basic GPU kernel design and 2D indexing for matrix traversal.

Key demonstrated concepts:

  • one GPU thread per matrix element
  • 2D grid and block configuration
  • CUDA event-based timing
  • block-size performance comparison
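
The one-thread-per-element mapping can be illustrated without a GPU. The sketch below is a host-side C++ emulation, not the project's kernel: it assumes plain element-wise addition as a stand-in for the "modified" operation, and the names `ceilDiv` and `addEmulated` are hypothetical. The loops over `bx`, `by`, `tx`, `ty` play the role of `blockIdx` and `threadIdx`.

```cpp
#include <cstddef>
#include <vector>

// Ceiling division: number of blocks needed to cover n elements.
inline int ceilDiv(int n, int block) { return (n + block - 1) / block; }

// Host-side emulation of a 2D launch. Every (block, thread) pair handles at
// most one element of C = A + B. Matrices are stored row-major.
std::vector<float> addEmulated(const std::vector<float>& a,
                               const std::vector<float>& b,
                               int rows, int cols, int blockX, int blockY) {
    std::vector<float> c(a.size(), 0.0f);
    const int gridX = ceilDiv(cols, blockX);
    const int gridY = ceilDiv(rows, blockY);
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    // Same formula a kernel would use:
                    // blockIdx * blockDim + threadIdx
                    const int col = bx * blockX + tx;
                    const int row = by * blockY + ty;
                    // Guard: edge blocks may extend past the matrix.
                    if (row < rows && col < cols) {
                        const std::size_t i =
                            static_cast<std::size_t>(row) * cols + col;
                        c[i] = a[i] + b[i];
                    }
                }
    return c;
}
```

For a 5 x 7 matrix and 4 x 4 blocks, `ceilDiv` gives a 2 x 2 grid, so 64 emulated threads cover 35 elements and the bounds check discards the rest. Changing `blockX` and `blockY` changes only how the work is tiled, which is the quantity the block-size comparison varies.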

02. Parallel Reduction Trace

This project implements a CUDA reduction workflow for large-array summation or trace-style computation. It demonstrates how partial sums can be computed inside blocks and then combined to obtain a final result.

Key demonstrated concepts:

  • shared-memory reduction
  • strided global-memory access
  • block-level partial sums
  • reduction performance tuning
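
The block-level pattern can be sketched on the host as follows. This is an illustrative C++ emulation under stated assumptions, not the repository's kernel: `reduceBlock` and `reduceEmulated` are hypothetical names, the vector `shared` stands in for a block's shared-memory buffer, and the block size is assumed to be a power of two.

```cpp
#include <cstddef>
#include <vector>

// Emulates one block's tree reduction over a shared-memory-like buffer.
// The buffer length (block size) is assumed to be a power of two.
double reduceBlock(std::vector<double> shared) {
    for (std::size_t stride = shared.size() / 2; stride > 0; stride /= 2)
        for (std::size_t t = 0; t < stride; ++t)  // "threads" with t < stride
            shared[t] += shared[t + stride];
    return shared[0];
}

// Each emulated thread first accumulates a grid-strided slice of the input,
// then the block reduces its buffer, and the host adds the partial sums.
double reduceEmulated(const std::vector<double>& x,
                      std::size_t blocks, std::size_t blockSize) {
    const std::size_t gridThreads = blocks * blockSize;
    double total = 0.0;
    for (std::size_t b = 0; b < blocks; ++b) {
        std::vector<double> shared(blockSize, 0.0);
        for (std::size_t t = 0; t < blockSize; ++t)
            for (std::size_t i = b * blockSize + t; i < x.size(); i += gridThreads)
                shared[t] += x[i];
        total += reduceBlock(shared);  // block-level partial sum
    }
    return total;
}
```

On a GPU, each halving step of `reduceBlock` would be separated by a block-wide barrier, and the final accumulation could happen either on the host or in a second kernel. Which of those the project uses is not stated here.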

03. 3D Poisson Jacobi Solver

This project solves a 3D Poisson equation using Jacobi iteration. The implementation represents a structured 3D grid and repeatedly applies stencil updates until the configured convergence or iteration condition is reached.

Key demonstrated concepts:

  • 3D grid representation
  • stencil-based update
  • Jacobi iteration
  • zero Dirichlet boundary condition
  • radial-average validation against expected potential behavior
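
A single Jacobi sweep can be written as a short host-side reference, which is also convenient for validating a kernel. The sketch below assumes the sign convention laplacian(phi) = -rho, a uniform spacing `h`, and a cubic n x n x n grid. The function name `jacobiSweep` and these conventions are assumptions for illustration, not taken from the project.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One Jacobi sweep for laplacian(phi) = -rho on an n*n*n grid with spacing h.
// Boundary points are never written, so they keep their zero Dirichlet value.
// Returns the largest absolute change, usable as a convergence measure.
double jacobiSweep(const std::vector<double>& phi, std::vector<double>& next,
                   const std::vector<double>& rho, int n, double h) {
    auto idx = [n](int i, int j, int k) {
        return (static_cast<std::size_t>(i) * n + j) * n + k;
    };
    double maxDiff = 0.0;
    for (int i = 1; i < n - 1; ++i)
        for (int j = 1; j < n - 1; ++j)
            for (int k = 1; k < n - 1; ++k) {
                // Seven-point stencil: the six nearest neighbors.
                const double sum =
                    phi[idx(i - 1, j, k)] + phi[idx(i + 1, j, k)] +
                    phi[idx(i, j - 1, k)] + phi[idx(i, j + 1, k)] +
                    phi[idx(i, j, k - 1)] + phi[idx(i, j, k + 1)];
                const double v = (sum + h * h * rho[idx(i, j, k)]) / 6.0;
                maxDiff = std::max(maxDiff, std::fabs(v - phi[idx(i, j, k)]));
                next[idx(i, j, k)] = v;
            }
    return maxDiff;
}
```

A driver would call `jacobiSweep`, swap `phi` and `next`, and stop once the returned change falls below a tolerance or an iteration limit is reached. Both buffers must start with zeros on the boundary.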

04. Multi-GPU Dot Product

This project computes a vector dot product using multiple GPUs. The input vector is partitioned across devices, each GPU computes a partial dot product, and the host combines the partial results.

Key demonstrated concepts:

  • multi-GPU workload partitioning
  • per-device memory allocation
  • partial reduction
  • host-side final accumulation
  • single-GPU and multi-GPU timing comparison
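
The partition-then-combine structure is sketched below in host-side C++. It is an illustration only: the loop over `d` stands in for work that each GPU would do on its own chunk, `partitionedDot` is a hypothetical name, and contiguous equal-sized chunks are an assumed partitioning scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Splits the vectors into `devices` contiguous chunks, computes one partial
// dot product per chunk, and combines the partial results at the end.
double partitionedDot(const std::vector<double>& a,
                      const std::vector<double>& b, std::size_t devices) {
    const std::size_t n = a.size();
    const std::size_t chunk = (n + devices - 1) / devices;  // ceiling division
    std::vector<double> partial(devices, 0.0);
    for (std::size_t d = 0; d < devices; ++d) {
        // The last chunk may be shorter, or empty if devices > n.
        const std::size_t begin = std::min(d * chunk, n);
        const std::size_t end = std::min(begin + chunk, n);
        const auto first = static_cast<std::ptrdiff_t>(begin);
        const auto last = static_cast<std::ptrdiff_t>(end);
        partial[d] = std::inner_product(a.begin() + first, a.begin() + last,
                                        b.begin() + first, 0.0);
    }
    // Host-side final accumulation of the per-device results.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Because floating-point addition is not associative, the partitioned result can differ from a single-pass sum in the last bits, so comparisons between single-GPU and multi-GPU output are usually made with a tolerance.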

05. Multi-GPU Heat Diffusion

This project simulates 2D heat diffusion using Jacobi iteration and multi-GPU domain decomposition. Each GPU updates a subdomain, and neighboring boundary data is exchanged to maintain consistency.

Key demonstrated concepts:

  • 2D stencil computation
  • fixed boundary conditions
  • domain decomposition
  • halo/boundary exchange
  • single-GPU and two-GPU comparison

06. Exponential Histogram

This project builds histograms from samples generated from an exponential distribution. It compares CPU and GPU versions, including GPU implementations that use global-memory and shared-memory atomic operations.

Key demonstrated concepts:

  • histogram binning
  • exponential random distribution
  • global-memory atomics
  • shared-memory atomics
  • CPU vs GPU speedup comparison
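
A CPU reference for the binning step might look like the sketch below. It is written so that the structure mirrors the shared-memory variant: each emulated block fills a private histogram and then merges it into the global one. The names `exponentialSamples` and `histogramPrivatized`, the equal-width bins on [0, xmax), and the choice to drop out-of-range samples are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Draws n samples from an exponential distribution with rate lambda.
std::vector<double> exponentialSamples(std::size_t n, double lambda, unsigned seed) {
    std::mt19937 gen(seed);
    std::exponential_distribution<double> dist(lambda);
    std::vector<double> x(n);
    for (double& v : x) v = dist(gen);
    return x;
}

// Histogram with nbins equal-width bins on [0, xmax). Samples outside the
// range are dropped. Each emulated block owns a private copy of the bins.
std::vector<unsigned> histogramPrivatized(const std::vector<double>& x,
                                          std::size_t nbins, double xmax,
                                          std::size_t blocks) {
    std::vector<unsigned> global(nbins, 0u);
    const double binWidth = xmax / static_cast<double>(nbins);
    for (std::size_t b = 0; b < blocks; ++b) {
        std::vector<unsigned> local(nbins, 0u);  // per-block "shared" bins
        for (std::size_t i = b; i < x.size(); i += blocks) {
            if (x[i] < 0.0 || x[i] >= xmax) continue;
            const std::size_t bin =
                std::min(static_cast<std::size_t>(x[i] / binWidth), nbins - 1);
            ++local[bin];  // on a GPU: atomic add on shared memory
        }
        for (std::size_t k = 0; k < nbins; ++k)
            global[k] += local[k];  // on a GPU: atomic add on global memory
    }
    return global;
}
```

With privatized bins, contention on global memory drops from one atomic update per sample to one per bin per block. That difference is what a comparison of global-memory and shared-memory atomics measures.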

07. Monte Carlo 10D Integration

This project estimates a 10-dimensional integral using Monte Carlo techniques. Multiple sampling strategies are compared to evaluate convergence behavior and GPU acceleration.

Key demonstrated concepts:

  • high-dimensional numerical integration
  • simple Monte Carlo sampling
  • direct-inversion importance sampling
  • Metropolis sampling
  • CPU vs CUDA comparison
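
The simplest of the strategies listed above, uniform sampling on the unit hypercube, can be sketched as a CPU reference. The integrand used by the project is not given here, so `sumOfCoordinates` is a hypothetical test integrand chosen because its exact integral over [0,1]^10 is 5. `Estimate` and `simpleMonteCarlo` are likewise illustrative names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>

struct Estimate {
    double mean;   // Monte Carlo estimate of the integral
    double error;  // estimated standard error of the mean
};

// Hypothetical test integrand: f(x) = x_0 + ... + x_9, exact integral 5.
double sumOfCoordinates(const double* x) {
    double s = 0.0;
    for (int d = 0; d < 10; ++d) s += x[d];
    return s;
}

// Simple Monte Carlo over [0,1]^10 with uniformly distributed sample points.
template <class F>
Estimate simpleMonteCarlo(F f, std::size_t samples, unsigned seed) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double x[10];
    double sum = 0.0, sumSq = 0.0;
    for (std::size_t s = 0; s < samples; ++s) {
        for (double& xi : x) xi = u(gen);
        const double v = f(x);
        sum += v;
        sumSq += v * v;
    }
    const double n = static_cast<double>(samples);
    const double mean = sum / n;
    const double variance = std::max(sumSq / n - mean * mean, 0.0);
    return {mean, std::sqrt(variance / n)};
}
```

The standard error shrinks like one over the square root of the sample count regardless of dimension. Importance sampling and Metropolis sampling aim to reduce the variance term rather than change that scaling.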

08. 2D Ising Model Monte Carlo

This project simulates the 2D Ising model on a toroidal lattice using the Metropolis algorithm. The checkerboard update pattern helps avoid conflicting updates between neighboring spins.

Key demonstrated concepts:

  • Ising spin simulation
  • toroidal boundary condition
  • Metropolis Monte Carlo update
  • checkerboard update scheme
  • GPU-based production runs
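
The checkerboard scheme can be shown with a sequential host-side sketch. It assumes coupling J = 1, no external field, and an even lattice size so that the two colors tile the torus consistently. `checkerboardSweep` is a hypothetical name, and the single shared generator is a simplification of whatever per-thread random number scheme a GPU version would need.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One checkerboard Metropolis sweep of an n*n Ising lattice (J = 1, no field)
// with periodic wrap-around. n is assumed to be even. Sites of one color have
// only neighbors of the other color, so all sites of a color could be updated
// concurrently without conflicts.
void checkerboardSweep(std::vector<int>& spin, int n, double beta, std::mt19937& gen) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (int color = 0; color < 2; ++color)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                if ((i + j) % 2 != color) continue;
                const int up = spin[((i + n - 1) % n) * n + j];
                const int down = spin[((i + 1) % n) * n + j];
                const int left = spin[i * n + (j + n - 1) % n];
                const int right = spin[i * n + (j + 1) % n];
                // Energy change if this spin is flipped.
                const int dE = 2 * spin[i * n + j] * (up + down + left + right);
                // Metropolis rule: accept if energy does not increase,
                // otherwise accept with probability exp(-beta * dE).
                if (dE <= 0 || u(gen) < std::exp(-beta * dE))
                    spin[i * n + j] = -spin[i * n + j];
            }
}
```

At large `beta` (low temperature) an all-up lattice is essentially frozen, and a single flipped spin is restored in one sweep. Both cases make quick sanity checks before longer production runs.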

09. 3D Poisson Solver with cuFFT

This project solves a 3D Poisson problem using an FFT-based method with cuFFT. The charge distribution is transformed into Fourier space, the potential is computed using a Green's-function-style formulation, and the result is transformed back to real space.

Key demonstrated concepts:

  • 3D FFT workflow
  • cuFFT integration
  • Fourier-space Poisson solver
  • zero-mode handling
  • runtime scaling and grid-size exploration
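
The Fourier-space step reduces to multiplying each transformed coefficient by a factor that depends only on its wave vector. The sketch below shows one possible form of that factor on the host. It assumes the convention laplacian(phi) = -rho, a periodic cube of side `length` with `n` points per axis, and the continuous wave-number form 1/k^2. `signedFrequency` and `poissonFactor` are hypothetical names, and the project may use a different convention or the eigenvalues of a discrete Laplacian instead.

```cpp
#include <cmath>

// Signed integer frequency for FFT index i on an n-point axis:
// indices 0..n/2 map to themselves, larger indices to negative frequencies.
inline int signedFrequency(int i, int n) { return (i <= n / 2) ? i : i - n; }

// Factor that multiplies rho_hat(ix, iy, iz) to give phi_hat(ix, iy, iz)
// for laplacian(phi) = -rho on a periodic cube of side `length`.
inline double poissonFactor(int ix, int iy, int iz, int n, double length) {
    const double pi = 3.14159265358979323846;
    const double dk = 2.0 * pi / length;  // spacing between wave numbers
    const double kx = dk * signedFrequency(ix, n);
    const double ky = dk * signedFrequency(iy, n);
    const double kz = dk * signedFrequency(iz, n);
    const double k2 = kx * kx + ky * ky + kz * kz;
    // Zero-mode handling: k = 0 would divide by zero, so that coefficient is
    // set to zero, which fixes the mean of the potential to zero.
    return (k2 == 0.0) ? 0.0 : 1.0 / k2;
}
```

cuFFT transforms are unnormalized, so a forward transform followed by an inverse transform scales the data by the total number of grid points. A solver built this way needs to divide by that count once, either in the factor above or after the inverse transform.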

Repository Structure

cuda-parallel-programming/
├── README.md
├── .gitignore
├── 01-modified-matrix-addition/
├── 02-parallel-reduction-trace/
├── 03-poisson-3d-jacobi/
├── 04-multi-gpu-dot-product/
├── 05-multi-gpu-heat-diffusion/
├── 06-exponential-histogram/
├── 07-monte-carlo-10d-integration/
├── 08-ising-model-monte-carlo/
└── 09-poisson-3d-cufft/

A typical project folder uses the following structure:

project-folder/
├── README.md
├── Makefile
├── src/
├── scripts/
└── results/

Some folders may omit directories that are not required for that specific project.

Build and Run

Each project contains its own README with project-specific build and run instructions. A typical CUDA/C++ project can be built with:

make clean
make

A typical executable run may look like:

./build/main

For projects with experiment scripts:

bash scripts/run_experiments.sh

For Python-based plotting or result summarization:

python3 scripts/plot_results.py

Reproducibility Notes

To reproduce results, use a CUDA-capable NVIDIA GPU with a compatible CUDA Toolkit version installed. Exact runtime results may vary depending on GPU model, driver version, CUDA version, CPU, memory bandwidth, and system load.

Recommended environment:

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit
  • C++ compiler compatible with nvcc
  • GNU Make
  • Python 3 for optional scripts and plots
  • Python packages such as numpy, pandas, and matplotlib when plotting scripts are used
