CUDA and GPU Programming

This chapter covers GPU programming from first principles: writing CUDA kernels, understanding the memory model, applying optimization techniques, and using higher-level tools like Triton.

  • Your First Kernel -- Hello world, thread indexing, vector addition, compiling with nvcc
  • Memory Model -- Global, shared, and register memory, coalescing, bank conflicts, occupancy
  • Optimization -- Warp divergence, tiling, reduction, streams, profiling
  • Triton -- OpenAI Triton, block-level programming, fused softmax, matmul kernel
  • Custom PyTorch Ops -- C++ extensions, CUDA kernels from Python, torch.library
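As a taste of what the first section covers, here is a minimal sketch of a vector-addition kernel with thread indexing (file and variable names are illustrative, not from the chapter):

```cuda
#include <cstdio>

// Each thread handles one element; blockIdx/blockDim/threadIdx
// together give the thread's global index into the arrays.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the last block may overhang n
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit
    // cudaMalloc/cudaMemcpy is the more traditional pattern.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover every element
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();  // kernel launches are asynchronous

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Saved as `vecadd.cu`, this compiles with `nvcc vecadd.cu -o vecadd`; each piece (indexing, launch configuration, memory management) gets a full treatment in the sections above.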