Introduction to Computing
This chapter covers how programs execute on modern hardware, from the fetch-decode-execute cycle to distributed GPU clusters.
- How Programs Execute: The fetch-decode-execute cycle, registers, memory hierarchy, and latency numbers every programmer should know
- CPU and GPU Architecture: CPU vs GPU design, SIMT execution, thread hierarchy, memory types, and bandwidth vs latency
- From Assembly to PyTorch: The abstraction stack: how torch.matmul becomes GPU instructions
- High-Performance Computing: MPI, NCCL, distributed training topology, and profiling with nsys