Introduction to Computing
This chapter covers how programs execute on modern hardware, from the fetch-decode-execute cycle to distributed GPU clusters.
- How Programs Execute -- The fetch-decode-execute cycle, registers, memory hierarchy, and latency numbers every programmer should know
- CPU and GPU Architecture -- CPU vs GPU design, SIMT execution, thread hierarchy, memory types, and bandwidth vs latency
- From Assembly to PyTorch -- The abstraction stack: how torch.matmul becomes GPU instructions
- High-Performance Computing -- MPI, NCCL, distributed training topology, and profiling with nsys