Advanced ML Math

Modern ML architectures reach for tools beyond linear algebra and basic calculus, such as spectral graph theory, stochastic differential equations, and optimal transport. This chapter develops those foundations and connects each to the architecture that motivates it.

Prerequisites. These sections assume the core math built up earlier in this book: linear algebra (eigendecomposition and SVD in particular), calculus and optimization, and probability and statistics. The four sections below are largely independent, so you can read them in any order.

Attention and Transformers: Scaled dot-product attention, multi-head attention, positional encoding, FlashAttention
Diffusion Models: Forward noise process, reverse denoising, DDPM loss, score matching
Graph Neural Networks: Graph Laplacian, spectral theory, message passing, GCN/GAT
Optimal Transport: Wasserstein distance, Kantorovich duality, Sinkhorn algorithm