Advanced ML Math
This chapter covers the mathematics behind modern ML architectures that rely on tools beyond the classical toolkit:
- Attention and Transformers -- Scaled dot-product attention, multi-head attention, positional encoding, FlashAttention
- Diffusion Models -- Forward noise process, reverse denoising, DDPM loss, score matching
- Graph Neural Networks -- Graph Laplacian, spectral theory, message passing, GCN/GAT
- Optimal Transport -- Wasserstein distance, Kantorovich duality, Sinkhorn algorithm