ML Systems
Building and deploying ML models at scale requires understanding four areas: distributed training, mixed precision, inference optimization, and profiling.
- Distributed Training -- DDP, model parallelism, FSDP, DeepSpeed
- Mixed Precision -- FP32/FP16/BF16/FP8, loss scaling, torch.amp
- Inference Optimization -- KV-cache, quantization, speculative decoding, vLLM
- Profiling -- torch.profiler, nsys, ncu, roofline model
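
As a small illustration of the roofline model listed under Profiling, the sketch below computes the attainable throughput of a kernel from its arithmetic intensity (FLOPs per byte moved). The function names `attainable_flops` and `ridge_point` are hypothetical, and the peak compute and bandwidth figures are made-up round numbers, not measurements of any real device.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    peak_flops            peak compute throughput, in FLOP/s
    mem_bandwidth         peak memory bandwidth, in bytes/s
    arithmetic_intensity  FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_flops / mem_bandwidth


# Illustrative hardware numbers (assumptions, not a real accelerator).
PEAK = 100e12   # 100 TFLOP/s
BW = 1e12       # 1 TB/s

if __name__ == "__main__":
    ridge = ridge_point(PEAK, BW)
    for ai in (10.0, 100.0, 1000.0):
        bound = attainable_flops(PEAK, BW, ai)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI={ai:7.1f} FLOP/B  bound={bound / 1e12:6.1f} TFLOP/s  {regime}")
```

Under these assumed numbers, kernels with an arithmetic intensity below the ridge point are limited by memory bandwidth, so raising peak compute would not help them; comparing a profiler's measured intensity against the ridge point is one way to decide which optimization to try first.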