Probability and Statistics
Probability is the other foundational language of machine learning. Models define distributions, training often amounts to maximizing a likelihood, and generalization is a statistical claim about performance on unseen data.
These five sections build on one another. The basics fix the vocabulary of events, conditioning, and expectation; distributions give the concrete families that models assume; the likelihoods from those distributions feed Bayesian inference, where priors and data combine into posteriors; information theory quantifies uncertainty and the divergence between distributions (KL divergence is not a true distance, since it is asymmetric), quantities that show up directly as training objectives; and statistical learning theory turns all of this into bounds on how well a fitted model will generalize.
- Probability Basics: Axioms, conditional probability, Bayes' theorem, expectation, variance
- Distributions: Bernoulli, Gaussian, Categorical (and softmax), multivariate Gaussian, mixture models
- Bayesian Inference: Prior, likelihood, posterior, MAP vs MLE, variational inference
- Information Theory: Entropy, cross-entropy, KL divergence, mutual information
- Statistical Learning Theory: PAC learning, VC dimension, generalization bounds, bias-variance decomposition
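
As a small illustration of how these topics connect, the sketch below fits a Bernoulli model by maximum likelihood and checks two standard identities: the average negative log-likelihood equals the cross-entropy between the empirical distribution and the model, and cross-entropy decomposes into entropy plus KL divergence. The coin-flip data and every function name here are assumptions made up for this example, not taken from the sections themselves.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats for a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def avg_nll(theta, data):
    """Average negative log-likelihood of 0/1 data under Bernoulli(theta)."""
    return -sum(math.log(theta if x == 1 else 1.0 - theta) for x in data) / len(data)

# Hypothetical data: ten coin flips, seven of them heads.
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# The Bernoulli maximum-likelihood estimate is the sample mean.
theta_mle = sum(flips) / len(flips)

# Distributions over the outcomes (0, 1).
empirical = [1.0 - theta_mle, theta_mle]
candidate_theta = 0.5  # an arbitrary alternative model, chosen for illustration
candidate = [1.0 - candidate_theta, candidate_theta]

nll = avg_nll(candidate_theta, flips)
ce = cross_entropy(empirical, candidate)
decomposed = entropy(empirical) + kl_divergence(empirical, candidate)

print(f"MLE theta:            {theta_mle}")
print(f"avg NLL at candidate: {nll:.6f}")
print(f"cross-entropy:        {ce:.6f}")
print(f"entropy + KL:         {decomposed:.6f}")
```

Because the entropy of the empirical distribution does not depend on the model, minimizing cross-entropy, minimizing KL divergence from the empirical distribution, and maximizing likelihood all select the same parameter in this setting.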