Vectors and Matrices
Linear algebra is the mathematical foundation of machine learning. Most of the computation in a neural network (matrix multiplications in linear layers, dot products in attention, projections in dimensionality reduction) consists of linear algebra operations. The objects are vectors (representing data points, embeddings, gradients) and matrices (representing transformations, weight matrices, covariance structures). This chapter introduces these objects, the operations defined on them, and the geometric intuitions that make them powerful tools for ML.
Vector Spaces
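A vector space $V$ over $\mathbb{R}$ is a set equipped with vector addition and scalar multiplication satisfying the following axioms for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and $a, b \in \mathbb{R}$:
- Closure: $\mathbf{u} + \mathbf{v} \in V$ and $a\mathbf{v} \in V$
- Associativity: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
- Commutativity: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
- Additive identity: There exists $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$
- Additive inverse: For each $\mathbf{v} \in V$, there exists $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
- Scalar distributivity: $(a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}$
- Vector distributivity: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$
- Scalar associativity: $a(b\mathbf{v}) = (ab)\mathbf{v}$
- Scalar identity: $1\mathbf{v} = \mathbf{v}$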
The standard example is $\mathbb{R}^n$, the space of $n$-tuples of real numbers.
In machine learning, a data point is a vector $\mathbf{x} \in \mathbb{R}^d$, where $d$ is the number of features. A dataset of $N$ points is a matrix $X \in \mathbb{R}^{N \times d}$ whose rows are the data points.
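The next example is a minimal plain-Python sketch of these definitions. It assumes vectors are stored as lists of floats and a dataset is stored as a list of rows. The helper names `add` and `scale` and the toy feature values are hypothetical. The code spot-checks closure and two of the axioms on concrete vectors. This is an illustration, not a proof.

```python
# Minimal sketch: vectors as lists of floats, a dataset as a list of rows.
# Helper names and feature values are hypothetical, for illustration only.

def add(u, v):
    """Vector addition in R^d (elementwise)."""
    return [ui + vi for ui, vi in zip(u, v)]

def scale(c, v):
    """Scalar multiplication c * v."""
    return [c * vi for vi in v]

# Toy dataset X with N = 3 points and d = 2 features; each row is one x in R^d.
X = [
    [1.0, 2.0],
    [0.5, -1.0],
    [3.0, 0.0],
]
N, d = len(X), len(X[0])

# Closure: a linear combination of data points is again a vector in R^d.
combo = add(scale(2.0, X[0]), scale(-1.0, X[1]))

# Spot-check two axioms (these values are exact in binary floating point,
# so == is safe here; in general compare floats with a tolerance).
u, v = X[0], X[1]
commutes = add(u, v) == add(v, u)
distributes = scale(3.0, add(u, v)) == add(scale(3.0, u), scale(3.0, v))
print(N, d, combo, commutes, distributes)
```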
Dot Product and Inner Products
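The dot product of $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ is $\mathbf{u}^\top \mathbf{v} = \sum_{i=1}^n u_i v_i = \|\mathbf{u}\|_2 \, \|\mathbf{v}\|_2 \cos\theta$, where $\theta$ is the angle between them. (Elsewhere in this section, $\theta$ also denotes regression parameters; context disambiguates.) More generally, an inner product is any function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ that is: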
- Bilinear: linear in each argument
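- Symmetric: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$
- Positive definite: $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$ for all $\mathbf{v}$, with equality iff $\mathbf{v} = \mathbf{0}$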
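The following plain-Python sketch makes the dot product and the angle formula concrete. The helper names `dot`, `norm2` and `angle` are our own and do not come from any library. The angle is recovered by inverting $\mathbf{u}^\top \mathbf{v} = \|\mathbf{u}\|_2 \|\mathbf{v}\|_2 \cos\theta$.

```python
import math

def dot(u, v):
    """Dot product u^T v = sum_i u_i v_i."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm2(v):
    """Euclidean length ||v||_2 = sqrt(v^T v)."""
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle theta between nonzero u and v, from u^T v = ||u|| ||v|| cos(theta)."""
    c = dot(u, v) / (norm2(u) * norm2(v))
    # Clamp to [-1, 1] to guard against tiny floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, c)))

u = [1.0, 0.0]
v = [1.0, 1.0]
theta = angle(u, v)                   # approximately pi/4 (45 degrees)
print(dot(u, v), math.degrees(theta))
```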
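The dot product appears throughout ML:
- Cosine similarity: $\cos\theta = \dfrac{\mathbf{u}^\top \mathbf{v}}{\|\mathbf{u}\|_2 \, \|\mathbf{v}\|_2}$ measures angular similarity between embeddings. Used for retrieval, clustering, and nearest-neighbor search.
- Attention scores: $\mathbf{q}^\top \mathbf{k} / \sqrt{d_k}$ is a scaled dot product between a query vector and a key vector, where $d_k$ is the key dimension.
- Kernel methods: Replace $\mathbf{x}^\top \mathbf{x}'$ with a kernel function $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}')$ that implicitly computes the dot product in a higher-dimensional feature space.
- Linear layers: $\mathbf{y} = W\mathbf{x}$ with $W \in \mathbb{R}^{m \times n}$ computes $m$ dot products (one per output dimension) between the rows of $W$ and the input $\mathbf{x}$.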
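Each of these applications reduces to the same `dot` helper. The sketch below uses plain Python and made-up toy values. It computes:
- a cosine similarity;
- scaled attention scores (before the softmax);
- a linear layer as one dot product per weight row.

It also checks the kernel identity for the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2$ on $\mathbb{R}^2$, whose explicit feature map is $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. That kernel and all function names are our own illustrative choices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """u^T v / (||u||_2 ||v||_2); lies in [-1, 1] and ignores vector length."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def attention_scores(q, keys):
    """Scaled dot products q^T k / sqrt(d_k), one per key (softmax not applied)."""
    d_k = len(q)
    return [dot(q, k) / math.sqrt(d_k) for k in keys]

def linear_layer(W, x):
    """y = W x: output i is the dot product of row i of W with x."""
    return [dot(row, x) for row in W]

def phi(x):
    """Explicit feature map for the degree-2 kernel k(x, z) = (x^T z)^2 on R^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

# Toy values chosen for illustration.
u, v = [1.0, 2.0], [2.0, 4.0]
print(cosine_similarity(u, v))              # parallel vectors: approximately 1.0

q = [1.0, 0.0, 1.0, 0.0]
keys = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
print(attention_scores(q, keys))            # q^T k = 2 and 0, divided by sqrt(4)

W = [[1.0, -1.0], [0.5, 2.0], [0.0, 3.0]]   # maps R^2 -> R^3
print(linear_layer(W, [2.0, 1.0]))          # one dot product per row of W

# Kernel trick: (x^T z)^2 matches the dot product of the explicit 3-D features.
x, z = [1.0, 2.0], [3.0, 1.0]
print(dot(x, z) ** 2, dot(phi(x), phi(z)))  # both approximately 25
```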
Norms
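A norm $\|\cdot\|$ on $V$ is a function $V \to \mathbb{R}$ satisfying, for all $\mathbf{u}, \mathbf{v} \in V$ and $c \in \mathbb{R}$:
- Positive definiteness: $\|\mathbf{v}\| \ge 0$, with equality iff $\mathbf{v} = \mathbf{0}$
- Homogeneity: $\|c\mathbf{v}\| = |c| \, \|\mathbf{v}\|$
- Triangle inequality: $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$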
The $L^p$ norm is $\|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$ for $p \ge 1$.
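Here is a plain-Python sketch of the $L^p$ family (the function names are hypothetical). It treats $p = \infty$ as the max norm. It keeps the $L^0$ count in a separate function because that count fails homogeneity, so it is not a true norm.

```python
import math

def lp_norm(x, p):
    """||x||_p = (sum_i |x_i|^p)^(1/p) for p >= 1; p = math.inf gives max_i |x_i|."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    if p < 1:
        raise ValueError("p must be >= 1 for a true norm")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def l0_count(x):
    """Number of nonzero entries (the 'L0 norm'); not a true norm."""
    return sum(1 for xi in x if xi != 0)

x = [3.0, -4.0, 0.0]
print(l0_count(x), lp_norm(x, 1), lp_norm(x, 2), lp_norm(x, math.inf))  # 2 7.0 5.0 4.0
```

Scaling $\mathbf{x}$ by 2 doubles every $L^p$ norm but leaves the nonzero count unchanged. That is exactly the homogeneity failure noted in the table that follows.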
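| Norm | Formula | Unit Ball Shape | ML Application |
|---|---|---|---|
| $L^0$ (counting) | $\#\{i : x_i \neq 0\}$ | Coordinate axes (through the cross-polytope corners) | Sparsity (not a true norm) |
| $L^1$ (Manhattan) | $\sum_i \lvert x_i \rvert$ | Diamond (cross-polytope) | Lasso/L1 regularization, sparsity |
| $L^2$ (Euclidean) | $\sqrt{\sum_i x_i^2}$ | Circle (sphere) | Ridge/L2 regularization, distances |
| $L^\infty$ (max) | $\max_i \lvert x_i \rvert$ | Square (hypercube) | Adversarial robustness ($L^\infty$-bounded perturbations) |
| Frobenius | $\sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\operatorname{tr}(A^\top A)}$ | n/a | Matrix regularization, weight decay |
| Spectral | $\sigma_{\max}(A)$ | n/a | Spectral normalization, Lipschitz bounds |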
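The next sketch covers the two matrix norms in the table and assumes small dense matrices stored as lists of rows. The Frobenius norm is a direct sum of squares. The spectral norm is estimated by power iteration on $A^\top A$, a numerical technique we swapped in for brevity; a library SVD is the usual tool. Because $\|A\mathbf{x}\|_2 \le \sigma_{\max}(A)\,\|\mathbf{x}\|_2$, the spectral norm is the Lipschitz constant of $\mathbf{x} \mapsto A\mathbf{x}$. This is why spectral normalization divides a weight matrix by it.

```python
import math

# Hypothetical helpers; matrices are lists of rows.
def frobenius_norm(A):
    """||A||_F = sqrt(sum_ij A_ij^2)."""
    return math.sqrt(sum(a * a for row in A for a in row))

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def spectral_norm(A, iters=200):
    """Estimate sigma_max(A) by power iteration on A^T A.

    Assumes the all-ones start vector is not orthogonal to the top right
    singular vector; a library SVD is the robust choice in practice.
    """
    At = transpose(A)
    x = [1.0] * len(A[0])
    for _ in range(iters):
        y = matvec(At, matvec(A, x))          # y = A^T A x
        n = math.sqrt(sum(v * v for v in y))
        x = [v / n for v in y]                # renormalize to unit length
    # For unit x aligned with the top right singular vector, ||A x||_2 = sigma_max.
    return math.sqrt(sum(v * v for v in matvec(A, x)))

A = [[3.0, 0.0],
     [0.0, 4.0]]                              # singular values 3 and 4
print(frobenius_norm(A), spectral_norm(A))    # 5.0 and approximately 4.0
```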
Orthogonality and Projection
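Two vectors are orthogonal if $\mathbf{u}^\top \mathbf{v} = 0$. The orthogonal projection of $\mathbf{b}$ onto a nonzero vector $\mathbf{a}$ is $\operatorname{proj}_{\mathbf{a}}(\mathbf{b}) = \dfrac{\mathbf{a}^\top \mathbf{b}}{\mathbf{a}^\top \mathbf{a}}\,\mathbf{a}$. The residual $\mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})$ is orthogonal to $\mathbf{a}$ (this is the defining property of orthogonal projection).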
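More generally, the projection of $\mathbf{b}$ onto the subspace spanned by the columns of $A$ is:
- If $A$ has orthonormal columns ($A^\top A = I$): $AA^\top \mathbf{b}$
- For general $A$ with full column rank (so that $A^\top A$ is invertible): $A(A^\top A)^{-1}A^\top \mathbf{b}$
The matrix $P = A(A^\top A)^{-1}A^\top$ is the projection matrix. It satisfies $P^2 = P$ (idempotent) and $P^\top = P$ (symmetric).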
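The two properties of $P$ can be checked numerically. This sketch uses plain Python lists and a hand-written 2x2 inverse, so it only handles a subspace spanned by two columns. The toy $A$ has columns that span a plane in $\mathbb{R}^3$. All helper names are hypothetical.

```python
# Hypothetical helpers; matrices are lists of rows.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix (assumes det != 0)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices within a tolerance."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Columns of A span a plane in R^3 (toy values; A must have full column rank).
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
At = transpose(A)
P = matmul(matmul(A, inv2(matmul(At, A))), At)   # P = A (A^T A)^{-1} A^T

print(close(matmul(P, P), P))     # idempotent: P^2 = P
print(close(transpose(P), P))     # symmetric: P^T = P
```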
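Worked example: projecting $\mathbf{b}$ onto $\mathbf{a}$. First compute the inner products $\mathbf{a}^\top \mathbf{b}$ and $\mathbf{a}^\top \mathbf{a}$. The scalar coefficient is $c = \mathbf{a}^\top \mathbf{b} / \mathbf{a}^\top \mathbf{a}$, so $\operatorname{proj}_{\mathbf{a}}(\mathbf{b}) = c\,\mathbf{a}$.
Check the defining property: the residual should be orthogonal to $\mathbf{a}$. Indeed, $\mathbf{a}^\top(\mathbf{b} - c\,\mathbf{a}) = \mathbf{a}^\top \mathbf{b} - c\,\mathbf{a}^\top \mathbf{a} = 0$, confirming the result.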
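The same steps can be run on illustrative vectors chosen here, $\mathbf{a} = (2, 1)$ and $\mathbf{b} = (3, 4)$. In that case $\mathbf{a}^\top \mathbf{b} = 10$, $\mathbf{a}^\top \mathbf{a} = 5$ and $c = 2$, so the projection is $(4, 2)$. The residual $(-1, 2)$ is orthogonal to $\mathbf{a}$. The helper names in this sketch are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(b, a):
    """Orthogonal projection of b onto the line spanned by a (a != 0)."""
    c = dot(a, b) / dot(a, a)          # scalar coefficient a^T b / a^T a
    return [c * ai for ai in a]

a = [2.0, 1.0]
b = [3.0, 4.0]
p = project(b, a)                      # a^T b = 10, a^T a = 5, c = 2
r = [bi - pi for bi, pi in zip(b, p)]  # residual b - proj_a(b)
print(p, r, dot(a, r))                 # [4.0, 2.0] [-1.0, 2.0] 0.0
```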
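Gram-Schmidt orthogonalization applies this step repeatedly. It turns linearly independent vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ into orthonormal vectors $\mathbf{q}_1, \dots, \mathbf{q}_k$ with $\mathbf{q}_i^\top \mathbf{q}_j = \delta_{ij}$. It does so by subtracting from each $\mathbf{v}_j$ its projections onto the earlier $\mathbf{q}_i$ and then normalizing.
Worked run on two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. Normalize the first: compute $\|\mathbf{v}_1\|_2$, so $\mathbf{q}_1 = \mathbf{v}_1 / \|\mathbf{v}_1\|_2$.
Now subtract the projection of $\mathbf{v}_2$ onto $\mathbf{q}_1$. The coefficient is $\mathbf{q}_1^\top \mathbf{v}_2$ (no division is needed because $\mathbf{q}_1^\top \mathbf{q}_1 = 1$), giving $\mathbf{u}_2 = \mathbf{v}_2 - (\mathbf{q}_1^\top \mathbf{v}_2)\,\mathbf{q}_1$.
Its norm is $\|\mathbf{u}_2\|_2$, so $\mathbf{q}_2 = \mathbf{u}_2 / \|\mathbf{u}_2\|_2$. Check orthogonality: $\mathbf{q}_1^\top \mathbf{q}_2 = \big(\mathbf{q}_1^\top \mathbf{v}_2 - (\mathbf{q}_1^\top \mathbf{v}_2)\,\mathbf{q}_1^\top \mathbf{q}_1\big) / \|\mathbf{u}_2\|_2 = 0$, as required.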
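The next sketch implements classical Gram-Schmidt in plain Python, run on illustrative vectors $\mathbf{v}_1 = (3, 4)$ and $\mathbf{v}_2 = (1, 0)$ chosen for this sketch. The steps are:
- $\mathbf{q}_1 = (0.6, 0.8)$;
- the coefficient is $0.6$;
- $\mathbf{u}_2 = (0.64, -0.48)$, with norm $0.8$;
- so $\mathbf{q}_2 = (0.8, -0.6)$.

The function name is hypothetical. The tolerance used to detect dependence is an arbitrary choice.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt (hypothetical helper): orthonormalize independent vectors.

    Each v has its projections onto the earlier q's subtracted and is then
    normalized, so the outputs satisfy q_i^T q_j = delta_ij.
    """
    qs = []
    for v in vectors:
        u = list(v)
        for q in qs:
            coef = dot(q, v)                       # classical variant: q^T v
            u = [ui - coef * qi for ui, qi in zip(u, q)]
        n = math.sqrt(dot(u, u))
        if n < 1e-12:                              # arbitrary dependence tolerance
            raise ValueError("vectors are (numerically) linearly dependent")
        qs.append([ui / n for ui in u])
    return qs

q1, q2 = gram_schmidt([[3.0, 4.0], [1.0, 0.0]])
print(q1, q2, dot(q1, q2))   # approximately [0.6, 0.8], [0.8, -0.6], 0
```

The modified variant computes each coefficient from the running residual `u` instead of the original `v`. The two are identical in exact arithmetic, but the modified variant loses less orthogonality in floating point.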
This is the foundation of QR decomposition ($A = QR$, where $Q$ has orthonormal columns and $R$ is upper triangular), which is used for solving linear systems, eigenvalue computation, and orthogonalizing weight matrices.
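The following sketch illustrates QR for solving linear systems. It builds a thin QR factorization by applying modified Gram-Schmidt to the columns of $A$. It then solves $A\mathbf{x} = \mathbf{b}$ by computing $Q^\top \mathbf{b}$ and back-substituting in $R\mathbf{x} = Q^\top \mathbf{b}$. The code assumes $A$ has full column rank and stores $Q$ as a list of columns. The names `qr` and `solve_qr` are hypothetical and are not a library API.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def qr(A):
    """Thin QR of A (m x n, full column rank) via modified Gram-Schmidt on its columns.

    Returns Q as a list of n orthonormal columns and R as an n x n upper-triangular matrix.
    """
    cols = [list(c) for c in zip(*A)]
    n = len(cols)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for k, v in enumerate(cols):
        u = list(v)
        for j, q in enumerate(Q):
            R[j][k] = dot(q, u)                    # modified GS: project the running u
            u = [ui - R[j][k] * qi for ui, qi in zip(u, q)]
        R[k][k] = math.sqrt(dot(u, u))
        Q.append([ui / R[k][k] for ui in u])
    return Q, R

def solve_qr(A, b):
    """Solve R x = Q^T b by back substitution (exact solve for square invertible A)."""
    Q, R = qr(A)
    y = [dot(q, b) for q in Q]                     # y = Q^T b
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
print(solve_qr(A, b))   # approximately [0.8, 1.4]
```

For a tall $A$ with full column rank, the same `solve_qr` returns the least-squares solution, because $QQ^\top \mathbf{b}$ is the projection of $\mathbf{b}$ onto the column space of $A$.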
Linear Independence and Basis
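Vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ are linearly independent if $\sum_i c_i \mathbf{v}_i = \mathbf{0}$ implies $c_1 = \dots = c_k = 0$. A basis for a vector space $V$ is a linearly independent set that spans $V$ (equivalently, a maximal linearly independent set). Every vector in $V$ can be written uniquely as a linear combination of basis vectors. The number of vectors in any basis is the dimension $\dim(V)$.
The rank of $A \in \mathbb{R}^{m \times n}$, written $\operatorname{rank}(A)$, is the number of linearly independent columns, which equals the number of linearly independent rows.
- $A$ is full rank iff $\operatorname{rank}(A) = \min(m, n)$
- For square matrices: $A$ is invertible $\iff$ $A$ is full rank $\iff$ $\det(A) \neq 0$
- Rank-nullity theorem: $\operatorname{rank}(A) + \operatorname{nullity}(A) = n$, where $\operatorname{nullity}(A) = \dim N(A)$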
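The next sketch computes rank as the number of pivots found by Gaussian elimination with partial pivoting. It then reads off the nullity from the rank-nullity theorem. Floating-point elimination needs a tolerance to decide when an entry counts as zero. The `tol` value here is an arbitrary assumption, and production code would typically use an SVD-based rank instead. The function name is hypothetical.

```python
def rank(A, tol=1e-10):
    """Rank = number of pivots in Gaussian elimination with partial pivoting."""
    M = [list(map(float, row)) for row in A]
    m, n = len(M), len(M[0])
    r = 0                                         # index of the next pivot row
    for c in range(n):
        # Pick the row at or below r with the largest |entry| in column c.
        p = max(range(r, m), key=lambda i: abs(M[i][c]), default=None)
        if p is None or abs(M[p][c]) < tol:
            continue                              # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):                 # eliminate entries below the pivot
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

A = [[1, 2, 3],
     [2, 4, 6],      # 2 x the first row, so the rows are not independent
     [1, 0, 1]]
n = len(A[0])
r = rank(A)
nullity = n - r      # rank-nullity: rank(A) + nullity(A) = n
print(r, nullity)    # 2 1
```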
The Four Fundamental Subspaces
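For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$:
- Column space $C(A) \subseteq \mathbb{R}^m$: dimension $r$. The range of $A$: $C(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}$
- Row space $C(A^\top) \subseteq \mathbb{R}^n$: dimension $r$
- Null space $N(A) \subseteq \mathbb{R}^n$: dimension $n - r$. The solutions to $A\mathbf{x} = \mathbf{0}$
- Left null space $N(A^\top) \subseteq \mathbb{R}^m$: dimension $m - r$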
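These subspaces satisfy $C(A^\top) \perp N(A)$ and $C(A) \perp N(A^\top)$. Together they form orthogonal direct-sum decompositions: $\mathbb{R}^n = C(A^\top) \oplus N(A)$ and $\mathbb{R}^m = C(A) \oplus N(A^\top)$.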
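The orthogonality relations can be verified on a small example. This plain-Python sketch uses hypothetical helper names. It:
- row-reduces with exact `fractions.Fraction` arithmetic, so zero tests are exact;
- builds a null-space basis with one vector per free column;
- checks that the rows of $A$ are orthogonal to $N(A)$;
- checks that the columns of $A$ are orthogonal to $N(A^\top)$.

The rank-1 matrix is a toy choice.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form in exact arithmetic; returns (R, pivot_columns)."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                               # free column
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]             # scale pivot row so the pivot is 1
        for i in range(m):                         # clear column c in all other rows
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

def null_space(A):
    """Basis of N(A): set one free variable to 1, solve for the pivot variables."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            x[pc] = -R[row][f]
        basis.append(x)
    return basis

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2, 3],
     [2, 4, 6]]                      # m = 2, n = 3, rank r = 1
At = [list(col) for col in zip(*A)]
r = len(rref(A)[1])
N_A = null_space(A)                  # dimension n - r = 2
N_At = null_space(At)                # left null space, dimension m - r = 1

# Row space is orthogonal to the null space.
print(all(dot(row, x) == 0 for row in A for x in N_A))
# Column space is orthogonal to the left null space.
print(all(dot(col, y) == 0 for col in At for y in N_At))
```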
Notation Summary
| Symbol | Meaning |
|---|---|
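| $\mathbb{R}^n$ | $n$-dimensional real vector space |
| $\mathbf{u}^\top \mathbf{v}$ | Dot product (inner product) |
| $\langle \mathbf{u}, \mathbf{v} \rangle$ | General inner product |
| $\Vert \mathbf{x} \Vert_p$ | $L^p$ norm |
| $\Vert A \Vert_F$ | Frobenius norm |
| $\Vert A \Vert_2$ | Spectral norm (largest singular value) |
| $\theta$ | Angle between vectors, or regression parameters |
| $\delta_{ij}$ | Kronecker delta: $\delta_{ij} = 1$ if $i = j$, $0$ otherwise |
| $\langle \mathbf{u}, \mathbf{v} \rangle_{\Sigma^{-1}} = \mathbf{u}^\top \Sigma^{-1} \mathbf{v}$ | Mahalanobis inner product |
| $\Sigma$, $\boldsymbol{\mu}$ | Covariance matrix and mean vector |
| $\operatorname{proj}_{\mathbf{a}}(\mathbf{b})$ | Projection of $\mathbf{b}$ onto $\mathbf{a}$ |
| $P$ | Projection matrix (idempotent: $P^2 = P$) |
| $C(A)$ | Column space of $A$ |
| $N(A)$ | Null space (kernel) of $A$ |
| $\operatorname{rank}(A)$ | Rank of $A$ |
| $\operatorname{nullity}(A)$ | Dimension of null space |
| $\dim(V)$ | Dimension of vector space $V$ |
| $\operatorname{tr}(A)$ | Trace (sum of diagonal entries) |
| $\det(A)$ | Determinant |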