Applied Linear Algebra: Vectors, Matrices, and Transformations
Introduction
Linear algebra is the mathematics of linear relationships. It studies vectors, vector spaces, linear transformations, and systems of linear equations. Where calculus describes continuous change, linear algebra describes structured relationships among multiple variables. Its methods are fundamental to computer graphics, machine learning, quantum mechanics, network analysis, and virtually every quantitative discipline.
The power of linear algebra lies in its ability to reduce complex problems to manageable computations. A system of thousands of equations in thousands of unknowns can be solved efficiently using matrix methods. A high-dimensional dataset can be visualized by projecting it onto its most important directions. A linear transformation can be decomposed into scaling and rotation components that reveal its essential geometric action.
Vectors and Vector Spaces
A vector is an ordered list of numbers representing a point in space, a direction, or a collection of measurements. Vectors can be added componentwise and multiplied by scalars. These two operations define the vector space structure that underlies all of linear algebra.
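The two operations can be sketched in a few lines, assuming vectors are stored as plain Python lists. The helper names `add` and `scale` are illustrative and not taken from any library.

```python
def add(u, v):
    """Componentwise sum of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * a for a in v]


u = [1, 2, 3]
v = [4, 5, 6]
print(add(u, v))                       # [5, 7, 9]
print(scale(2, u))                     # [2, 4, 6]
print(add(scale(2, u), scale(-1, v)))  # the linear combination 2u - v: [-2, -1, 0]
```

Every linear combination, such as 2u - v in the last line, is built from these two operations alone.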
Geometric and Algebraic Views
Geometrically, a two-dimensional vector is an arrow from the origin to a point in the plane. Three-dimensional vectors exist in space. Higher-dimensional vectors, though impossible to visualize directly, follow the same algebraic rules and arise naturally when data has many features — each row of a spreadsheet becomes a vector in a high-dimensional space.
The vector space ℝⁿ consists of all n-tuples of real numbers. A subspace is a nonempty subset closed under addition and scalar multiplication, so it always contains the zero vector. Lines through the origin are one-dimensional subspaces, and planes through the origin are two-dimensional subspaces. The column space and null space of a matrix A are the key subspaces for the system Ax = b: a solution exists exactly when b lies in the column space, and a solution is unique exactly when the null space contains only the zero vector.
Linear Independence and Basis
A set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others. The dimension of a space is the maximum number of linearly independent vectors it can contain. A basis is a linearly independent set that spans the entire space, so every vector has exactly one set of coordinates with respect to it.
Choosing the right basis simplifies problems enormously. The standard basis is natural for computation, but when a transformation has a basis of eigenvectors, its matrix in that basis is diagonal, which makes the eigenvector basis the preferred representation for analysis. The change-of-basis matrix converts coordinates from one basis to another, a technique essential for understanding how a linear transformation looks in different coordinate systems.
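One way to test independence numerically is to row-reduce the vectors and count the pivots. The sketch below assumes vectors are plain Python lists and uses a hypothetical tolerance `tol` to decide when an entry counts as zero; the function names are illustrative.

```python
def rank(vectors, tol=1e-12):
    """Number of linearly independent vectors, found by row reduction."""
    rows = [[float(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0  # rows [0, pivots) already hold a pivot
    for col in range(n_cols):
        if pivots == len(rows):
            break
        # Pick the largest remaining entry in this column as the pivot.
        best = max(range(pivots, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[best][col]) < tol:
            continue  # no pivot in this column
        rows[pivots], rows[best] = rows[best], rows[pivots]
        for i in range(pivots + 1, len(rows)):
            factor = rows[i][col] / rows[pivots][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivots])]
        pivots += 1
    return pivots


def is_independent(vectors):
    """True when no vector is a linear combination of the others."""
    return rank(vectors) == len(vectors)


print(is_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # True
print(is_independent([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))  # False: third = first + second
```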
Matrices and Linear Transformations
A matrix is a rectangular array of numbers representing a linear transformation from one vector space to another. Multiplying a matrix by a vector applies the transformation: the result is a linear combination of the matrix columns weighted by the vector components.
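The two readings of a matrix-vector product (row by row, or as a weighted sum of columns) give the same result. A minimal sketch, assuming a matrix is stored as a list of rows; the function names are illustrative:

```python
def matvec_rows(A, x):
    """Each output entry is the dot product of one row of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_columns(A, x):
    """The output is the sum of the columns of A weighted by the entries of x."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        column = [row[j] for row in A]
        result = [r + xj * c for r, c in zip(result, column)]
    return result


A = [[1, 2],
     [3, 4],
     [5, 6]]
x = [10, -1]
print(matvec_rows(A, x))     # [8, 26, 44]
print(matvec_columns(A, x))  # [8, 26, 44], i.e. 10*(1, 3, 5) - 1*(2, 4, 6)
```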
Matrix Multiplication and Inverses
Matrix multiplication corresponds to composition of linear transformations. The ij entry of the product AB is the dot product of the ith row of A with the jth column of B. This operation is associative but not commutative — the order of transformations matters.
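A small sketch of the row-times-column rule, using two illustrative 2×2 matrices (a 90° rotation `R` and a horizontal stretch `S`, both chosen here for the example) to show that the order of the factors changes the product:

```python
def matmul(A, B):
    """Entry (i, j) of AB is the dot product of row i of A with column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]


R = [[0, -1],
     [1, 0]]   # rotate 90 degrees counterclockwise
S = [[2, 0],
     [0, 1]]   # stretch the x-direction by 2

print(matmul(R, S))  # [[0, -1], [2, 0]]  (stretch first, then rotate)
print(matmul(S, R))  # [[0, -2], [1, 0]]  (rotate first, then stretch)
print(matmul(matmul(R, S), R) == matmul(R, matmul(S, R)))  # True: associative
```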
The identity matrix I leaves vectors unchanged. The inverse matrix A⁻¹ reverses the effect of A: A⁻¹A = AA⁻¹ = I. A square matrix is invertible precisely when its determinant is nonzero. Equivalently, its columns are linearly independent, its null space contains only the zero vector, and the corresponding linear transformation is one-to-one and onto.
Determinants and Their Geometric Meaning
The absolute value of the determinant of a 2×2 matrix is the factor by which the corresponding linear transformation scales areas. For a 3×3 matrix, it is the factor by which volumes are scaled. The sign of the determinant indicates whether the transformation preserves or reverses orientation. A zero determinant means the transformation collapses space into a lower dimension.
Determinants can be computed by cofactor expansion, but the cost of that method grows factorially with the matrix size, which makes it impractical for large matrices. In practice, determinants are computed via LU decomposition: if A = LU, then det(A) = det(L)det(U), which equals the product of the diagonal entries of U because L has unit diagonal. When rows are exchanged for pivoting (PA = LU), each exchange flips the sign of the result.
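The elimination-based approach can be sketched as follows. This is an illustrative implementation, not a library routine: it reduces a copy of the matrix to upper triangular form with partial pivoting, tracks the sign change caused by each row exchange, and multiplies the diagonal. The tolerance `tol` is an assumption for deciding when a pivot counts as zero.

```python
import math


def det(A, tol=1e-12):
    """Determinant via elimination to upper triangular form with row exchanges."""
    U = [[float(x) for x in row] for row in A]
    n = len(U)
    sign = 1.0
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if abs(U[p][k]) < tol:
            return 0.0  # no usable pivot, so the matrix is singular
        if p != k:
            U[k], U[p] = U[p], U[k]
            sign = -sign  # each row exchange flips the sign
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return sign * math.prod(U[k][k] for k in range(n))


print(det([[2, 0], [0, 3]]))  # 6.0: areas are scaled by 6
print(det([[0, 1], [1, 0]]))  # -1.0: a reflection reverses orientation
print(det([[1, 2], [2, 4]]))  # 0.0: the plane collapses onto a line
```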
Systems of Linear Equations
Solving Ax = b is the central computational problem of linear algebra. Gaussian elimination reduces the augmented matrix [A|b] to row-echelon form, and back-substitution then finds the solution. The number of solutions depends on two things: if b does not lie in the column space of A, there is no solution; if it does, the solution is unique when the null space of A contains only the zero vector, and there are infinitely many solutions otherwise.
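A minimal sketch of elimination followed by back-substitution, assuming A is square and invertible. The function name `solve` and the tolerance `tol` are illustrative choices, and partial pivoting is added here for numerical safety.

```python
def solve(A, b, tol=1e-12):
    """Solve Ax = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [[float(x) for x in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if abs(M[p][k]) < tol:
            raise ValueError("matrix is singular to working precision")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    # Back-substitution on the upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


A = [[2, 1, -1],
     [-3, -1, 2],
     [-2, 1, 2]]
b = [8, -11, -3]
print(solve(A, b))  # approximately [2.0, 3.0, -1.0]
```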
LU factorization decomposes A into a lower triangular matrix L and an upper triangular matrix U, making it efficient to solve multiple systems with the same A but different b vectors. This factorization is the workhorse of scientific computing, appearing in everything from circuit simulation to weather prediction.
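The benefit of factoring once and reusing the factors can be sketched as below. This is an illustrative implementation with partial pivoting (PA = LU); the names `lu_factor` and `lu_solve` are chosen for this example and do not refer to a particular library's functions. L and U are stored together in one matrix, with the unit diagonal of L left implicit.

```python
def lu_factor(A, tol=1e-12):
    """Return (LU, perm): L below the diagonal, U on and above it, for PA = LU."""
    n = len(A)
    LU = [[float(x) for x in row] for row in A]
    perm = list(range(n))  # perm[i] is the original index of row i of PA
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if abs(LU[p][k]) < tol:
            raise ValueError("matrix is singular to working precision")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve Ax = b using a factorization produced by lu_factor."""
    n = len(LU)
    y = [0.0] * n
    for i in range(n):  # forward substitution: Ly = Pb
        y[i] = b[perm[i]] - sum(LU[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution: Ux = y
        x[i] = (y[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
LU, perm = lu_factor(A)              # the expensive step, done once
print(lu_solve(LU, perm, [8, -11, -3]))  # approximately [2.0, 3.0, -1.0]
print(lu_solve(LU, perm, [2, -2, 1]))    # approximately [1.0, 1.0, 1.0]
```

Each additional right-hand side costs only the two triangular solves, which is why the factorization pays off when the same A is reused.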
Eigenvalues and Eigenvectors
An eigenvector of a matrix A is a nonzero vector v such that Av = λv, where λ is the corresponding eigenvalue. Geometrically, for a real eigenvalue, an eigenvector marks a direction that the transformation stretches, compresses, or flips without turning it off its own line.
The Power of Diagonalization
An n×n matrix with n linearly independent eigenvectors can be diagonalized: A = PDP⁻¹, where the columns of P are the eigenvectors and D is diagonal with the eigenvalues on the diagonal. Diagonalization decouples the action of A, expressing it as independent scaling along each eigenvector direction. This makes computing powers cheap: Aᵏ = PDᵏP⁻¹, and Dᵏ is obtained by raising each eigenvalue to the kth power.
Diagonalization also solves linear systems of differential equations x′ = Ax. The matrix exponential e^{At} reduces to Pe^{Dt}P⁻¹, converting the system into independent exponential functions. Stability of the system depends on the eigenvalues: negative real parts produce decay, positive real parts produce growth, and nonzero imaginary parts produce oscillations.
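As a worked sketch, take the illustrative matrix A = [[2, 1], [1, 2]], chosen here because its eigenpairs are easy to verify by hand: eigenvalue 3 with eigenvector (1, 1) and eigenvalue 1 with eigenvector (1, -1). The code compares the power computed through the diagonal form with the power computed by repeated multiplication.

```python
def matmul(A, B):
    """Plain row-times-column matrix product."""
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]


A = [[2, 1],
     [1, 2]]
P = [[1, 1],
     [1, -1]]          # columns are the eigenvectors (1, 1) and (1, -1)
P_inv = [[0.5, 0.5],
         [0.5, -0.5]]  # inverse of P, checked by hand: P times P_inv is I


def power_by_diagonalization(k):
    """A^k = P D^k P^-1, where D^k only needs the eigenvalues 3 and 1 raised to k."""
    D_k = [[3 ** k, 0],
           [0, 1 ** k]]
    return matmul(matmul(P, D_k), P_inv)


def power_by_repeated_multiplication(k):
    """Reference result: multiply A by itself k - 1 times (k >= 1)."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


print(power_by_diagonalization(5))          # [[122.0, 121.0], [121.0, 122.0]]
print(power_by_repeated_multiplication(5))  # [[122, 121], [121, 122]]
```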
The Characteristic Polynomial
Eigenvalues satisfy det(A − λI) = 0, the characteristic equation. For an n×n matrix, this is an nth-degree polynomial. Finding eigenvalues thus requires solving polynomial equations, which for n > 4 generally requires numerical methods. The QR algorithm is the standard numerical method for computing all eigenvalues and eigenvectors of a general matrix.
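For a 2×2 matrix the characteristic equation is the quadratic λ² − (trace)λ + det = 0, so the eigenvalues follow from the quadratic formula. The sketch below is illustrative only (it is not the QR algorithm) and uses complex arithmetic so that rotation-like matrices with complex eigenvalues are handled too.

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of lambda^2 - trace*lambda + det = 0 for a 2x2 matrix."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def char_poly(A, lam):
    """Evaluate det(A - lam*I) for a 2x2 matrix."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c


symmetric = [[2, 1], [1, 2]]
rotation = [[0, -1], [1, 0]]
print(eigenvalues_2x2(symmetric))  # 3 and 1, returned as complex numbers
print(eigenvalues_2x2(rotation))   # i and -i: a pure rotation has no real eigenvalues
for lam in eigenvalues_2x2(symmetric):
    print(abs(char_poly(symmetric, lam)))  # 0.0 for each eigenvalue
```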
Spectral Theorem and Symmetric Matrices
Real symmetric matrices are diagonalizable by an orthogonal matrix: A = QΛQᵀ, where the columns of Q are orthonormal eigenvectors and Λ is diagonal and contains the eigenvalues. The spectral theorem guarantees that the eigenvalues of a real symmetric matrix are real and that eigenvectors corresponding to distinct eigenvalues are orthogonal.
A symmetric matrix A is positive definite when the quadratic form xᵀAx is positive for every nonzero x. For a symmetric matrix, this is equivalent to each of the following: all eigenvalues are positive, all leading principal minors are positive (Sylvester's criterion), and a Cholesky factorization A = LLᵀ with positive diagonal entries in L exists. These matrices appear in optimization (a positive definite Hessian at a critical point guarantees a strict local minimum), statistics (covariance matrices, which are positive semidefinite in general), and engineering (stiffness matrices).
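The Cholesky test for positive definiteness can be sketched as follows. The code assumes the input is symmetric (it reads only the lower triangle) and reports failure by raising an exception; the function name is illustrative.

```python
import math


def cholesky(A):
    """Return lower triangular L with A = L L^T, assuming A is symmetric.

    Raises ValueError if a nonpositive pivot shows A is not positive definite.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = A[i][i] - partial
                if pivot <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (A[i][j] - partial) / L[j][j]
    return L


A = [[4, 12, -16],
     [12, 37, -43],
     [-16, -43, 98]]
print(cholesky(A))  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]

try:
    cholesky([[1, 2], [2, 1]])  # eigenvalues are 3 and -1
except ValueError as error:
    print(error)                # matrix is not positive definite
```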
Orthogonal Projections and Least Squares
Orthogonal projection onto a subspace finds the point in the subspace closest to a given vector. When the columns of A are linearly independent, the projection matrix P = A(AᵀA)⁻¹Aᵀ projects onto the column space of A. The error b − Pb is orthogonal to the subspace, which is why Pb is the closest point to b within it.
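In the simplest case A has a single column a, and the projection formula reduces to p = (aᵀb / aᵀa) a. A minimal sketch of that special case, with illustrative function names:

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def project_onto_line(b, a):
    """Orthogonal projection of b onto the line spanned by the nonzero vector a."""
    coefficient = dot(a, b) / dot(a, a)
    return [coefficient * ai for ai in a]


a = [1, 1]
b = [2, 0]
p = project_onto_line(b, a)
residual = [bi - pi for bi, pi in zip(b, p)]
print(p)                 # [1.0, 1.0]
print(residual)          # [1.0, -1.0]
print(dot(residual, a))  # 0.0: the error is orthogonal to the line
```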
Least squares solves Ax ≈ b when the system is overdetermined (more equations than unknowns). The normal equations AᵀAx = Aᵀb give the least squares solution minimizing ||Ax − b||². This is the foundation of linear regression, where the best-fit line minimizes the sum of squared vertical distances between data points and the line.
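For a best-fit line y ≈ c₀ + c₁t, the matrix A has a column of ones and a column of t values, so the normal equations are a 2×2 system that can be solved directly. The sketch below assumes at least two distinct t values; the function name `fit_line` is illustrative.

```python
def fit_line(ts, ys):
    """Least squares intercept and slope from the 2x2 normal equations.

    With A = [1 | t], the system A^T A c = A^T y reads
        [ n       sum(t)   ] [c0]   [ sum(y)   ]
        [ sum(t)  sum(t^2) ] [c1] = [ sum(t*y) ]
    """
    n = len(ts)
    sum_t = sum(ts)
    sum_tt = sum(t * t for t in ts)
    sum_y = sum(ys)
    sum_ty = sum(t * y for t, y in zip(ts, ys))
    det = n * sum_tt - sum_t * sum_t
    if det == 0:
        raise ValueError("need at least two distinct t values")
    intercept = (sum_tt * sum_y - sum_t * sum_ty) / det
    slope = (n * sum_ty - sum_t * sum_y) / det
    return intercept, slope


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): the data lie exactly on y = 1 + 2t
print(fit_line([0, 1, 2], [6, 0, 0]))        # (5.0, -3.0): no line fits, so this is the best compromise
```

Forming AᵀA explicitly is fine for small, well-conditioned problems like this one; for ill-conditioned data, factorization-based methods are generally considered more accurate.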
Applications of Linear Algebra
Linear algebra permeates modern technology. Search engines use eigenvector centrality (PageRank) to rank web pages by importance. Machine learning algorithms represent data as matrices and use linear algebra operations for dimensionality reduction, feature extraction, and model training. Computer graphics systems transform 3D models by multiplying vertex coordinates, written in homogeneous form, by 4×4 transformation matrices.
The singular value decomposition (SVD) factorizes any matrix into a product of three matrices that reveal the dominant patterns in the data, which makes it a natural tool for data compression. The SVD is also the foundation of principal component analysis (PCA), which reduces high-dimensional datasets to their most informative dimensions.
Network analysis uses adjacency matrices to represent connections between nodes. The eigenvalues of the Laplacian matrix reveal community structure. Power iteration finds the dominant eigenvector for ranking. These methods scale to networks with millions of nodes.
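Power iteration can be sketched in a few lines: multiply repeatedly by the matrix and renormalize. The version below is illustrative and makes two assumptions that are not checked: the matrix has a single eigenvalue of largest magnitude, and the start vector (chosen arbitrarily here as 1, 1/2, 1/3, ...) is not orthogonal to the dominant eigenvector. The fixed iteration count is also a simplification; practical code would test for convergence.

```python
import math


def power_iteration(A, iterations=200):
    """Estimate the dominant eigenvalue and a unit eigenvector of a square matrix."""
    n = len(A)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary nonzero start vector
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    eigenvalue = sum(x * y for x, y in zip(v, Av))  # Rayleigh quotient, v has unit length
    return eigenvalue, v


eigenvalue, vector = power_iteration([[2, 1], [1, 2]])
print(eigenvalue)  # close to 3
print(vector)      # close to [0.7071, 0.7071]
```

Each step needs only a matrix-vector product, which is what lets the method scale to large sparse matrices.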
The Singular Value Decomposition
The singular value decomposition (SVD) factorizes any m×n matrix as A = UΣVᵀ, where U (m×m) and V (n×n) are orthogonal matrices and Σ is an m×n diagonal matrix whose nonzero diagonal entries are the singular values σ₁ ≥ σ₂ ≥ … ≥ σ_r > 0. The rank r equals the number of nonzero singular values.
The SVD has numerous applications in data science. The truncated SVD keeps only the k largest singular values, producing the best rank-k approximation to A in the Frobenius norm, and also in the spectral norm (Eckart-Young theorem). This is the basis of principal component analysis, latent semantic analysis in natural language processing, and image compression. The pseudoinverse A⁺ = VΣ⁺Uᵀ solves least squares problems and provides the minimum-norm solution to underdetermined systems.
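The k = 1 case of the truncated SVD can be sketched without a full decomposition. The code below is an illustrative approach, not a standard library routine: it runs power iteration on AᵀA to estimate the largest singular value σ₁ and its singular vectors, then forms the rank-one matrix σ₁u₁v₁ᵀ. It assumes σ₁ is strictly larger than σ₂ and that the arbitrary start vector is not orthogonal to v₁.

```python
import math


def top_singular_triplet(A, iterations=500):
    """Estimate (sigma_1, u_1, v_1) by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0 / (j + 1) for j in range(n)]  # arbitrary nonzero start vector
    for _ in range(iterations):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("A maps the current vector to zero")
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in Av))  # |A v| for the unit vector v
    u = [x / sigma for x in Av]
    return sigma, u, v


def rank_one_approximation(A):
    """The matrix sigma_1 * u_1 * v_1^T."""
    sigma, u, v = top_singular_triplet(A)
    return [[sigma * ui * vj for vj in v] for ui in u]


print(top_singular_triplet([[3, 0], [0, 1]])[0])  # close to 3
print(rank_one_approximation([[3, 0], [0, 1]]))   # close to [[3, 0], [0, 0]]
```

For the diagonal example, dropping the smaller singular value leaves an error of exactly σ₂ = 1 in the discarded entry, which is the smallest error any rank-one matrix can achieve.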
Control theory uses linear algebra to design feedback systems. The controllability matrix determines whether a system can be driven to any state. The observability matrix determines whether the internal state can be inferred from measurements. Both criteria reduce to rank conditions on matrices.
What is the difference between a vector and a scalar? A scalar is a single number. A vector is an ordered list of numbers with magnitude and direction. Scalars scale vectors through multiplication.
Why are eigenvalues important? Eigenvalues reveal the fundamental behavior of a linear transformation — whether it amplifies, dampens, or rotates along each direction. They determine stability of dynamical systems, vibration frequencies, and principal components of data.
What does it mean for a matrix to be invertible? A square matrix A is invertible if there is a matrix A⁻¹ with A⁻¹A = AA⁻¹ = I; when such a matrix exists, it is unique. Invertibility means the corresponding linear transformation can be reversed, the determinant is nonzero, the columns span the full space, and Ax = b has a unique solution for any b.
How is linear algebra used in machine learning? Data is represented as matrices (samples × features). Linear transformations model neural network layers. The SVD performs dimensionality reduction. Linear regression has an exact closed-form solution given by the normal equations; gradient descent is an iterative alternative, often used when the dataset is too large to solve those equations directly.