A symmetric matrix is the finite-dimensional model of a self-adjoint operator, and it decomposes into orthogonal axes along which it acts by pure scaling. The scalings are the eigenvalues and the axes the eigenvectors, and the spectral theorem says they always exist and span the space. This is the fact behind principal component analysis, where the axes are the directions of greatest variance, and behind the Gaussian covariance, where the same decomposition factors the covariance matrix. This post proves it from the compactness of the sphere, the finite-dimensional shadow of the spectral theorem for compact operators [1], [2]. Vectors live in $\mathbb{R}^n$ with inner product $\langle x, y\rangle = x^\top y$, and a matrix $A \in \mathbb{R}^{n\times n}$ is symmetric when $A^\top = A$.
#Eigenvalues and symmetry
A scalar $\lambda$ is an eigenvalue of $A$ with eigenvector $v \neq 0$ when $Av = \lambda v$.
For a symmetric matrix the eigenvalues are constrained to the real line and the eigenvectors to orthogonal directions.
A symmetric matrix has real eigenvalues, and eigenvectors belonging to distinct eigenvalues are orthogonal.
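For reality, allow complex vectors momentarily and let $v^* = \bar{v}^\top$ be the conjugate transpose. If $Av = \lambda v$ with $v \neq 0$, then $v^* A v = \lambda v^* v = \lambda \|v\|^2$. The scalar $v^* A v$ equals its own conjugate. Reality of $A$ ($\bar{A} = A$) gives $\overline{v^* A v} = v^\top A \bar{v}$, and since $v^\top A \bar{v}$ is a scalar it equals its own transpose, $\bar{v}^\top A^\top v$; by symmetry $\bar{v}^\top A^\top v = v^* A v$, so it is real. As $\|v\|^2 > 0$ is real, $\lambda = v^* A v / \|v\|^2$ is real, and a real eigenvalue yields a real eigenvector from the null space of the singular real matrix $A - \lambda I$. For orthogonality, let $Au = \lambda u$ and $Av = \mu v$ with $\lambda \neq \mu$. Then $\lambda \langle u, v\rangle = \langle Au, v\rangle = \langle u, Av\rangle = \mu \langle u, v\rangle$ by symmetry, so $(\lambda - \mu)\langle u, v\rangle = 0$ forces $\langle u, v\rangle = 0$.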
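Both claims are easy to observe numerically. The sketch below uses an arbitrarily chosen symmetric test matrix and deliberately calls NumPy's general solver `np.linalg.eig`, so that realness and orthogonality are observed rather than assumed by the routine. A rotation matrix is included as a non-symmetric contrast. The matrices and variable names are illustrative assumptions, not part of the argument above.

```python
import numpy as np

# A symmetric test matrix, chosen for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# General (non-symmetric) solver: it does not assume real eigenvalues.
eigvals, eigvecs = np.linalg.eig(A)

max_imag = np.max(np.abs(np.imag(eigvals)))

# Gram matrix of the eigenvectors: entry (i, j) is <v_i, v_j>.
gram = eigvecs.conj().T @ eigvecs
off_diag = gram - np.diag(np.diag(gram))
max_overlap = np.max(np.abs(off_diag))

print("largest imaginary part of an eigenvalue:", max_imag)
print("largest |<v_i, v_j>| for i != j:", max_overlap)

# Contrast: rotation by 90 degrees is not symmetric and has no real eigenvalue.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
rot_eigvals = np.linalg.eigvals(R)
print("eigenvalues of the rotation:", rot_eigvals)
```

The three eigenvalues of the test matrix are distinct, so the orthogonality statement applies to every pair of eigenvectors.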
#The spectral theorem
The eigenvector carrying the largest eigenvalue is the one that maximises the quadratic form, and compactness guarantees the maximum is attained.
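A real symmetric matrix $A \in \mathbb{R}^{n\times n}$ has an orthonormal basis of eigenvectors $v_1, \dots, v_n$, with real eigenvalues $\lambda_1, \dots, \lambda_n$. Equivalently $A = Q \Lambda Q^\top$ with $Q$ orthogonal and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$.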
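Argue by induction on $n$, the case $n = 1$ being trivial. The Rayleigh quotient, which on unit vectors is $f(x) = \langle Ax, x\rangle$, is continuous on the unit sphere $S^{n-1} = \{x \in \mathbb{R}^n : \|x\| = 1\}$, which is closed and bounded, so by the extreme value theorem it attains a maximum at some unit vector $v_1$. The constraint $g(x) = \|x\|^2 - 1 = 0$ has $\nabla g(x) = 2x$, nonzero on the sphere since $\|x\| = 1$, so the constraint qualification holds and the Lagrange multiplier condition $\nabla f(v_1) = \lambda_1 \nabla g(v_1)$ applies. With $\nabla f(x) = 2Ax$ this reads $2Av_1 = 2\lambda_1 v_1$, so $Av_1 = \lambda_1 v_1$ and $v_1$ is a unit eigenvector. The orthogonal complement $W = v_1^\perp$ is invariant under $A$, since for $w \in W$, $\langle Aw, v_1\rangle = \langle w, Av_1\rangle = \lambda_1 \langle w, v_1\rangle = 0$, so $Aw \in W$. Fix an orthonormal basis $e_1, \dots, e_{n-1}$ of $W$; by invariance the restriction of $A$ is an operator on $W$ with matrix $B_{ij} = \langle Ae_j, e_i\rangle$, and symmetry gives $B_{ij} = \langle e_j, Ae_i\rangle = B_{ji}$, so $B$ is a genuine $(n-1)\times(n-1)$ symmetric matrix. By induction it has an orthonormal eigenbasis, which pulls back to an orthonormal eigenbasis $v_2, \dots, v_n$ of $W$, and together with $v_1$ these form an orthonormal eigenbasis of $\mathbb{R}^n$. Placing the $v_i$ as the columns of $Q$ makes $Q$ orthogonal, and $AQ = Q\Lambda$ reads off the eigen-equations, giving $A = Q \Lambda Q^\top$.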
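The induction can be traced numerically. The sketch below is not the existence argument itself: it swaps in shifted power iteration to locate a maximiser of the quadratic form on the sphere, then restricts the matrix to the orthogonal complement and recurses, mirroring the structure of the proof. The function names `top_unit_eigenvector` and `spectral_decomposition`, the iteration count, and the test matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def top_unit_eigenvector(A, iters=2000):
    """Approximate a maximiser of <Ax, x> on the unit sphere by shifted power iteration."""
    n = A.shape[0]
    # Gershgorin: every eigenvalue is bounded in absolute value by the largest
    # absolute row sum, so A + shift*I is positive definite and its dominant
    # eigenvector is an eigenvector for the largest eigenvalue of A.
    shift = np.abs(A).sum(axis=1).max() + 1.0
    M = A + shift * np.eye(n)
    x = np.random.default_rng(0).standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

def spectral_decomposition(A):
    """Return (eigenvalues, Q): maximise on the sphere, then recurse on the complement."""
    n = A.shape[0]
    if n == 1:
        return np.array([A[0, 0]]), np.ones((1, 1))
    v1 = top_unit_eigenvector(A)
    lam1 = v1 @ A @ v1
    # Complete v1 to an orthonormal basis of R^n; the remaining columns span W.
    full_q, _ = np.linalg.qr(v1.reshape(-1, 1), mode="complete")
    E = full_q[:, 1:]
    B = E.T @ A @ E            # matrix of the restriction of A to W
    B = (B + B.T) / 2          # remove rounding asymmetry
    lams, P = spectral_decomposition(B)
    Q = np.column_stack([v1, E @ P])   # pull the eigenbasis of B back to R^n
    return np.concatenate([[lam1], lams]), Q

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
lams, Q = spectral_decomposition(A)

print("eigenvalues, largest first:", lams)
print("Q orthogonal:", np.allclose(Q.T @ Q, np.eye(3)))
print("A recovered:", np.allclose(Q @ np.diag(lams) @ Q.T, A))
```

Power iteration converges slowly when the two largest eigenvalues are close, so in practice a library routine such as `np.linalg.eigh` is the better tool; the recursion here only serves to make the inductive step concrete.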
The decomposition writes $A = \sum_{i=1}^{n} \lambda_i v_i v_i^\top$ as a weighted sum of the rank-one projections $v_i v_i^\top$ onto its eigenaxes, and in the eigenbasis $A$ is the diagonal matrix $\Lambda$, scaling the $i$-th coordinate by $\lambda_i$. Every symmetric matrix is therefore a stretch along orthogonal axes.
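A small check of the rank-one picture, using `np.linalg.eigh` on an arbitrary 2×2 symmetric matrix (the matrix is an illustrative assumption): the outer products of the eigenvectors are projections, they sum to the identity, and weighting them by the eigenvalues rebuilds the matrix.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns.
lams, Q = np.linalg.eigh(A)

# Each outer product v v^T is the orthogonal projection onto one eigenaxis.
projections = [np.outer(Q[:, i], Q[:, i]) for i in range(len(lams))]
A_rebuilt = sum(lam * P for lam, P in zip(lams, projections))

print("A rebuilt from rank-one pieces:", np.allclose(A_rebuilt, A))
print("projections sum to identity:", np.allclose(sum(projections), np.eye(2)))
print("diagonal in the eigenbasis:", np.allclose(Q.T @ A @ Q, np.diag(lams)))
```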
#The variational characterisation
The eigenvalues are not only the diagonal entries but the extreme values of the quadratic form, which is how they drive principal component analysis.
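Order the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. Then $\lambda_1 = \max_{\|x\| = 1} \langle Ax, x\rangle$ and $\lambda_n = \min_{\|x\| = 1} \langle Ax, x\rangle$, each attained at the corresponding eigenvector. More generally $\lambda_k = \max_{\dim S = k} \, \min_{x \in S,\ \|x\| = 1} \langle Ax, x\rangle$ over $k$-dimensional subspaces $S$.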
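Expand a unit vector $x = \sum_i c_i v_i$ in the eigenbasis, with $\sum_i c_i^2 = 1$. Then $\langle Ax, x\rangle = \sum_i \lambda_i c_i^2$, a weighted average of the eigenvalues with weights summing to $1$, which lies between $\lambda_n$ and $\lambda_1$ and reaches $\lambda_1$ at $x = v_1$ and $\lambda_n$ at $x = v_n$. For the intermediate eigenvalue, given any $k$-dimensional $S$, the span of $v_k, \dots, v_n$ has dimension $n - k + 1$, so it meets $S$ in a nonzero vector $x$, on which $\langle Ax, x\rangle \le \lambda_k \|x\|^2$, so the inner minimum over $S$ is at most $\lambda_k$. Taking $S = \operatorname{span}(v_1, \dots, v_k)$, every unit $x \in S$ has $\langle Ax, x\rangle \ge \lambda_k$, so the maximum over $S$ of the inner minimum is exactly $\lambda_k$.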
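The max-min statement can be probed numerically. The sketch below assumes a random symmetric matrix and uses one fact not spelled out above: the minimum of the quadratic form over unit vectors in a subspace equals the smallest eigenvalue of the matrix restricted to an orthonormal basis of that subspace. The helper `min_on_subspace`, the sizes `n` and `k`, and the random seeds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
G = rng.standard_normal((n, n))
A = (G + G.T) / 2                      # a random symmetric matrix, for illustration

asc_vals, asc_vecs = np.linalg.eigh(A)
# Reorder so that lams[0] >= lams[1] >= ... >= lams[n-1].
lams, V = asc_vals[::-1], asc_vecs[:, ::-1]

# The quadratic form on random unit vectors stays between the extreme eigenvalues.
X = rng.standard_normal((1000, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)
quad = np.einsum("ij,jk,ik->i", X, A, X)

def min_on_subspace(A, basis):
    """Minimum of <Ax, x> over unit x in span(basis), via the restricted matrix."""
    E, _ = np.linalg.qr(basis)         # orthonormalise the spanning columns
    return np.linalg.eigvalsh(E.T @ A @ E)[0]

# S = span(v_1, ..., v_k) attains the maximum of the inner minimum.
best = min_on_subspace(A, V[:, :k])
# Random k-dimensional subspaces can only do worse (or equally well).
random_mins = [min_on_subspace(A, rng.standard_normal((n, k))) for _ in range(200)]

print("form within [lambda_n, lambda_1]:", quad.min() >= lams[-1], quad.max() <= lams[0])
print("inner minimum on span(v_1..v_k):", best, " lambda_k:", lams[k - 1])
print("best inner minimum over random subspaces:", max(random_mins))
```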
The variational characterisation is what makes the spectral theorem a tool for data. The direction of greatest spread of a centred data cloud is the top eigenvector of its covariance matrix, the first principal component, because the variance of the projection onto a unit direction $u$ is $\langle \Sigma u, u\rangle$, where $\Sigma$ is the covariance matrix, and the spectral theorem maximises it at $u = v_1$ with value $\lambda_1$. The successive principal components are the remaining eigenvectors, each capturing the most variance orthogonal to those already taken, which is the finite-dimensional Karhunen-Loève expansion and the reason a covariance is summarised by its leading eigenaxes. The spectral theorem turns a symmetric matrix into a list of axes and scales that resolves any quadratic optimisation over the matrix.
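A minimal sketch of the first principal component on synthetic data, assuming a two-dimensional Gaussian cloud stretched along the direction $(1, 1)/\sqrt{2}$. The data, the helper `projected_variance`, and the grid of candidate directions are illustrative assumptions. The check is that the projected variance along the top eigenvector equals the top eigenvalue and that no other sampled direction exceeds it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 2-D cloud: standard deviation 3 along (1, 1)/sqrt(2), 0.5 across it.
latent = rng.standard_normal((2000, 2)) * np.array([3.0, 0.5])
rotation = np.array([[1.0, -1.0],
                     [1.0,  1.0]]) / np.sqrt(2)
data = latent @ rotation.T
centred = data - data.mean(axis=0)

# Sample covariance matrix and its top eigenpair (eigh sorts ascending).
cov = centred.T @ centred / (len(centred) - 1)
asc_vals, asc_vecs = np.linalg.eigh(cov)
lam1, v1 = asc_vals[-1], asc_vecs[:, -1]

def projected_variance(u):
    """Sample variance of the centred data projected onto the unit direction u."""
    return np.var(centred @ u, ddof=1)

# Sweep unit directions over a half circle and record the projected variance.
angles = np.linspace(0.0, np.pi, 181)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
variances = np.array([projected_variance(u) for u in directions])

print("top eigenvalue:", lam1)
print("variance along top eigenvector:", projected_variance(v1))
print("largest variance over sampled directions:", variances.max())
print("first principal component:", v1)
```

The sign of the returned eigenvector is arbitrary, so the first principal component may come out as either orientation of the same axis.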