Section 22.6 Theory
Subsection 22.6.1 Similar matrices
First, we’ll record just a few of the facts about general similar matrices from Section 22.3.
Proposition 22.6.1. Properties of similar matrices.
- Similar matrices have the same determinant.
- Similar matrices have the same characteristic polynomial.
- Similar matrices have the same eigenvalues, with the same algebraic multiplicities.
Proof of Statement 1.
Suppose square matrices \(A\) and \(B\) are similar, and \(P\) is a transition matrix that realizes the similarity, so that \(B = \inv{P}AP\text{.}\)
We know from Proposition 10.5.6 that the determinant of a product is the product of the determinants. And we also know from Proposition 10.5.8 that the determinant of an inverse is the inverse of the determinant. So we can compute \(\det B\) as
\begin{align*}
\det B \amp= \det (\inv{P}AP) \\
\amp= \left(\det \inv{P}\right) (\det A) (\det P) \\
\amp= \inv{(\det P)} (\det A) (\det P) \\
\amp= \frac{(\det A) \bcancel{(\det P)}}{\bcancel{\det P}} \\
\amp= \det A.
\end{align*}
Thus, the similar matrices \(A\) and \(B\) have the same determinant.
Warning 22.6.2. Careful.
In this proof, it would have been incorrect to cancel the \(\inv{P}\) with the \(P\) immediately, because order of matrix multiplication matters! It was only after we split the determinant into a product of determinants that we could cancel \(\det\left(\inv{P}\right)\) with \(\det P\) because all three of the determinants are numbers, and order of number multiplication does not matter.
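Although the proof is purely algebraic, Statement 1 is easy to confirm numerically. The following short sketch (our own illustration, not part of the text; it assumes NumPy, and the matrices \(A\) and \(P\) are arbitrary choices with \(P\) invertible) computes \(B = \inv{P}AP\) and compares determinants.

```python
# Illustrative check of Statement 1 (NumPy assumed; A and P chosen arbitrarily).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])   # invertible, since det P = 1

B = np.linalg.inv(P) @ A @ P  # B is similar to A

print(np.linalg.det(A))       # 6.0
print(np.linalg.det(B))       # 6.0, up to floating-point rounding
```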
Proof of Statement 2.
Suppose square matrices \(A\) and \(B\) are similar, and \(P\) is a transition matrix that realizes the similarity, so that \(B = \inv{P}AP\text{.}\)
The characteristic polynomials of these two matrices are computed as
\begin{align*}
c_A(\lambda) \amp= \det(\lambda I - A), \amp
c_B(\lambda) \amp= \det(\lambda I - B).
\end{align*}
Using our assumption \(B = \inv{P}AP\text{,}\) along with \(I = \inv{P}IP\text{,}\) we can express the matrix involved in the characteristic polynomial for \(B\) as
\begin{equation*}
\lambda I - B = \lambda \inv{P}IP - \inv{P}AP = \inv{P}(\lambda I - A)P,
\end{equation*}
where in the last step we have factored the common \(\inv{P}\) and \(P\) factors out of the difference (making sure to factor each to the correct side, because order of matrix multiplication matters). We have now shown that matrices \(\lambda I - A\) and \(\lambda I - B\) are also similar, via the same transition matrix \(P\text{,}\) and so by Statement 1 they have the same determinant. That is,
\begin{equation*}
c_B(\lambda) = \det(\lambda I - B) = \det(\lambda I - A) = c_A(\lambda),
\end{equation*}
and thus the similar matrices \(A\) and \(B\) have the same characteristic polynomial.
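As with determinants, this statement can be sanity-checked numerically. Here is a brief sketch (assuming NumPy, which the text does not use; the matrices are arbitrary choices) that compares the characteristic polynomial coefficients of \(A\) and \(B\text{.}\)

```python
# Illustrative check of Statement 2 (NumPy assumed; A and P chosen arbitrarily).
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # invertible, since det P = 1
B = np.linalg.inv(P) @ A @ P

# np.poly(M) returns the coefficients of the characteristic polynomial of a square matrix M.
print(np.poly(A))   # [ 1. -7. 10.], representing lambda^2 - 7 lambda + 10
print(np.poly(B))   # the same coefficients, up to rounding
```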
Proof of Statement 3.
Statement 3 follows immediately from Statement 2, as the eigenvalues of a matrix are precisely the roots of the characteristic polynomial of the matrix, and the algebraic multiplicity of an eigenvalue is the number of times that value is repeated as a root of the characteristic polynomial.
Subsection 22.6.2 Diagonalizable matrices
We start with the justification that a transition matrix made up of linearly independent eigenvectors will diagonalize a matrix.
Theorem 22.6.3. Characterization of diagonalizability.
An \(n \times n\) matrix \(A\) is diagonalizable if and only if there exists a set of \(n\) linearly independent vectors in \(\R^n\text{,}\) each of which is an eigenvector of \(A\text{.}\) If \(P\) is an \(n \times n\) matrix whose columns are linearly independent eigenvectors of \(A\text{,}\) then \(P\) diagonalizes \(A\text{.}\)
Proof.
This fact follows from our analysis of the transition matrix \(P\) and the diagonal form matrix \(\inv{P}AP\) in Subsection 22.4.1.
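As a concrete illustration of the theorem (and of the diagonalization process from Subsection 22.4.1), here is a minimal NumPy sketch; NumPy and the particular symmetric matrix \(A\) are our own assumptions, not taken from the text.

```python
# Minimal sketch of Theorem 22.6.3 (NumPy assumed; A chosen arbitrarily).
import numpy as np

A = np.array([[5.0, -2.0],
              [-2.0, 5.0]])          # symmetric, hence diagonalizable

eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are eigenvectors of A
P = eigvecs                          # n linearly independent eigenvectors as columns

D = np.linalg.inv(P) @ A @ P         # diagonal, with the eigenvalues on the diagonal
print(np.round(D, 10))               # diagonal entries 3 and 7 (order may vary)
```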
We will refine this theorem in Subsection 22.6.4, using the more sophisticated notions of algebraic and geometric multiplicity explored in the next subsection. But first, here is a surprising result that demonstrates how central eigenvalues are in matrix theory.
Proposition 22.6.4. Determinant versus eigenvalues.
If a square matrix is diagonalizable, then its determinant is equal to the product of its eigenvalues (including multiplicities).
Proof.
Suppose \(A\) is a diagonalizable matrix. Then it is similar to some diagonal matrix \(D\text{.}\) The eigenvalues of a diagonal matrix are precisely the diagonal entries, and the algebraic multiplicity of each of these eigenvalues is the number of times that eigenvalue is repeated down the diagonal. So if \(\lambda_1,\lambda_2,\dotsc,\lambda_\ell\) are all of the distinct eigenvalues of \(D\) (i.e. there are no repeats in this list of eigenvalues), and \(m_1,m_2,\dotsc,m_\ell\) are the corresponding algebraic multiplicities of these eigenvalues (i.e. each \(m_j\) is equal to the number of times \(\lambda_j\) appears on the main diagonal of \(D\)), then
\begin{equation*}
\det D = \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell},
\end{equation*}
because the determinant of a diagonal matrix is just the product of its diagonal entries (Statement 1 of Proposition 8.5.2). But from Statements 1 and 3 of Proposition 22.6.1 we know that the similar matrices \(A\) and \(D\) have the same determinant, and all the same eigenvalues with the same corresponding algebraic multiplicities. Thus, the expression
\begin{equation*}
\det A = \det D = \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell}
\end{equation*}
can be viewed as an expression for \(\det A\) as a product of the eigenvalues of \(A\text{,}\) including multiplicities.
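For a quick numerical illustration of the proposition (again our own check, not part of the text's argument; NumPy and the particular matrix are assumptions), compare \(\det A\) with the product of the eigenvalues of \(A\text{:}\)

```python
# Illustrative check of Proposition 22.6.4 (NumPy assumed; A chosen arbitrarily,
# symmetric and therefore diagonalizable).
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals = np.linalg.eigvals(A)
print(np.prod(eigvals))   # product of the eigenvalues, including multiplicities
print(np.linalg.det(A))   # the same value (8.0), up to rounding
```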
Remark 22.6.5.
The above fact is actually true of all square matrices, if you allow complex eigenvalues. In a second linear algebra course, you may learn that diagonalizable matrices are a special case of a more general theory, in which every matrix can be triangularized. That is, every square matrix is similar to a special form of triangular matrix (either upper or lower), though for many matrices both the transition matrix and the triangular form matrix might need to contain complex numbers in their entries. In this more general theory, it is again the case that the diagonal entries of the triangular form matrix will be precisely the eigenvalues of the original matrix, with each eigenvalue repeated down the diagonal according to its algebraic multiplicity, so the proof provided for the fact above can be adapted to work in this slightly more general setting.
Subsection 22.6.3 The geometry of eigenvectors
We require that the columns of a transition matrix \(P\) be linearly independent, so that \(P\) is invertible. Basis vectors for a particular eigenspace are linearly independent by definition of basis. But when we lump together basis vectors from different eigenspaces, will they all remain linearly independent? The next fact answers this question with a more general version of what we explored in Discovery 22.6.
Proposition 22.6.6. Eigenvectors from different eigenvalues are independent.
Suppose \(A\) is an \(n \times n\) matrix, and \(S\) is a linearly independent set of vectors in \(\R^n\text{,}\) each of which is an eigenvector for \(A\text{.}\) Further suppose that \(\uvec{v}\) is another eigenvector for \(A\) that is linearly independent from those vectors in \(S\) that are from the same eigenspace as \(\uvec{v}\text{.}\) Then the enlarged collection \(S'\) of eigenvectors consisting of all vectors in \(S\) along with \(\uvec{v}\) is also linearly independent.
Proof.
Let’s write \(S = \{\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell,\uvec{w}_1,\uvec{w}_2,\dotsc,\uvec{w}_m\}\text{,}\) where the \(\uvec{v}_j\) are those eigenvectors in \(S\) that are in the same eigenspace as \(\uvec{v}\text{,}\) and the \(\uvec{w}_j\) are those that are not. Write \(\lambda\) for the eigenvalue of \(A\) corresponding to \(\uvec{v}\) (hence also to each \(\uvec{v}_j\)), and write \(\lambda_j\) for the eigenvalue corresponding to \(\uvec{w}_j\text{.}\) We have assumed that the full set \(S\) is linearly independent, and therefore so are the subsets \(\{\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell\}\) and \(\{\uvec{w}_1,\uvec{w}_2,\dotsc,\uvec{w}_m\}\) (Statement 2 of Proposition 17.5.3). In addition, we have assumed that the set \(\{\uvec{v},\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell\}\) remains linearly independent.
The strategy in this proof is essentially the same as explored in Discovery 22.6. To prove independence, we must prove that the assumption
\begin{gather}
\begin{split}
k \uvec{v} +
\amp a_1 \uvec{v}_1 + a_2 \uvec{v}_2 + \dotsb + a_\ell \uvec{v}_\ell \\
\amp + b_1 \uvec{w}_1 + b_2 \uvec{w}_2 + \dotsb + b_m \uvec{w}_m
= \zerovec
\end{split}\tag{✶}
\end{gather}
leads to the conclusion that each of the scalars \(k,a_1,a_2,\dotsc,a_\ell,b_1,b_2,\dotsc,b_m\) is \(0\text{.}\)
Since each of the vectors in the combination above is an eigenvector for \(A\text{,}\) if we multiply both sides of equation (✶) by the matrix \(A\text{,}\) we may substitute \(A\uvec{v} = \lambda \uvec{v}\text{,}\) \(A\uvec{v}_j = \lambda \uvec{v}_j\text{,}\) and \(A\uvec{w}_j = \lambda_j \uvec{w}_j\text{.}\) Making these substitutions, we obtain
\begin{equation*}
\begin{split}
k \lambda \uvec{v} +
\amp a_1 \lambda \uvec{v}_1 + a_2 \lambda \uvec{v}_2 + \dotsb + a_\ell \lambda \uvec{v}_\ell \\
\amp + b_1 \lambda_1 \uvec{w}_1 + b_2 \lambda_2 \uvec{w}_2 + \dotsb + b_m \lambda_m \uvec{w}_m
= \zerovec.
\end{split}
\end{equation*}
Compare this “\(A\) times (✶)” equation with the result of multiplying (✶) through by the scalar \(\lambda\text{:}\)
\begin{equation*}
\begin{split}
k \lambda \uvec{v} +
\amp a_1 \lambda \uvec{v}_1 + a_2 \lambda \uvec{v}_2 + \dotsb + a_\ell \lambda \uvec{v}_\ell \\
\amp + b_1 \lambda \uvec{w}_1 + b_2 \lambda \uvec{w}_2 + \dotsb + b_m \lambda \uvec{w}_m
= \zerovec.
\end{split}
\end{equation*}
Notice that the \(\uvec{v}\) and \(\uvec{v}_j\) terms of both the “\(A\) times (✶)” equation and the “\(\lambda\) times (✶)” equation are identical, so if we subtract these equations and collect like \(\uvec{w}_j\)-terms, we obtain
\begin{equation*}
b_1 (\lambda_1-\lambda) \uvec{w}_1 + b_2 (\lambda_2-\lambda) \uvec{w}_2
+ \dotsb + b_m (\lambda_m-\lambda) \uvec{w}_m
= \zerovec.
\end{equation*}
Since the vectors \(\uvec{w}_1,\uvec{w}_2,\dotsc,\uvec{w}_m\) are linearly independent, the scalar coefficient expressions in this new linear combination must all be zero. That is, each scalar expression
\begin{equation*}
b_j (\lambda_j-\lambda)
\end{equation*}
must be zero. However, none of the \(\uvec{w}_j\) is from the same eigenspace as \(\uvec{v}\text{,}\) so each \(\lambda_j-\lambda\) is nonzero, which forces each of the \(b_j\) to be zero.
Substituting this new information into equation (✶), we have
\begin{equation*}
k \uvec{v} + a_1 \uvec{v}_1 + a_2 \uvec{v}_2 + \dotsb + a_\ell \uvec{v}_\ell = \zerovec.
\end{equation*}
But the collection \(\{\uvec{v},\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell\}\) is assumed independent, so each of the scalars in the remaining combination on the left above is also zero.
We have now successfully shown that the only way equation (✶) can be true is if each of the scalars involved is \(0\text{,}\) as required.
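The proposition can also be sanity-checked numerically. In the sketch below (NumPy assumed; the matrix and eigenvectors are an arbitrary example verified by hand, not taken from the text), two independent eigenvectors for the eigenvalue \(3\) are combined with an eigenvector for the different eigenvalue \(5\text{,}\) and the combined set has full rank.

```python
# Illustrative check of Proposition 22.6.6 (NumPy assumed; example chosen by hand).
import numpy as np

A = np.array([[3.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])

v1 = np.array([1.0, 0.0, 0.0])   # eigenvector for eigenvalue 3
v2 = np.array([0.0, 1.0, 0.0])   # an independent eigenvector for eigenvalue 3
w1 = np.array([1.0, 0.0, 2.0])   # eigenvector for the different eigenvalue 5

# Confirm these really are eigenvectors of A.
assert np.allclose(A @ v1, 3 * v1)
assert np.allclose(A @ v2, 3 * v2)
assert np.allclose(A @ w1, 5 * w1)

# Full column rank (3) means the combined set is linearly independent.
print(np.linalg.matrix_rank(np.column_stack([v1, v2, w1])))   # 3
```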
The proposition above is somewhat similar in effect to Proposition 17.5.6, in that it lets us build up a linearly independent set of eigenvectors one by one. But the above fact is a little stronger, in that when we look to add a new eigenvector to our collection, we only need to worry about it being linearly independent from the eigenvectors we already have from the same eigenspace. This leads to the following corollary.
Corollary 22.6.7. Eigenspaces are independent.
Given a collection of bases for the different eigenspaces of a matrix, the collection of all these eigenspace basis vectors together will still be linearly independent.
Proof.
Let \(A\) be a square matrix, and write \(\lambda_1,\lambda_2,\dotsc,\lambda_\ell\) for its distinct eigenvalues. Suppose we have a basis \(\basisfont{B}_1\) for eigenspace \(E_{\lambda_1}(A)\text{,}\) and a basis \(\basisfont{B}_2\) for eigenspace \(E_{\lambda_2}(A)\text{,}\) and so on. Begin with \(\basisfont{B}_1\text{,}\) which is linearly independent because it is a basis for a subspace. Enlarge \(\basisfont{B}_1\) with vectors from \(\basisfont{B}_2\text{,}\) one at a time. At each step we may apply Proposition 22.6.6, because each new vector from \(\basisfont{B}_2\) is both
- from a different eigenspace than the vectors in \(\basisfont{B}_1\text{,}\) and
- linearly independent from the previous vectors from \(\basisfont{B}_2\) already included in the new enlarged collection.
Proposition 22.6.6 tells us that at each step of enlarging our collection by one, the new, larger collection will remain linearly independent. Once we run out of vectors in \(\basisfont{B}_2\text{,}\) we begin enlarging our collection with vectors from \(\basisfont{B}_3\text{,}\) one at a time. Again, Proposition 22.6.6 applies at each enlargement step, so that each collection of eigenvectors along the way remains linearly independent. Carry this process through to the end, until finally all vectors from \(\basisfont{B}_\ell\) are also included, and Proposition 22.6.6 will still apply at the last step to tell us that the complete set of basis eigenvectors is linearly independent.
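In code, the corollary amounts to the observation that pooling eigenspace bases never destroys linear independence. The following sketch assumes SciPy (its null_space routine returns an orthonormal basis for a null space), which is not used elsewhere in the text, along with an arbitrarily chosen matrix with eigenvalues \(3\) and \(5\text{.}\)

```python
# Illustrative check of Corollary 22.6.7 (NumPy and SciPy assumed; A chosen arbitrarily).
import numpy as np
from scipy.linalg import null_space

A = np.array([[3.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])   # triangular, so its eigenvalues 3 and 5 are visible

# Bases for the eigenspaces E_3(A) and E_5(A), stored as matrix columns.
B1 = null_space(3.0 * np.eye(3) - A)   # two basis vectors
B2 = null_space(5.0 * np.eye(3) - A)   # one basis vector

combined = np.hstack([B1, B2])
# Rank equal to the number of columns: the pooled basis vectors remain independent.
print(np.linalg.matrix_rank(combined), combined.shape[1])   # 3 3
```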
In the next subsection, we will use this corollary to refine our initial characterization of diagonalizability stated in Theorem 22.6.3. In the meantime, we will formally state the relationship between geometric and algebraic multiplicities that we discussed in Subsection 22.4.2.
Theorem 22.6.8. Geometric versus algebraic multiplicity.
The geometric multiplicity of an eigenvalue is always less than or equal to its algebraic multiplicity.
Proof.
We will not include the proof of this statement here — you may encounter it in further study of matrix forms, perhaps in a second course in linear algebra.
Remark 22.6.9.
As we’ve noted already, the geometric multiplicity of an eigenvalue is always at least one, since otherwise it wouldn’t have any corresponding nonzero eigenvectors!
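A concrete example makes the theorem's inequality (and the possibility of strict inequality) easy to see. The matrix below is a standard defective example of our own choosing, not one from the text, and the check assumes NumPy.

```python
# A defective matrix: algebraic multiplicity 2, geometric multiplicity 1 (NumPy assumed).
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lam = 1.0   # the only eigenvalue; c_A(lambda) = (lambda - 1)^2, so algebraic multiplicity 2

# Geometric multiplicity = dim of the null space of (lam I - A) = n - rank(lam I - A).
geo_mult = A.shape[0] - np.linalg.matrix_rank(lam * np.eye(2) - A)
print(geo_mult)   # 1, strictly less than the algebraic multiplicity 2
```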
Subsection 22.6.4 More about diagonalizable matrices
Corollary 22.6.7 tells us that when collecting eigenvectors to make up the transition matrix \(P\text{,}\) we only have to worry about linear independence inside eigenspaces; linear independence between eigenspaces is automatic. But linear independence inside an eigenspace \(E_{\lambda_j}(A)\) is taken care of for us when we row reduce \(\lambda_j I - A\text{.}\) So our initial characterization of diagonalizability in Theorem 22.6.3 can be refined so that we don’t actually have to worry about linear independence of eigenvectors at all — we just have to worry about having enough eigenspace basis vectors. It turns out that the algebraic multiplicity of each eigenvalue is exactly the necessary number of basis vectors for the corresponding eigenspace, and the next statements record this thinking.
Corollary 22.6.10. More characterizations of diagonalizability.
- A square matrix whose eigenvalues are all real is diagonalizable if and only if each eigenvalue has geometric multiplicity equal to its algebraic multiplicity.
- An \(n \times n\) matrix that has \(n\) different real eigenvalues must be diagonalizable.
Proof of Statement 1.
We need \(n\) linearly independent eigenvectors to make up the columns of the \(n\times n\) transition matrix \(P\text{.}\) The maximum number of linearly independent eigenvectors we can get from a single eigenspace is \(\dim(E_\lambda(A))\text{,}\) the geometric multiplicity of the eigenvalue \(\lambda\text{.}\) So the maximum number of linearly independent eigenvectors we can get in total is the sum of the geometric multiplicities of the eigenvalues. But the characteristic polynomial \(c_A(\lambda)\) has degree \(n\text{,}\) and \(n\) is the sum of the algebraic multiplicities of the eigenvalues, because if \(A\) has all real eigenvalues, then \(c_A(\lambda)\) factors as
\begin{equation*}
c_A(\lambda) = (\lambda-\lambda_1)^{m_1}(\lambda-\lambda_2)^{m_2}\dotsm(\lambda-\lambda_\ell)^{m_\ell}.
\end{equation*}
So if even one eigenvalue is deficient, in the sense that its geometric multiplicity is strictly less than its algebraic multiplicity, then the sum of the geometric multiplicities falls short of \(n\) (by Theorem 22.6.8, no other eigenvalue can exceed its own algebraic multiplicity to make up the difference), and we won’t obtain the \(n\) linearly independent eigenvectors we need in total.
On the other hand, if each eigenvalue has geometric multiplicity equal to its algebraic multiplicity, then forming eigenspace bases and collecting them all together will provide us with exactly \(n\) eigenvectors, and Corollary 22.6.7 tells us that these \(n\) eigenvectors will be linearly independent.
Proof of Statement 2.
If an \(n \times n\) matrix has \(n\) different real eigenvalues, then each of these eigenvalues must have algebraic multiplicity \(1\text{,}\) since otherwise the \(n\) algebraic multiplicities would add up to more than \(n\text{,}\) the degree of the characteristic polynomial. So each geometric multiplicity is no greater than \(1\text{.}\) But also, as noted in Remark 22.6.9, each geometric multiplicity must be at least \(1\text{.}\) Thus, each geometric multiplicity for this matrix is exactly \(1\text{,}\) and so is equal to the corresponding algebraic multiplicity.
The result now follows from the first statement of this corollary.
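To illustrate Statement 2 numerically (NumPy and the particular matrix are assumptions of ours, not from the text), take a triangular matrix with three distinct diagonal entries, so that its three eigenvalues are distinct, and diagonalize it with a matrix of eigenvectors.

```python
# Illustrative check of Statement 2 of Corollary 22.6.10 (NumPy assumed; A chosen arbitrarily).
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])    # triangular, with distinct eigenvalues 2, 3, 5

eigvals, P = np.linalg.eig(A)      # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P       # diagonal, since the eigenvectors are independent
print(np.round(D, 10))             # diagonal entries 2, 3, 5 (order may vary)
```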