
Section 22.6 Theory

Subsection 22.6.1 Similar matrices

First, we’ll record just a few of the facts about general similar matrices from Section 22.3.

Proof of Statement 1.

Suppose square matrices \(A\) and \(B\) are similar, and \(P\) is a transition matrix that realizes the similarity, so that \(B = \inv{P}AP\text{.}\)
We know from Proposition 10.5.6 that the determinant of a product is the product of the determinants. And we also know from Proposition 10.5.8 that the determinant of an inverse is the inverse of the determinant. So we can compute \(\det B\) as
\begin{align*} \det B \amp= \det (\inv{P}AP) \\ \amp= \left(\det \inv{P}\right) (\det A) (\det P) \\ \amp= \inv{(\det P)} (\det A) (\det P) \\ \amp= \frac{(\det A) \bcancel{(\det P)}}{\bcancel{\det P}} \\ \amp= \det A. \end{align*}
Thus, the similar matrices \(A\) and \(B\) have the same determinant.
Warning 22.6.2. Careful.
In this proof, it would have been incorrect to cancel the \(\inv{P}\) with the \(P\) immediately, because order of matrix multiplication matters! It was only after we split the determinant into a product of determinants that we could cancel \(\det\left(\inv{P}\right)\) with \(\det P\) because all three of the determinants are numbers, and order of number multiplication does not matter.
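
If you would like to see Statement 1 in action numerically, here is a short illustrative sketch in Python using NumPy (not part of the text); the particular matrices \(A\) and \(P\) below are arbitrary choices, and any invertible \(P\) would do.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])        # an arbitrary square matrix
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])        # an arbitrary invertible transition matrix
B = np.linalg.inv(P) @ A @ P      # B is similar to A by construction

print(np.linalg.det(A))           # 6.0
print(np.linalg.det(B))           # 6.0, up to floating-point rounding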

Proof of Statement 2.

Suppose square matrices \(A\) and \(B\) are similar, and \(P\) is a transition matrix that realizes the similarity, so that \(B = \inv{P}AP\text{.}\)
The characteristic polynomials of these two matrices are computed as
\begin{align*} c_A(\lambda) \amp= \det(\lambda I - A), \amp c_B(\lambda) \amp= \det(\lambda I - B). \end{align*}
Using our assumption \(B = \inv{P}AP\text{,}\) along with \(I = \inv{P}IP\text{,}\) we can express the matrix involved in the characteristic polynomial for \(B\) as
\begin{equation*} \lambda I - B = \lambda \inv{P}IP - \inv{P}AP = \inv{P}(\lambda I - A)P, \end{equation*}
where in the last step we have factored the common \(\inv{P}\) and \(P\) factors out of the difference (making sure to factor each to the correct side, because order of matrix multiplication matters). We have now shown that matrices \(\lambda I - A\) and \(\lambda I - B\) are also similar, via the same transition matrix \(P\text{,}\) and so by Statement 1 they have the same determinant. That is,
\begin{equation*} c_B(\lambda) = \det(\lambda I - B) = \det(\lambda I - A) = c_A(\lambda), \end{equation*}
and thus the similar matrices \(A\) and \(B\) have the same characteristic polynomial.
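
A quick numerical sanity check of Statement 2 is also possible. The sketch below (Python with NumPy, with arbitrary illustrative choices of \(A\) and \(P\)) compares the characteristic polynomial coefficients of \(A\) and of \(B = \inv{P}AP\text{.}\)

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                # an arbitrary square matrix
P = np.array([[1.0, 2.0],
              [0.0, 1.0]])                # an arbitrary invertible transition matrix
B = np.linalg.inv(P) @ A @ P

print(np.poly(A))    # coefficients of det(lambda I - A), highest power first: [1. -7. 10.]
print(np.poly(B))    # the same coefficients, up to floating-point rounding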

Proof of Statement 3.

Statement 3 follows immediately from Statement 2, as the eigenvalues of a matrix are precisely the roots of the characteristic polynomial of the matrix, and the algebraic multiplicity of an eigenvalue is the number of times that value is repeated as a root of the characteristic polynomial.
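
Here is a similar illustrative check for Statement 3 (again Python with NumPy, arbitrary matrices), this time using a matrix with a repeated eigenvalue so that an algebraic multiplicity greater than one is visible in the computed lists.

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])           # eigenvalue 2 has algebraic multiplicity 2
P = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])           # an arbitrary invertible transition matrix
B = np.linalg.inv(P) @ A @ P

print(np.sort(np.linalg.eigvals(A)))      # approximately [2. 2. 5.]
print(np.sort(np.linalg.eigvals(B)))      # the same eigenvalues, repeats included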

Subsection 22.6.2 Diagonalizable matrices

We start with the justification that a transition matrix made up of linearly independent eigenvectors will diagonalize a matrix.

Proof.

This fact follows from our analysis of the transition matrix \(P\) and the diagonal form matrix \(\inv{P}AP\) in Subsection 22.4.1.
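
As a concrete computational companion to that analysis (an illustration only, with an arbitrary matrix \(A\)), the sketch below uses the eigenvectors returned by NumPy as the columns of the transition matrix \(P\) and confirms that \(\inv{P}AP\) comes out diagonal.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                 # an arbitrary diagonalizable matrix (eigenvalues 2 and 5)
eigvals, eigvecs = np.linalg.eig(A)        # column j of eigvecs is an eigenvector for eigvals[j]
P = eigvecs                                # transition matrix built from eigenvectors
D = np.linalg.inv(P) @ A @ P

print(eigvals)                             # the eigenvalues of A (order may vary)
print(np.round(D, 10))                     # diagonal, with the eigenvalues down the main diagonal
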
We will refine this theorem using our more sophisticated notions of algebraic and geometric multiplicity in the next subsection. But first, here is a surprising result that demonstrates how central eigenvalues are in matrix theory.

Proof.

Suppose \(A\) is a diagonalizable matrix. Then it is similar to some diagonal matrix \(D\text{.}\) The eigenvalues of a diagonal matrix are precisely the diagonal entries, and the algebraic multiplicity of each of these eigenvalues is the number of times that eigenvalue is repeated down the diagonal. So if \(\lambda_1,\lambda_2,\dotsc,\lambda_\ell\) are all of the distinct eigenvalues of \(D\) (i.e. there are no repeats in this list of eigenvalues), and \(m_1,m_2,\dotsc,m_\ell\) are the corresponding algebraic multiplicities of these eigenvalues (i.e. each \(m_j\) is equal to the number of times \(\lambda_j\) appears on the main diagonal of \(D\)), then
\begin{equation*} \det D = \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell}, \end{equation*}
because the determinant of a diagonal matrix is just the product of its diagonal entries (Statement 1 of Proposition 8.5.2). But from Statements 1 and 3 of Proposition 22.6.1 we know that the similar matrices \(A\) and \(D\) have the same determinant and have all the same eigenvalues with the same corresponding algebraic multiplicities. Thus, the expression
\begin{equation*} \det A = \det D = \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell} \end{equation*}
can be viewed as an expression for \(\det A\) as a product of the eigenvalues of \(A\text{,}\) including multiplicities.
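
As a numerical illustration of this fact (not a proof), the sketch below constructs a diagonalizable matrix with a repeated eigenvalue and compares its determinant to the product of its eigenvalues, multiplicities included; the matrix \(P\) is an arbitrary invertible choice.

import numpy as np

D = np.diag([2.0, 2.0, 5.0])               # eigenvalue 2 with multiplicity 2, eigenvalue 5 once
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])            # an arbitrary invertible matrix
A = P @ D @ np.linalg.inv(P)               # A is diagonalizable by construction

print(np.linalg.det(A))                    # approximately 20.0
print(2.0 * 2.0 * 5.0)                     # the product of the eigenvalues, multiplicities included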

Remark 22.6.5.

The above fact is actually true about all square matrices, if you allow complex eigenvalues. In a second linear algebra course, you may learn that diagonalizable matrices are a special case of a more general theory, in which every matrix can be triangularized. That is, every square matrix is similar to a special form of triangular matrix (either upper or lower), though for many matrices both the transition matrix and the triangular form matrix might need to contain complex numbers in their entries. In this more general theory, it is again the case that the diagonal entries of the triangular form matrix will be precisely the eigenvalues of the original matrix, with each eigenvalue repeated down the diagonal according to its algebraic multiplicity. So the proof provided for the fact above can be adapted to work in this slightly more general setting.
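
If you want a computational preview of this more general theory, SciPy's Schur decomposition produces exactly such a triangular form. The sketch below (assuming SciPy is available; it is only an illustration) applies it to a rotation matrix whose eigenvalues are not real.

import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                # a rotation matrix; its eigenvalues are the non-real pair ±i
T, Z = schur(A, output='complex')          # A = Z T Z*, with T upper triangular

print(np.round(np.diag(T), 10))            # the eigenvalues ±i of A, down the diagonal of T
print(np.round(np.linalg.eigvals(A), 10))  # the same eigenvalues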

Subsection 22.6.3 The geometry of eigenvectors

We require that the columns of a transition matrix \(P\) be linearly independent, so that \(P\) is invertible. Basis vectors for a particular eigenspace are linearly independent by definition of basis. But when we lump basis vectors from different eigenspaces together, will they all remain linearly independent together? The next fact answers this question with a more general version of what we explored in Discovery 22.6.

Proof.

Let’s write \(S = \{\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell,\uvec{w}_1,\uvec{w}_2,\dotsc,\uvec{w}_m\}\text{,}\) where the \(\uvec{v}_j\) are those eigenvectors in \(S\) that are in the same eigenspace as \(\uvec{v}\text{,}\) and the \(\uvec{w}_j\) are those that are not. Write \(\lambda\) for the eigenvalue of \(A\) corresponding to \(\uvec{v}\) (hence also to each \(\uvec{v}_j\)), and write \(\lambda_j\) for the eigenvalue corresponding to \(\uvec{w}_j\text{.}\) We have assumed that the full set \(S\) is linearly independent, and therefore so are the subsets \(\{\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell\}\) and \(\{\uvec{w}_1,\uvec{w}_2,\dotsc,\uvec{w}_m\}\) (Statement 2 of Proposition 17.5.3). In addition, we have assumed that the set \(\{\uvec{v},\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell\}\) remains linearly independent.
The strategy in this proof is essentially the same as explored in Discovery 22.6. To prove independence, we must prove that the assumption
\begin{gather} \begin{split} k \uvec{v} + \amp a_1 \uvec{v}_1 + a_2 \uvec{v}_2 + \dotsb + a_\ell \uvec{v}_\ell \\ \amp + b_1 \uvec{w}_1 + b_2 \uvec{w}_2 + \dotsb + b_m \uvec{w}_m = \zerovec \end{split}\tag{✶} \end{gather}
leads to the conclusion that each of the scalars \(k,a_1,a_2,\dotsc,a_\ell,b_1,b_2,\dotsc,b_m\) is \(0\text{.}\)
Since each of the vectors in the combination above is an eigenvector for \(A\text{,}\) if we multiply both sides of equation (✶) by the matrix \(A\text{,}\) we may substitute \(A\uvec{v} = \lambda \uvec{v}\text{,}\) \(A\uvec{v}_j = \lambda \uvec{v}_j\text{,}\) and \(A\uvec{w}_j = \lambda_j \uvec{w}_j\text{.}\) Making these substitutions, we obtain
\begin{equation*} \begin{split} k \lambda \uvec{v} + \amp a_1 \lambda \uvec{v}_1 + a_2 \lambda \uvec{v}_2 + \dotsb + a_\ell \lambda \uvec{v}_\ell \\ \amp + b_1 \lambda_1 \uvec{w}_1 + b_2 \lambda_2 \uvec{w}_2 + \dotsb + b_m \lambda_m \uvec{w}_m = \zerovec. \end{split} \end{equation*}
Compare this “\(A\) times (✶)” equation with the result of multiplying (✶) through by the scalar \(\lambda\text{:}\)
\begin{equation*} \begin{split} k \lambda \uvec{v} + \amp a_1 \lambda \uvec{v}_1 + a_2 \lambda \uvec{v}_2 + \dotsb + a_\ell \lambda \uvec{v}_\ell \\ \amp + b_1 \lambda \uvec{w}_1 + b_2 \lambda \uvec{w}_2 + \dotsb + b_m \lambda \uvec{w}_m = \zerovec. \end{split} \end{equation*}
Notice that the \(\uvec{v}\) and \(\uvec{v}_j\) terms of both the “\(A\) times (✶)” equation and the “\(\lambda\) times (✶)” equation are identical, so if we subtract these equations and collect like \(\uvec{w}_j\)-terms, we obtain
\begin{equation*} b_1 (\lambda_1-\lambda) \uvec{w}_1 + b_2 (\lambda_2-\lambda) \uvec{w}_2 + \dotsb + b_m (\lambda_m-\lambda) \uvec{w}_m = \zerovec. \end{equation*}
Since the collection of \(\uvec{w}_j\) vectors is linearly independent, the scalar coefficient expressions in this new linear combination must all be zero. That is, each scalar expression
\begin{equation*} b_j (\lambda_j-\lambda) \end{equation*}
must be zero. However, none of the \(\uvec{w}_j\) is from the same eigenspace as \(\uvec{v}\text{,}\) so each \(\lambda_j-\lambda\) is nonzero, which forces each of the \(b_j\) to be zero.
Substituting this new information into equation (✶), we have
\begin{equation*} k \uvec{v} + a_1 \uvec{v}_1 + a_2 \uvec{v}_2 + \dotsb + a_\ell \uvec{v}_\ell = \zerovec. \end{equation*}
But the collection \(\{\uvec{v},\uvec{v}_1,\uvec{v}_2,\dotsc,\uvec{v}_\ell\}\) is assumed independent, so each of the scalars in the remaining combination on the left above is also zero.
We have now successfully shown that the only way equation (✶) can be true is if each of the scalars involved is \(0\text{,}\) as required.
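
Here is a small numerical illustration of the proposition (Python with NumPy, using an illustrative diagonal matrix so that one eigenspace is visibly two-dimensional); a rank computation stands in for checking linear independence.

import numpy as np

A = np.diag([3.0, 3.0, 7.0])               # eigenvalue 3 has a two-dimensional eigenspace
v1 = np.array([1.0, 0.0, 0.0])             # an eigenvector for eigenvalue 3
w1 = np.array([0.0, 0.0, 1.0])             # an eigenvector for eigenvalue 7
v  = np.array([1.0, 1.0, 0.0])             # another eigenvector for 3, independent from v1

S = np.column_stack([v1, w1])              # a linearly independent set of eigenvectors
enlarged = np.column_stack([v, v1, w1])    # add v, which is independent from v1 within its eigenspace

print(np.linalg.matrix_rank(S))            # 2, so S is linearly independent
print(np.linalg.matrix_rank(enlarged))     # 3, so the enlarged set is still linearly independent
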
The proposition above is somewhat similar in effect to Proposition 17.5.6, in that it lets us build up a linearly independent set of eigenvectors one by one. But the above fact is a little stronger, in that when we look to add a new eigenvector to our collection, we only need to worry about it being linearly independent from the eigenvectors we already have from its own eigenspace. This leads to the following corollary.

Proof.

Let \(A\) be a square matrix, and write \(\lambda_1,\lambda_2,\dotsc,\lambda_\ell\) for its distinct eigenvalues. Suppose we have a basis \(\basisfont{B}_1\) for eigenspace \(E_{\lambda_1}(A)\text{,}\) and a basis \(\basisfont{B}_2\) for eigenspace \(E_{\lambda_2}(A)\text{,}\) and so on. Begin with \(\basisfont{B}_1\text{,}\) which is linearly independent because it is a basis for a subspace. Enlarge \(\basisfont{B}_1\) with vectors from \(\basisfont{B}_2\text{,}\) one at a time. At each step we may apply Proposition 22.6.6, because each new vector from \(\basisfont{B}_2\) is both
  • from a different eigenspace than the vectors in \(\basisfont{B}_1\text{,}\) and
  • linearly independent from the previous vectors from \(\basisfont{B}_2\) already included in the new enlarged collection.
Proposition 22.6.6 tells us that at each step of enlarging our collection by one, the new, larger collection will remain linearly independent. Once we run out of vectors in \(\basisfont{B}_2\text{,}\) we begin enlarging our collection with vectors from \(\basisfont{B}_3\text{,}\) one at a time. Again, Proposition 22.6.6 applies at each enlargement step, so that each collection of eigenvectors along the way remains linearly independent. Carry this process through to the end, until finally all vectors from \(\basisfont{B}_\ell\) are also included, and Proposition 22.6.6 will still apply at the last step to tell us that the complete set of basis eigenvectors is linearly independent.
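
The following sketch illustrates the corollary numerically (assuming SciPy is available; the matrix is an arbitrary illustrative choice): SciPy's null_space computes a basis for each eigenspace, and a rank computation confirms that the combined collection of basis vectors is linearly independent.

import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])            # illustrative matrix: eigenvalue 1 (twice) and eigenvalue 4

bases = [null_space(lam * np.eye(3) - A) for lam in (1.0, 4.0)]   # a basis for each eigenspace
combined = np.hstack(bases)                                       # lump all the basis vectors together

print([b.shape[1] for b in bases])         # [2, 1], the dimensions of the two eigenspaces
print(np.linalg.matrix_rank(combined))     # 3, so the combined collection is linearly independent
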
In the next subsection, we will use this corollary to refine our initial characterization of diagonalizability stated in Theorem 22.6.3. In the meantime, we will formally state the relationship between geometric and algebraic multiplicities that we discussed in Subsection 22.4.2.

Proof.

We will not include the proof of this statement here — you may encounter it in further study of matrix forms, perhaps in a second course in linear algebra.

Remark 22.6.9.

As we’ve noted already, the geometric multiplicity of an eigenvalue is always at least one, since otherwise it wouldn’t have any corresponding nonzero eigenvectors!

Subsection 22.6.4 More about diagonalizable matrices

Corollary 22.6.7 tells us that when collecting eigenvectors to make up the transition matrix \(P\text{,}\) we only have to worry about linear independence inside eigenspaces; linear independence between eigenspaces is automatic. But linear independence inside an eigenspace \(E_{\lambda_j}(A)\) is taken care of for us when we row reduce \(\lambda_j I - A\text{.}\) So our initial characterization of diagonalization in Theorem 22.6.3 can be refined so that we don’t actually have to worry about linear independence of eigenvectors at all — we just have to worry about having enough eigenspace basis vectors. It turns out that the algebraic multiplicity of each eigenvalue is exactly the necessary number of basis vectors for the corresponding eigenspace, and the next statements record this thinking.

Proof of Statement 1.

We need \(n\) linearly independent eigenvectors to make up the columns of the \(n\times n\) transition matrix \(P\text{.}\) The maximum number of linearly independent eigenvectors we can get from a single eigenspace is \(\dim(E_\lambda(A))\text{,}\) the geometric multiplicity of the eigenvalue \(\lambda\text{.}\) So the maximum number of linearly independent eigenvectors we can get in total is the sum of the geometric multiplicities of the eigenvalues. But the characteristic polynomial \(c_A(\lambda)\) has degree \(n\text{,}\) and \(n\) is the sum of the algebraic multiplicities of the eigenvalues, because if \(A\) has all real eigenvalues, then \(c_A(\lambda)\) factors as
\begin{equation*} c_A(\lambda) = (\lambda-\lambda_1)^{m_1}(\lambda-\lambda_2)^{m_2}\dotsm(\lambda-\lambda_\ell)^{m_\ell}. \end{equation*}
So if even one eigenvalue is deficient in the sense that its geometric multiplicity is strictly less than its algebraic multiplicity, we won’t obtain enough linearly independent eigenvectors from that eigenspace to contribute to the \(n\) linearly independent eigenvectors we need in total.
On the other hand, if each eigenvalue has geometric multiplicity equal to its algebraic multiplicity, then forming eigenspace bases and collecting them all together will provide us with exactly \(n\) eigenvectors, and Proposition 22.6.6 tells us that these \(n\) eigenvectors will be linearly independent.
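
The criterion in Statement 1 can also be explored computationally. The sketch below defines a hypothetical helper, multiplicities, that pairs each (approximately) distinct computed eigenvalue with its algebraic and geometric multiplicities, using NumPy eigenvalues and SciPy's null_space; the floating-point comparisons via np.isclose are only approximate, and the test matrices are arbitrary illustrative choices.

import numpy as np
from scipy.linalg import null_space

def multiplicities(A):
    """Pair each (approximately) distinct eigenvalue of A with its (algebraic, geometric) multiplicities."""
    n = A.shape[0]
    eigvals = np.linalg.eigvals(A)
    distinct = []
    for lam in eigvals:
        if not any(np.isclose(lam, mu) for mu in distinct):
            distinct.append(lam)
    result = {}
    for lam in distinct:
        algebraic = int(np.sum(np.isclose(eigvals, lam)))          # repeats among the computed eigenvalues
        geometric = null_space(lam * np.eye(n) - A).shape[1]       # dimension of the eigenspace
        result[complex(lam)] = (algebraic, geometric)
    return result

# Deficient: eigenvalue 2 has algebraic multiplicity 2 but geometric multiplicity 1, so not diagonalizable.
print(multiplicities(np.array([[2.0, 1.0],
                               [0.0, 2.0]])))
# Every algebraic multiplicity matches its geometric multiplicity, so this matrix is diagonalizable.
print(multiplicities(np.diag([2.0, 2.0, 5.0])))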

Proof of Statement 2.

In the case that a square matrix has \(n\) different real eigenvalues, each of these eigenvalues must have algebraic multiplicity \(1\text{,}\) since otherwise these \(n\) algebraic multiplicities would add up to more than \(n\text{,}\) the degree of the characteristic polynomial. Since a geometric multiplicity can never exceed the corresponding algebraic multiplicity, each geometric multiplicity is no greater than \(1\text{.}\) But also, as noted in Remark 22.6.9, each geometric multiplicity must be at least \(1\text{.}\) Thus, each geometric multiplicity for this matrix is exactly \(1\text{,}\) and so is equal to the corresponding algebraic multiplicity.
The result now follows from the first statement of this corollary.
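
Finally, here is a short illustration of Statement 2 (Python with NumPy, arbitrary illustrative matrix): a matrix with three distinct eigenvalues yields three linearly independent eigenvectors, and the resulting transition matrix diagonalizes it.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])                 # an illustrative matrix with three distinct eigenvalues
eigvals, P = np.linalg.eig(A)                   # one eigenvector per eigenvalue, as the columns of P

print(eigvals)                                  # 1, 4, and 6, in some order
print(np.linalg.matrix_rank(P))                 # 3, so the eigenvectors are linearly independent
print(np.round(np.linalg.inv(P) @ A @ P, 10))   # diagonal, confirming that A is diagonalizable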