
Section 30.5 Theory

Subsection 30.5.1 Similarity to triangular block form

Every similarity class contains a triangular block matrix, at least if we are willing to use complex scalars, vectors, and matrices. It is possible to prove this fact first, without the aid of generalized eigenvectors, and then from it deduce the required properties of generalized eigenspaces to make Procedure 30.4.2 work.

The proof is by induction on \(n\text{,}\) the size of \(A\text{.}\) We will assume we are working with a real matrix and real scalars. But there is no detail of the proof that depends on this assumption, and so the proof is valid for complex matrices and scalars as well.

Base case.

Every \(1 \times 1\) matrix is trivially in triangular-block form.

Induction step.

First, we make the induction hypothesis: let \(k\) represent a fixed positive integer, and assume that every \(k \times k\) matrix whose characteristic polynomial factors completely as in the statement of the theorem is similar to a matrix in triangular-block form.

We must now use the induction hypothesis to prove that every matrix \(A\) of size \((k+1) \times (k+1)\) whose characteristic polynomial factors completely as in the statement of the theorem is similar to a matrix in triangular-block form.

Suppose \(\uvec{x}\) is an eigenvector of \(A\) corresponding to \(\lambda_1\text{.}\) Choose a basis of \(\R^{k+1}\) that contains \(\uvec{x}\) as its first element, and let \(Q\) be the invertible \((k+1) \times (k+1)\) matrix made up of these basis vectors as columns. Then \(\inv{Q} A Q\) has block form

\begin{equation*} \inv{Q} A Q = \left[\begin{array}{@{}c|c@{}} \lambda_1 \amp B \\ \hline \zerovec \amp A' \end{array}\right] \text{,} \end{equation*}

where \(A'\) is \(k \times k\text{,}\) and \(B\) is \(1 \times k\text{.}\)

As in Statement 6 of Proposition 28.6.1, the characteristic polynomial of \(\inv{Q} A Q\) must be

\begin{equation*} c_{\inv{Q} A Q}(\lambda) = (\lambda - \lambda_1) \cdot c_{A'}(\lambda) \text{.} \end{equation*}

But similar matrices have the same characteristic polynomial (Theorem 26.5.8), and so from

\begin{equation*} c_A(\lambda) = (\lambda - \lambda_1) \cdot c_{A'}(\lambda) \end{equation*}

we may cancel the factor \((\lambda - \lambda_1)\) against the complete factorization \(c_A(\lambda) = (\lambda-\lambda_1)^{m_1} (\lambda-\lambda_2)^{m_2} \dotsm (\lambda-\lambda_\ell)^{m_\ell}\) assumed in the statement of the theorem, and conclude that

\begin{equation*} c_{A'}(\lambda) = (\lambda-\lambda_1)^{m_1-1} (\lambda-\lambda_2)^{m_2} \dotsm (\lambda-\lambda_\ell)^{m_\ell} \text{.} \end{equation*}
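For readers who would like to see this step of the construction in action, here is a small computational sketch using Python's SymPy library. The \(3 \times 3\) matrix, its eigenvector, and the basis completing it are ad hoc choices for illustration only; the sketch checks both the block form of \(\inv{Q} A Q\) and the factorization of \(c_A\) obtained above.

```python
# A minimal SymPy sketch of the induction step, with an ad hoc matrix:
# eigenvalue 3 of algebraic multiplicity 2, and eigenvalue 2.
import sympy as sp

lam = sp.symbols('lambda')

A = sp.Matrix([
    [ 2, 1, 0],
    [-1, 4, 0],
    [ 0, 0, 2],
])

# x is an eigenvector for lambda_1 = 3; extend it to a basis of R^3
x = sp.Matrix([1, 1, 0])
assert A * x == 3 * x
Q = sp.Matrix.hstack(x, sp.Matrix([0, 1, 0]), sp.Matrix([0, 0, 1]))

M = Q.inv() * A * Q
assert M[1:, 0] == sp.zeros(2, 1)   # zero block below lambda_1
assert M[0, 0] == 3                 # lambda_1 in the corner
Aprime = M[1:, 1:]                  # the k x k block A'

# c_A(lambda) = (lambda - lambda_1) * c_{A'}(lambda)
cA = A.charpoly(lam).as_expr()
cAprime = Aprime.charpoly(lam).as_expr()
assert sp.expand(cA - (lam - 3) * cAprime) == 0
```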

As the characteristic polynomial \(c_{A'}\) also factors completely, we may apply the induction hypothesis to obtain an invertible \(k \times k\) matrix \(Q'\) such that \(\inv{(Q')} A' Q'\) is in triangular-block form

\begin{equation*} \inv{(Q')}A' Q' = \begin{bmatrix} U'_1 \\ \amp U'_2 \\ \amp \amp \ddots \\ \amp \amp \amp U'_\ell \end{bmatrix} \text{,} \end{equation*}

where each block \(U'_j\) has size \(m_j \times m_j\text{,}\) except block \(U'_1\text{,}\) which has size \((m_1-1)\times(m_1-1)\text{.}\) Thus, if we set \(P\) to be the invertible \((k+1) \times (k+1)\) matrix

\begin{equation*} P = Q \left[ \begin{array}{c|c} 1 \amp \zerovec \\ \hline \zerovec \amp Q' \end{array} \right] \text{,} \end{equation*}

we get

\begin{align*} \inv{P} A P \amp = \left[\begin{array}{@{}c|c@{}} \lambda_1 \amp BQ' \\ \hline \zerovec \amp \inv{(Q')}A'Q' \end{array}\right]\\ \amp = \left[\begin{array}{@{}c|c@{}} \lambda_1 \amp BQ' \\ \hline \zerovec \amp \begin{matrix} U'_1 \\ \amp U'_2 \\ \amp \amp \ddots \\ \amp \amp \amp U'_\ell \end{matrix} \end{array}\right]\\ \amp = \left[\begin{array}{@{}c|c|c@{}} \lambda_1 \amp C_1 \amp C_2 \\ \hline \zerovec \amp U'_1 \amp \zerovec \\ \hline \zerovec \amp \zerovec \amp \begin{matrix} U'_2 \\ \amp \ddots \\ \amp \amp U'_\ell \end{matrix} \end{array}\right]\text{.} \end{align*}

In the last matrix above, we have broken the row matrix \(B Q'\) up into row matrices \(C_1\) and \(C_2\text{,}\) where \(C_1\) has size \(1 \times (m_1-1)\text{.}\) This is almost in triangular-block form — the remaining difficulty is that \(C_2\) might have nonzero entries. The remainder of the proof revolves around showing that one can choose the matrix \(Q'\) so that \(C_2 = \zerovec\text{;}\) see [5] for the details.

Finally, note that the blocks of \(\inv{P} A P\) have size \(m_1,m_2,\dotsc,m_\ell\text{,}\) respectively.
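The computation of \(\inv{P} A P\) above is pure block algebra, and can be verified symbolically. The following SymPy sketch checks the first identity for generic entries, with the roles of \(\lambda_1\text{,}\) \(B\text{,}\) \(A'\text{,}\) and \(Q'\) played by symbolic matrices; block sizes \(1\) and \(2\) are an arbitrary choice for the check.

```python
# Symbolic check of the block-algebra identity above, using SymPy.
# lam1, B, Ap, Qp stand in for lambda_1, B, A', Q' with generic entries.
import sympy as sp

lam1 = sp.symbols('lambda1')
B = sp.Matrix(sp.MatrixSymbol('B', 1, 2))     # 1 x k row block
Ap = sp.Matrix(sp.MatrixSymbol('Ap', 2, 2))   # k x k block A'
Qp = sp.Matrix(sp.MatrixSymbol('Qp', 2, 2))   # k x k block Q'

# M plays the role of Q^{-1} A Q in its first block form
M = sp.Matrix.vstack(
    sp.Matrix.hstack(sp.Matrix([[lam1]]), B),
    sp.Matrix.hstack(sp.zeros(2, 1), Ap),
)
# R plays the role of the block matrix [[1, 0], [0, Q']]
R = sp.Matrix.vstack(
    sp.Matrix.hstack(sp.eye(1), sp.zeros(1, 2)),
    sp.Matrix.hstack(sp.zeros(2, 1), Qp),
)

lhs = R.inv() * M * R
rhs = sp.Matrix.vstack(
    sp.Matrix.hstack(sp.Matrix([[lam1]]), B * Qp),
    sp.Matrix.hstack(sp.zeros(2, 1), Qp.inv() * Ap * Qp),
)
assert sp.simplify(lhs - rhs) == sp.zeros(3, 3)
```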

Remark 30.5.2.
  • If our field of scalars is \(\C\text{,}\) then The Fundamental Theorem of Algebra (Complex Version) tells us that the characteristic polynomial of every matrix factors completely. However, if our field of scalars is \(\R\text{,}\) then the characteristic polynomials of some matrices will contain irreducible quadratic factors.
  • Notice that in triangular-block form, to each distinct eigenvalue of \(A\) there corresponds precisely one scalar-triangular block, with the eigenvalue down the diagonal, of size equal to the algebraic multiplicity of the eigenvalue.

In the case of a matrix with a single repeated eigenvalue, as in Chapter 29, we can be more specific.

In this case, the characteristic polynomial must factor as

\begin{equation*} c(\lambda) = (\lambda - \lambda_0)^n \text{,} \end{equation*}

where \(\lambda_0\) is the single repeated eigenvalue, and so Theorem 30.5.1 tells us that the matrix is similar to a triangular block matrix with a single scalar-triangular block.
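As an illustration (not part of the proof), the following SymPy sketch exhibits this for an ad hoc \(3 \times 3\) matrix with characteristic polynomial \((\lambda - 2)^3\text{.}\) Its Jordan form, one particular triangular form that SymPy can compute directly, is a single scalar-triangular block with \(2\)s down the diagonal.

```python
# SymPy illustration: an ad hoc matrix with c_A(lambda) = (lambda - 2)^3.
import sympy as sp

A = sp.Matrix([
    [1,  1, 0],
    [0,  2, 1],
    [1, -1, 3],
])

lam = sp.symbols('lambda')
assert sp.factor(A.charpoly(lam).as_expr()) == (lam - 2)**3

P, J = A.jordan_form()
assert A == P * J * P.inv()                                    # similarity
assert all(J[i, j] == 0 for i in range(3) for j in range(i))   # triangular
assert all(J[i, i] == 2 for i in range(3))                     # 2s on diagonal
```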

Subsection 30.5.2 Properties of generalized eigenvectors

We previously stated some properties of generalized eigenspaces as Theorem 29.6.1, and we are now in a position to prove Statement 4 of that theorem. As well, we know from Chapter 29 that it is generalized eigenspaces that produce matrices in scalar-triangular form. All that remains is to establish that the generalized eigenspaces of a matrix satisfy the requirements of the block-diagonalization procedure: invariance and independence.

Suppose \(A\) is a square matrix, \(\lambda\) is an eigenvalue of \(A\text{,}\) and \(k\) is a positive integer. We would like to verify that the generalized eigensubspace \(E_\lambda^k(A)\) is \(A\)-invariant.

First, note that the matrices \((\lambda I - A)\) and \(A\) commute, since

\begin{equation*} (\lambda I - A)A = \lambda A - A^2 = A(\lambda I - A) \text{.} \end{equation*}

From this, we can conclude that \(A\) commutes with \((\lambda I - A)^k\text{,}\) since

\begin{align*} (\lambda I - A)^k A \amp = (\lambda I - A)^{k-1} A (\lambda I - A)\\ \amp = (\lambda I - A)^{k-2} A (\lambda I - A)^2\\ \amp \vdots\\ \amp = (\lambda I - A) A (\lambda I - A)^{k-1}\\ \amp = A (\lambda I - A)^k\text{.} \end{align*}
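This commutation identity can also be checked symbolically. The following SymPy sketch verifies it for a generic \(2 \times 2\) matrix and \(k = 3\text{,}\) arbitrary small choices made only for the sake of the check.

```python
# SymPy sanity check: A commutes with (lambda*I - A)^k for a generic
# 2 x 2 matrix A and k = 3.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix(sp.MatrixSymbol('A', 2, 2))   # generic symbolic entries
M = (lam * sp.eye(2) - A) ** 3

# the difference expands to the zero matrix identically
assert (M * A - A * M).expand() == sp.zeros(2, 2)
```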

Now, to show that \(E_\lambda^k(A)\) is \(A\)-invariant, we must show that, given vector \(\uvec{u}\) in \(E_\lambda^k(A)\text{,}\) the vector \(A\uvec{u}\) is again in \(E_\lambda^k(A)\text{.}\) So assume that \(\uvec{u} \in E_\lambda^k(A)\text{.}\) By definition, this means that \((\lambda I - A)^k \uvec{u} = \zerovec\text{.}\) But then, using the fact that \(A\) commutes with \((\lambda I - A)^k\) proved above, we have

\begin{equation*} (\lambda I - A)^k A \uvec{u} = A (\lambda I - A)^k \uvec{u} = A \zerovec = \zerovec \text{.} \end{equation*}

Since \((\lambda I - A)^k (A \uvec{u}) = \zerovec\text{,}\) we have that \(A\uvec{u}\) is in \(E_\lambda^k(A)\) as well, as desired.
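The following SymPy sketch replays this invariance argument on an ad hoc \(3 \times 3\) matrix whose eigenvalue \(\lambda = 2\) has algebraic multiplicity \(2\text{,}\) checking that \(A\) maps each basis vector of \(E_\lambda^k(A)\) back into \(E_\lambda^k(A)\text{.}\)

```python
# Replaying the invariance check in SymPy on an ad hoc example; k = 2.
import sympy as sp

A = sp.Matrix([
    [ 1,  1, 0],
    [-1,  3, 0],
    [ 3, -3, 5],
])
lam, k = 2, 2
N = (lam * sp.eye(3) - A) ** k   # N u = 0 characterizes E_lambda^k(A)

for u in N.nullspace():          # basis of the generalized eigensubspace
    assert N * (A * u) == sp.zeros(3, 1)   # A u is annihilated by N too
```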

We already know that \(G_\lambda(A)\) is a subspace of \(\R^n\) or \(\C^n\text{,}\) as appropriate (Statement 1 of Theorem 29.6.1). So we only need to verify that it is \(A\)-invariant.

If \(\uvec{x}\) is a vector in the generalized eigenspace \(G_\lambda(A)\text{,}\) then by definition it is in a generalized eigensubspace \(E_\lambda^k(A)\) for some \(k\text{.}\) But we have already shown that each \(E_\lambda^k(A)\) is \(A\)-invariant (Lemma 30.5.4), so \(A\uvec{x}\) must again lie in \(E_\lambda^k(A)\text{,}\) and hence in \(G_\lambda(A)\text{.}\)

We give a proof for \(A\) a real matrix, and assuming real scalars. The proof for the complex context is identical.

By Theorem 30.5.1, there exists an invertible matrix \(P\) such that \(\inv{P} A P\) is in triangular-block form. From the block form of \(\inv{P} A P\text{,}\) one can confirm that

\begin{align*} G_{\lambda_1}(\inv{P} A P) \amp = \Span \{ \uvec{e}_1, \dotsc, \uvec{e}_{m_1} \} \text{,} \\ G_{\lambda_2}(\inv{P} A P) \amp = \Span \{ \uvec{e}_{m_1+1}, \dotsc, \uvec{e}_{m_1+m_2} \} \text{,} \\ \amp \vdots \\ G_{\lambda_\ell}(\inv{P} A P) \amp = \Span \{ \uvec{e}_{n-m_\ell+1}, \dotsc, \uvec{e}_{n} \} \text{,} \end{align*}

where each spanning set above is obviously linearly independent.

As usual, the transition matrix \(P\) transfers properties of \(\inv{P} A P\) to the equivalent properties of \(A\text{.}\) In particular, it can be shown that the above spanning sets for the generalized eigenspaces of \(\inv{P} A P\) can be transferred to spanning sets for the generalized eigenspaces of \(A\text{:}\)

\begin{align*} G_{\lambda_1}(A) \amp = \Span \{ P\uvec{e}_1, \dotsc, P\uvec{e}_{m_1} \} \text{,} \\ G_{\lambda_2}(A) \amp = \Span \{ P\uvec{e}_{m_1+1}, \dotsc, P\uvec{e}_{m_1+m_2} \} \text{,} \\ \amp \vdots \\ G_{\lambda_\ell}(A) \amp = \Span \{ P\uvec{e}_{n-m_\ell+1}, \dotsc, P\uvec{e}_{n} \} \text{.} \end{align*}

Since \(P\) is invertible, each of these transformed spanning sets is also linearly independent (Statement 2 of Proposition 21.5.1). Thus, \(\dim G_{\lambda_j}(A) = m_j\text{.}\) Similarly, since the standard basis of \(\R^n\) is linearly independent and \(P\) is invertible, all of these spanning sets taken together form a basis of \(\R^n\text{.}\)
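Finally, a computational check of the dimension count, again with an ad hoc matrix chosen only for illustration: for a \(3 \times 3\) matrix with eigenvalues \(2\) (multiplicity \(2\)) and \(5\) (multiplicity \(1\)), each generalized eigenspace has dimension equal to the algebraic multiplicity, and the combined bases form a basis of \(\R^3\text{.}\)

```python
# Dimension count in SymPy for an ad hoc matrix: eigenvalue 2 has
# algebraic multiplicity 2, eigenvalue 5 has multiplicity 1.
import sympy as sp

A = sp.Matrix([
    [ 1,  1, 0],
    [-1,  3, 0],
    [ 3, -3, 5],
])

basis = []
for lam_j, m_j in [(2, 2), (5, 1)]:
    G_j = ((lam_j * sp.eye(3) - A) ** m_j).nullspace()
    assert len(G_j) == m_j    # dim G_{lambda_j}(A) = m_j
    basis.extend(G_j)

# the combined spanning sets are linearly independent, hence a basis of R^3
assert sp.Matrix.hstack(*basis).rank() == 3
```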