
Section 31.5 Theory

Subsection 31.5.1 Nilpotent matrices

First, we establish the pattern we've seen so many times in this chapter already.

We omit a detailed proof, relying on Example 31.4.3 to demonstrate the pattern.

However, a proof by induction is fairly straightforward, as a matrix of this form can be broken up into a block matrix

\begin{equation*} A = \left[\begin{array}{@{}c|c@{}} 0 \amp R \\ \hline \zerovec \amp B \end{array}\right] \text{,} \end{equation*}

where \(B\) is an \((n-1) \times (n-1)\) matrix of the same form, \(R\) is a \(1 \times (n-1)\) row vector, and \(\zerovec\) represents an \((n-1) \times 1\) zero vector. Using the pattern of block multiplication from Subsection 28.4.1, one can calculate

\begin{equation*} A^k = \left[\begin{array}{@{}c|c@{}} 0 \amp RB^{k-1} \\ \hline \zerovec \amp B^k \end{array}\right] \end{equation*}

for any positive exponent \(k\text{,}\) and from there the result follows readily using the appropriate induction hypothesis.
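
For example, in the \(3 \times 3\) case, one can check directly that

\begin{equation*} A = \left[\begin{array}{ccc} 0 \amp 1 \amp 2 \\ 0 \amp 0 \amp 3 \\ 0 \amp 0 \amp 0 \end{array}\right] \qquad \Rightarrow \qquad A^2 = \left[\begin{array}{ccc} 0 \amp 0 \amp 3 \\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \end{array}\right], \qquad A^3 = \zerovec \text{,} \end{equation*}

with each successive power pushing the nonzero entries one diagonal closer to the upper right corner until none remain.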

Despite the lemma above, keep Example 31.4.4 in mind here — a matrix does not have to be triangular to be nilpotent.

The next proposition should not be a surprise, since we know that similar matrices are effectively the same matrix, just “expressed” in different coordinate systems.

Suppose we have a pair of similar matrices. By Proposition 26.5.4, these two matrices also have similar powers. Since the only matrix that is similar to the zero matrix is the zero matrix itself, if some power of one of the two matrices is the zero matrix, then the corresponding power of the other is the zero matrix as well.
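
Explicitly, if \(B = \inv{P} A P\) and \(A^k = \zerovec\text{,}\) then

\begin{equation*} B^k = {(\inv{P} A P)}^k = \inv{P} A^k P = \inv{P} \zerovec P = \zerovec \text{.} \end{equation*}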

Now we will record some characterizations of nilpotency. As each of these statements is equivalent to nilpotency, we can use these statements both as properties of nilpotent matrices and as ways to recognize that a matrix is nilpotent.

We will show that each statement of the theorem implies the next.

Whenever Statement 1 is true, then so is Statement 2.

Suppose \(A\) is a square matrix, \(\lambda = \lambda_0\) is an eigenvalue of \(A\text{,}\) and \(\uvec{x}\) is a (nonzero) eigenvector of \(A\) corresponding to \(\lambda_0\text{.}\) For every positive exponent \(k\text{,}\) we have

\begin{equation*} A^k \uvec{x} = \lambda_0^k \uvec{x} \text{.} \end{equation*}
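
This identity follows by repeatedly applying \(A\text{;}\) for instance,

\begin{equation*} A^2 \uvec{x} = A (A \uvec{x}) = A (\lambda_0 \uvec{x}) = \lambda_0 (A \uvec{x}) = \lambda_0^2 \uvec{x} \text{,} \end{equation*}

and induction extends the pattern to every positive exponent.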

But if \(A\) is nilpotent with \(A^k = \zerovec\) for some \(k\text{,}\) then the identity \(A^k \uvec{x} = \lambda_0^k \uvec{x}\) becomes

\begin{equation*} \zerovec = \lambda_0^k \uvec{x} \text{.} \end{equation*}

Since \(\uvec{x}\) is assumed nonzero, we must have \(\lambda_0^k = 0\) (Rule 1.e of Proposition 16.6.2), which forces \(\lambda_0 = 0\) as well.

We have established that when \(A\) is nilpotent, it cannot have any eigenvalues, real or complex, besides \(\lambda = 0\text{.}\) Since \(c_A(\lambda)\) is a degree-\(n\) polynomial whose roots are precisely the eigenvalues of \(A\text{,}\) the assertion that \(c_A(\lambda) = \lambda^n\) immediately follows.

Whenever Statement 2 is true, then so is Statement 3.

This is Corollary 30.5.3 with \(\lambda = 0\text{.}\)

Whenever Statement 3 is true, then so is Statement 4.

Suppose there exists an invertible matrix \(P\) so that \(\inv{P} A P = N\text{,}\) where \(N\) is a scalar-triangular matrix with \(\lambda = 0\) down the main diagonal. Lemma 31.5.1 tells us that such a matrix \(N\) will satisfy \(N^n = \zerovec\text{.}\) Combined with the identity

\begin{equation*} {(\inv{P} A P)}^n = \inv{P} A^n P \text{,} \end{equation*}

we may conclude that \(A^n = \zerovec\) as well, since the only matrix that is similar to the zero matrix is the zero matrix itself.
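
Explicitly,

\begin{equation*} \zerovec = N^n = {(\inv{P} A P)}^n = \inv{P} A^n P \text{,} \end{equation*}

so that \(A^n = P \zerovec \inv{P} = \zerovec\text{.}\)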

Whenever Statement 4 is true, then so is Statement 1.

If we assume \(A\) satisfies \(A^n = \zerovec\text{,}\) then by definition we are assuming that \(A\) is, in fact, a nilpotent matrix.

Remark 31.5.4.
  1. As we have seen in Example 31.4.4, a matrix does not have to be scalar-triangular with zeros down its diagonal in order to be nilpotent. But Statement 3 of Theorem 31.5.3 does say that a nilpotent matrix must at least be similar to such a scalar-triangular matrix.
  2. Statement 4 of Theorem 31.5.3 does not say that every nilpotent \(n \times n\) matrix has degree of nilpotency equal to \(n\text{.}\) Instead, it tells us that the degree of nilpotency will be no greater than \(n\text{.}\)

Finally, a matrix whose powers eventually become zero does not sound like it could be invertible, and indeed it cannot be.
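
One quick way to see this: if \(A^k = \zerovec\) for some positive exponent \(k\text{,}\) then taking determinants yields

\begin{equation*} {(\det A)}^k = \det (A^k) = \det \zerovec = 0 \text{,} \end{equation*}

so that \(\det A = 0\) and \(A\) cannot be invertible.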

Subsection 31.5.2 The Cayley-Hamilton Theorem

First, we establish the similarity pattern of matrix polynomials explored in Task 31.1.b and discussed in Subsection 31.3.1.

The case that \(p(x)\) consists of just a power of \(x\text{,}\) so that \(p(x) = x^k\text{,}\) is proved exactly as in the proof of Proposition 26.5.4. And then the case of a general polynomial

\begin{equation*} p(x) = a_0 + a_1 x + a_2 x^2 + \dotsb + a_m x^m \end{equation*}

follows from the single-power case, and is left as an exercise for you, the reader.
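
As a hint, each single term in \(p(\inv{P} A P)\) satisfies

\begin{equation*} a_k {(\inv{P} A P)}^k = a_k \inv{P} A^k P = \inv{P} (a_k A^k) P \text{,} \end{equation*}

and a sum of matrices each conjugated by \(P\) is equal to the conjugate by \(P\) of the sum.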

Now we'll record and prove the main result of this chapter.

As in Discovery 31.4, first we will prove the statement in the case that \(A\) is a matrix in triangular block form. The eigenvalues of such a matrix are precisely the diagonal entries, and each algebraic multiplicity is equal to the size of the corresponding scalar-triangular block along the block diagonal. So the characteristic polynomial of \(A\) will be of the form

\begin{gather} c_A(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \dotsm (\lambda - \lambda_\ell)^{m_\ell}\text{,}\label{equation-cayley-hamilton-fully-factored-char-poly}\tag{\(\star\)} \end{gather}

where we assume that the linear factors are ordered by eigenvalue in the same way that the scalar-triangular blocks appear down the block diagonal of \(A\text{.}\) Evaluating \(c_A(\lambda)\) at \(A\) gives

\begin{equation*} c_A(A) = (A - \lambda_1 I)^{m_1} (A - \lambda_2 I)^{m_2} \dotsm (A - \lambda_\ell I)^{m_\ell} \text{.} \end{equation*}

Now, each linear expression

\begin{equation*} A - \lambda_j I \end{equation*}

will also be in triangular block form, but now the \(\nth[i]\) scalar-triangular block will have the difference \(\lambda_i - \lambda_j\) repeated down its main diagonal. In particular, the \(\nth[j]\) block will have zero repeated down its main diagonal. And since that \(\nth[j]\) block has size equal to the corresponding algebraic multiplicity \(m_j\text{,}\) raising it to the exponent \(m_j\) will result in the zero matrix (Lemma 31.5.1). Applying Statement 1 of Proposition 28.6.1, each factor

\begin{equation*} (A - \lambda_j I)^{m_j} \end{equation*}

will have a zero block, and these will appear in descending positions down the block diagonal from factor to factor as \(j\) increases from \(1\) to \(\ell\text{.}\) Multiplying these factors together is achieved by multiplying together all the corresponding block matrices from the block diagonals, and this will result in zero for each block position since each such product will have a zero factor in it. Therefore, \(c_A(A) = \zerovec\text{.}\)
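
For a small illustration of this pattern of zero blocks, consider the \(2 \times 2\) triangular matrix

\begin{equation*} A = \left[\begin{array}{cc} 2 \amp 1 \\ 0 \amp 3 \end{array}\right] \text{,} \qquad c_A(\lambda) = (\lambda - 2)(\lambda - 3) \text{,} \end{equation*}

for which

\begin{equation*} c_A(A) = (A - 2 I)(A - 3 I) = \left[\begin{array}{cc} 0 \amp 1 \\ 0 \amp 1 \end{array}\right] \left[\begin{array}{cc} -1 \amp 1 \\ 0 \amp 0 \end{array}\right] = \zerovec \text{.} \end{equation*}

The first factor is zero in the first diagonal position, the second factor is zero in the second, and so the product vanishes entirely.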

Now we proceed as in Discovery 31.5. Suppose that \(A\) is a general square matrix, not necessarily in any special form. Even if \(A\) is a real matrix, we will consider it to be a complex matrix (every real number is also a complex number!), and we will work over the field of scalars \(\C\text{.}\) The Fundamental Theorem of Algebra (Complex Version) implies that every polynomial, considered as a complex polynomial, factors completely into linear factors. (See Corollary A.2.18.) Therefore, the characteristic polynomial of \(A\) can be factored completely as in (\(\star\)) above, and so Theorem 30.5.1 can be applied to obtain a matrix \(T\) in triangular block form to which \(A\) is similar. From the first case above, we know

\begin{equation*} c_T(T) = \zerovec \text{.} \end{equation*}

But similar matrices have the same characteristic polynomial (Theorem 26.5.8) and evaluate to similar results in polynomials (Lemma 31.5.6), so \(c_A(A) \) is similar to \(c_T(T) = \zerovec \text{.}\) However, the only matrix that is similar to the zero matrix is the zero matrix itself, so we can conclude that

\begin{equation*} c_A(A) = \zerovec \text{.} \end{equation*}
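
To illustrate the role played by complex scalars, consider the real matrix

\begin{equation*} A = \left[\begin{array}{rr} 0 \amp 1 \\ -1 \amp 0 \end{array}\right] \text{,} \end{equation*}

which has no real eigenvalues, and whose characteristic polynomial \(c_A(\lambda) = \lambda^2 + 1\) factors as \((\lambda - i)(\lambda + i)\) only over \(\C\text{.}\) Nevertheless,

\begin{equation*} c_A(A) = A^2 + I = \left[\begin{array}{rr} -1 \amp 0 \\ 0 \amp -1 \end{array}\right] + I = \zerovec \text{,} \end{equation*}

just as the theorem predicts.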

Subsection 31.5.3 More consequences of triangular block form

Finally, we'll record the patterns we discovered in Discovery 31.6.

Since similar matrices have the same determinant, trace, characteristic polynomial, and eigenvalues, and since every matrix is similar (over \(\C\)) to a matrix in triangular block form, it suffices to consider the case of triangular block form. So suppose \(A\) is a triangular block matrix, and its characteristic polynomial factors as a complex polynomial into

\begin{gather} c_A(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \dotsm (\lambda - \lambda_\ell)^{m_\ell}\text{.}\label{equation-cayley-hamilton-fully-factored-char-poly2}\tag{\(\star\star\)} \end{gather}

The \(\nth[j]\) block of \(A\) is scalar-triangular with determinant equal to \(\lambda_j^{m_j}\text{,}\) the product of its diagonal entries. Applying Statement 5 of Proposition 28.6.1, we immediately get

\begin{equation*} \det A = \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell} \text{.} \end{equation*}

Determining the constant term of \(c_A(\lambda)\) is as easy as evaluating the polynomial at \(\lambda = 0\text{,}\) which yields

\begin{align*} c_A(0) \amp = (- \lambda_1)^{m_1} (- \lambda_2)^{m_2} \dotsm (- \lambda_\ell)^{m_\ell} \\ \amp = (-1)^{m_1 + m_2 + \dotsb + m_\ell} \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell} \\ \amp = (-1)^n \det A \text{,} \end{align*}

where

\begin{equation*} m_1 + m_2 + \dotsb + m_\ell = n \end{equation*}

because the degree of a characteristic polynomial is always equal to the size of the matrix.

Similarly, the trace of the \(\nth[j]\) block of \(A\) is equal to \(m_j \lambda_j\text{,}\) the sum of its diagonal entries. The trace of \(A\) is the sum of all of its diagonal entries, hence the sum of these block traces, so that

\begin{equation*} \trace A = m_1 \lambda_1 + m_2 \lambda_2 + \dotsb + m_\ell \lambda_\ell \text{.} \end{equation*}

To investigate the coefficient on \(\lambda^{n-1}\) in (\(\star\star\)), first consider this expression as

\begin{equation*} c_A(\lambda) = \bigl[(\lambda - \lambda_1) \dotsm (\lambda - \lambda_1)\bigr] \dotsm \bigl[(\lambda - \lambda_\ell) \dotsm (\lambda - \lambda_\ell)\bigr] \text{,} \end{equation*}

where each factor is repeated according to algebraic multiplicity. Just as in the expansion that produces the Binomial Theorem, expanding the above expression yields a term involving \(\lambda^{n-1}\) each time we choose \(\lambda\) from all but one of the linear factors, choose \(-\lambda_j\) from the remaining factor, and multiply these choices together. Each such product is equal to \(-\lambda_j \lambda^{n-1}\text{,}\) and we get \(m_j\) such terms for each eigenvalue \(\lambda_j\) because there are \(m_j\) linear factors involving that eigenvalue that can play the role of the “remaining factor.” Collecting these like terms, we find that the coefficient on the term involving \(\lambda^{n-1}\) is

\begin{equation*} - m_1 \lambda_1 - m_2 \lambda_2 - \dotsb - m_\ell \lambda_\ell = - \trace A\text{.} \end{equation*}
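
These patterns can all be checked against a small example: if

\begin{equation*} c_A(\lambda) = (\lambda - 2)^2 (\lambda - 3) = \lambda^3 - 7 \lambda^2 + 16 \lambda - 12 \end{equation*}

for some \(3 \times 3\) matrix \(A\text{,}\) then \(\det A = 2^2 \cdot 3 = 12\text{,}\) the constant term is \(-12 = {(-1)}^3 \det A\text{,}\) and the coefficient on \(\lambda^2\) is \(-7 = -(2 \cdot 2 + 1 \cdot 3) = -\trace A\text{.}\)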