Section 31.5 Theory
Subsection 31.5.1 Nilpotent matrices
First, we establish the pattern we've seen so many times in this chapter already.
Lemma 31.5.1. Scalar-triangular with \(\lambda = 0\) is nilpotent.
A scalar-triangular matrix \(A\) with \(\lambda = 0\) down the main diagonal will satisfy \(A^n = \zerovec\text{,}\) where \(n\) is the size of \(A\text{.}\) In particular, such a matrix is always nilpotent.
Proof outline.
We omit a detailed proof, relying on Example 31.4.3 to demonstrate the pattern.
However, a proof by induction is fairly straightforward, as a matrix of this form can be broken up into a block matrix
\begin{equation*}
A = \begin{bmatrix} 0 & R \\ \zerovec & B \end{bmatrix} \text{,}
\end{equation*}
where \(B\) is an \((n-1) \times (n-1)\) matrix of the same form, \(R\) is a \(1 \times (n-1)\) row vector, and \(\zerovec\) represents an \((n-1) \times 1\) zero vector. Using the pattern of block multiplication from Subsection 28.4.1, one can calculate
\begin{equation*}
A^k = \begin{bmatrix} 0 & R B^{k-1} \\ \zerovec & B^k \end{bmatrix}
\end{equation*}
for any positive exponent \(k\text{,}\) and from there the result follows readily using the appropriate induction hypothesis.
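For example, in the \(3 \times 3\) case the pattern of vanishing powers is easy to see directly (a small illustrative computation of our own, not part of the original outline):
\begin{equation*}
A = \begin{bmatrix} 0 & a & b \\ 0 & 0 & c \\ 0 & 0 & 0 \end{bmatrix} \text{,} \qquad
A^2 = \begin{bmatrix} 0 & 0 & ac \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \text{,} \qquad
A^3 = \zerovec \text{.}
\end{equation*}
Each multiplication by \(A\) pushes the nonzero entries one diagonal further toward the upper-right corner, until they are pushed out entirely.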
Despite the lemma above, keep Example 31.4.4 in mind here — a matrix does not have to be triangular to be nilpotent.
The next proposition should not be a surprise, since we know that similar matrices are effectively the same matrix, just “expressed” in different coordinate systems.
Proposition 31.5.2. Similar to nilpotent is nilpotent of same degree.
A matrix that is similar to a nilpotent one must itself also be nilpotent, and the two similar matrices must have the same degree of nilpotency.
Outline of proof.
Suppose we have a pair of similar matrices. By Proposition 26.5.4, corresponding powers of these two matrices are also similar. Since the only matrix that is similar to the zero matrix is the zero matrix itself, if one of the two matrices becomes zero at a certain power, then the other must become zero at exactly that same power.
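To make the pattern explicit (a supplementary calculation in the spirit of Proposition 26.5.4, not part of the original outline), if \(B = \inv{P} A P\text{,}\) then the inner factors telescope in powers:
\begin{equation*}
B^k = (\inv{P} A P)(\inv{P} A P) \dotsm (\inv{P} A P) = \inv{P} A^k P \text{,}
\end{equation*}
so \(B^k = \zerovec\) exactly when \(A^k = \zerovec\text{.}\)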
Now we will record some characterizations of nilpotency. As each of these statements is equivalent to nilpotency, we can use these statements both as properties of nilpotent matrices and as ways to recognize that a matrix is nilpotent.
Theorem 31.5.3. Characterizations of nilpotency.
For a square matrix \(A\text{,}\) the following are equivalent.
-
Nilpotency.
Matrix \(A\) is nilpotent.
-
Eigenvalues and characteristic polynomial.
Matrix \(A\) has only a single eigenvalue \(\lambda = 0\text{,}\) and its characteristic polynomial must be \(c_A(\lambda) = \lambda^n\text{.}\)
-
Triangular block form.
Matrix \(A\) is similar to a scalar-triangular matrix with \(\lambda = 0\) down the main diagonal.
-
Upper bound for degree of nilpotency.
Matrix \(A\) satisfies \(A^n = \zerovec\text{.}\)
Proof.
We will show that each statement of the theorem implies the next, and that the last implies the first, so that all four statements are equivalent.
Whenever Statement 1 is true, then so is Statement 2.
Suppose \(A\) is a square matrix, \(\lambda = \lambda_0\) is an eigenvalue of \(A\text{,}\) and \(\uvec{x}\) is a (nonzero) eigenvector of \(A\) corresponding to \(\lambda_0\text{.}\) For every positive exponent \(k\text{,}\) we have
\begin{equation*}
A^k \uvec{x} = \lambda_0^k \uvec{x} \text{.}
\end{equation*}
But if \(A\) is nilpotent with \(A^k = \zerovec\) for some \(k\text{,}\) then the above becomes
\begin{equation*}
\zerovec = A^k \uvec{x} = \lambda_0^k \uvec{x} \text{.}
\end{equation*}
Since \(\uvec{x}\) is assumed nonzero, we must have \(\lambda_0^k = 0\) (Rule 1.e of Proposition 16.6.2), which forces \(\lambda_0 = 0\) as well.
We have established that when \(A\) is nilpotent, it cannot have any eigenvalues, real or complex, besides \(\lambda = 0\text{.}\) The assertion that \(c_A(\lambda) = \lambda^n\) follows immediately, since \(c_A(\lambda)\) is a polynomial of degree \(n\) whose roots are precisely the eigenvalues of \(A\text{.}\)
Whenever Statement 2 is true, then so is Statement 3.
This is Corollary 30.5.3 with \(\lambda = 0\text{.}\)
Whenever Statement 3 is true, then so is Statement 4.
Suppose there exists an invertible \(P\) so that \(\inv{P} A P = N\text{,}\) where \(N\) is a scalar-triangular matrix with \(\lambda = 0\) down the main diagonal. Lemma 31.5.1 tells us that such a matrix \(N\) will satisfy \(N^n = \zerovec\text{.}\) Combined with the identity
\begin{equation*}
N^n = (\inv{P} A P)^n = \inv{P} A^n P \text{,}
\end{equation*}
we may conclude that \(A^n = \zerovec\) as well, since the only matrix that is similar to the zero matrix is the zero matrix itself.
Whenever Statement 4 is true, then so is Statement 1.
If we assume \(A\) satisfies \(A^n = \zerovec\text{,}\) then by definition we are assuming that \(A\) is, in fact, a nilpotent matrix.
Remark 31.5.4.
- As we have seen in Example 31.4.4, a matrix does not have to be scalar-triangular with zeros down its diagonal in order to be nilpotent. But Statement 3 of Theorem 31.5.3 does say that a matrix has to at least be similar to such a scalar-triangular matrix.
- Statement 4 of Theorem 31.5.3 does not say that every nilpotent \(n \times n\) matrix has degree of nilpotency equal to \(n\text{.}\) Instead, it tells us that the degree of nilpotency will be no greater than \(n\text{,}\) as the example below illustrates.
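For instance (an example of our own, along the lines of those in Section 31.4), the \(3 \times 3\) matrix
\begin{equation*}
N = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\end{equation*}
satisfies \(N^2 = \zerovec\text{,}\) so it is nilpotent of degree \(2\text{,}\) strictly less than its size \(n = 3\text{.}\)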
Finally, a matrix whose powers eventually become zero does not sound like it could be invertible, and indeed it cannot be.
Corollary 31.5.5. Nilpotent is singular.
A nilpotent matrix is always singular.
Proof.
Combine Statement 2 of Theorem 31.5.3 with the equivalence of Statement 1 and Statement 13 in Theorem 24.6.3.
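Alternatively, one can argue directly from determinants (a supplementary argument, not the proof cited above): if \(A^k = \zerovec\) for some positive \(k\text{,}\) then
\begin{equation*}
(\det A)^k = \det (A^k) = \det \zerovec = 0 \text{,}
\end{equation*}
which forces \(\det A = 0\text{,}\) so \(A\) cannot be invertible.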
Subsection 31.5.2 The Cayley-Hamilton Theorem
First, we establish the similarity pattern of matrix polynomials explored in Task 31.1.b and discussed in Subsection 31.3.1.
Lemma 31.5.6. Similar matrices evaluate to similar results.
Suppose \(p(x)\) is a polynomial and \(A\) and \(P\) are square matrices of the same size with \(P\) invertible. Then,
\begin{equation*}
p(\inv{P} A P) = \inv{P} \, p(A) \, P \text{.}
\end{equation*}
Proof outline.
The case that \(p(x)\) consists of just a power of \(x\text{,}\) so that \(p(x) = x^k\text{,}\) is proved exactly as in the proof of Proposition 26.5.4. And then the case of a general polynomial
\begin{equation*}
p(x) = a_0 + a_1 x + a_2 x^2 + \dotsb + a_k x^k
\end{equation*}
follows from the single-power case, and is left as an exercise for you, the reader.
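If you would like a head start on that exercise, the calculation should unfold something like this (a sketch under the notation above):
\begin{align*}
p(\inv{P} A P) &= a_0 I + a_1 (\inv{P} A P) + \dotsb + a_k (\inv{P} A P)^k \\
&= a_0 \inv{P} I P + a_1 \inv{P} A P + \dotsb + a_k \inv{P} A^k P \\
&= \inv{P} \bigl( a_0 I + a_1 A + \dotsb + a_k A^k \bigr) P = \inv{P} \, p(A) \, P \text{.}
\end{align*}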
Now we'll record and prove the main result of this chapter.
Theorem 31.5.7. Cayley-Hamilton Theorem.
For every square matrix \(A\text{,}\) \(c_A(A) = \zerovec\text{.}\)
Proof.
As in Discovery 31.4, first we will prove the statement in the case that \(A\) is a matrix in triangular block form. The eigenvalues of such a matrix are precisely the diagonal entries, and each algebraic multiplicity is equal to the size of the corresponding scalar-triangular block along the block diagonal. So the characteristic polynomial of \(A\) will be of the form
\begin{equation*}
c_A(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \dotsm (\lambda - \lambda_\ell)^{m_\ell} \text{,} \tag{\(\star\)}
\end{equation*}
where we assume that the linear factors appear in the same order according to eigenvalue that the scalar-triangular blocks appear down the block diagonal of \(A\text{.}\) Evaluating \(c_A(\lambda)\) at \(A\) gives
\begin{equation*}
c_A(A) = (A - \lambda_1 I)^{m_1} (A - \lambda_2 I)^{m_2} \dotsm (A - \lambda_\ell I)^{m_\ell} \text{.}
\end{equation*}
Now, each linear expression
\begin{equation*}
A - \lambda_j I
\end{equation*}
will also be in triangular block form, but the blocks will now have the differences of eigenvalues against the specific eigenvalue \(\lambda_j\) down their main diagonals. In particular, the \(\nth[j]\) scalar-triangular block will have zero repeated down the main diagonal. And since that \(\nth[j]\) block has size equal to the corresponding algebraic multiplicity \(m_j\text{,}\) raising it to the exponent \(m_j\) will result in the zero matrix (Lemma 31.5.1). Applying Statement 1 of Proposition 28.6.1, each factor
\begin{equation*}
(A - \lambda_j I)^{m_j}
\end{equation*}
will have a zero block, and these will appear in descending positions down the block diagonal from factor to factor as \(j\) increases from \(1\) to \(\ell\text{.}\) Multiplying these factors together is achieved by multiplying together all the corresponding block matrices from the block diagonals, and this will result in zero for each block position since each such product will have a zero factor in it. Therefore, \(c_A(A) = \zerovec\text{.}\)
Now we proceed as in Discovery 31.5. Suppose that \(A\) is a general square matrix, not necessarily in any special form. Even if \(A\) is a real matrix, we will consider it to be a complex matrix (every real number is also a complex number!), and we will work over the field of scalars \(\C\text{.}\) The Fundamental Theorem of Algebra (Complex Version) implies that every polynomial, considered as a complex polynomial, factors completely into linear factors. (See Corollary A.2.18.) Therefore, the characteristic polynomial of \(A\) can be factored completely as in (\(\star\)) above, and so Theorem 30.5.1 can be applied to obtain a matrix \(T\) in triangular block form to which \(A\) is similar. From the first case above, we know
\begin{equation*}
c_T(T) = \zerovec \text{.}
\end{equation*}
But similar matrices have the same characteristic polynomial (Theorem 26.5.8) and evaluate to similar results in polynomials (Lemma 31.5.6), so \(c_A(A)\) is similar to \(c_T(T) = \zerovec\text{.}\) However, the only matrix that is similar to the zero matrix is the zero matrix itself, so we can conclude that
\begin{equation*}
c_A(A) = \zerovec \text{.}
\end{equation*}
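Before moving on, it may help to verify the theorem directly in the \(2 \times 2\) case (a supplementary check of our own, not part of the formal proof). For
\begin{equation*}
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{,} \qquad c_A(\lambda) = \lambda^2 - (a + d) \lambda + (ad - bc) \text{,}
\end{equation*}
we compute
\begin{align*}
c_A(A) &= A^2 - (a + d) A + (ad - bc) I \\
&= \begin{bmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{bmatrix}
 - \begin{bmatrix} a^2 + ad & ab + bd \\ ac + cd & ad + d^2 \end{bmatrix}
 + \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}
 = \zerovec \text{.}
\end{align*}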
Subsection 31.5.3 More consequences of triangular block form
Finally, we'll record the patterns we discovered in Discovery 31.6.
Theorem 31.5.8. Determinant and trace versus characteristic polynomial and eigenvalues.
Suppose \(A\) is an \(n \times n\) matrix.
- The determinant of \(A\) is equal to the product of its eigenvalues, both real and complex, and with each eigenvalue repeated in the product according to its algebraic multiplicity.
- The constant term in the characteristic polynomial of \(A\) is always \((-1)^n \det A\text{.}\)
- The trace of \(A\) is equal to the sum of its eigenvalues, both real and complex, and with each eigenvalue repeated in the sum according to its algebraic multiplicity.
- The \(\lambda^{n-1}\) term in the characteristic polynomial of \(A\) always has coefficient \(- \trace A\text{.}\)
Proof.
Since similar matrices have the same determinant, trace, characteristic polynomial, and eigenvalues, and since every matrix is similar (over \(\C\)) to a matrix in triangular block form, it suffices to consider the case of triangular block form. So suppose \(A\) is a triangular block matrix, and its characteristic polynomial factors as a complex polynomial into
\begin{equation*}
c_A(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \dotsm (\lambda - \lambda_\ell)^{m_\ell} \text{.} \tag{\(\star\star\)}
\end{equation*}
The \(\nth[j]\) block of \(A\) is scalar-triangular with determinant equal to \(\lambda_j^{m_j}\text{,}\) the product of its diagonal entries. Applying Statement 5 of Proposition 28.6.1, we immediately get
\begin{equation*}
\det A = \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell} \text{.}
\end{equation*}
Determining the constant term of \(c_A(\lambda)\) is as easy as evaluating the polynomial at \(\lambda = 0\text{,}\) which yields
\begin{equation*}
c_A(0) = (-\lambda_1)^{m_1} (-\lambda_2)^{m_2} \dotsm (-\lambda_\ell)^{m_\ell}
       = (-1)^{m_1 + m_2 + \dotsb + m_\ell} \lambda_1^{m_1} \lambda_2^{m_2} \dotsm \lambda_\ell^{m_\ell}
       = (-1)^n \det A \text{,}
\end{equation*}
where
\begin{equation*}
m_1 + m_2 + \dotsb + m_\ell = n
\end{equation*}
because the degree of a characteristic polynomial is always equal to the size of the matrix.
Similarly, the trace of the \(\nth[j]\) block of \(A\) is equal to \(m_j \lambda_j\text{,}\) the sum of its diagonal entries. The trace of \(A\) is then clearly equal to the sum of these sums of diagonal entries, so that
\begin{equation*}
\trace A = m_1 \lambda_1 + m_2 \lambda_2 + \dotsb + m_\ell \lambda_\ell \text{.}
\end{equation*}
To investigate the coefficient on \(\lambda^{n-1}\) in (\(\star\star\)), first consider this expression as
\begin{equation*}
c_A(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_1) \dotsm (\lambda - \lambda_\ell)(\lambda - \lambda_\ell) \text{,}
\end{equation*}
where each factor is repeated according to algebraic multiplicity. Similar to how the Binomial Theorem is investigated, in expanding the above expression we get a term involving \(\lambda^{n-1}\) by choosing \(\lambda\) from all but one of the linear factors and choosing \((-\lambda_j)\) from the remaining factor, and multiplying all these choices together. Each such product will be equal to \(-\lambda_j \lambda^{n-1}\text{,}\) and we get \(m_j\) such terms for each eigenvalue because there are \(m_j\) linear factors involving that eigenvalue to end up as the “remaining factor” in our choices. Collecting these like terms, the coefficient on the term involving \(\lambda^{n-1}\) is
\begin{equation*}
-(m_1 \lambda_1 + m_2 \lambda_2 + \dotsb + m_\ell \lambda_\ell) = - \trace A \text{,}
\end{equation*}
as claimed.
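As a quick numerical illustration (an example of our own, not from the text), take
\begin{equation*}
A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix} \text{,} \qquad
c_A(\lambda) = \lambda^2 - 3 \lambda - 4 = (\lambda - 4)(\lambda + 1) \text{,}
\end{equation*}
with eigenvalues \(4\) and \(-1\text{.}\) Indeed, \(\det A = 2 - 6 = -4 = 4 \cdot (-1)\text{,}\) \(\trace A = 1 + 2 = 3 = 4 + (-1)\text{,}\) the constant term \(-4\) equals \((-1)^2 \det A\text{,}\) and the coefficient \(-3\) on \(\lambda^{n-1} = \lambda\) equals \(-\trace A\text{.}\)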