
Section 33.4 Concepts

Subsection 33.4.1 From elementary to block nilpotent form

Just as scalar-triangular form is the single-block case of triangular block form, elementary nilpotent form is the single-block case of a multi-block nilpotent form that we will call triangular-block nilpotent form. Elementary nilpotent form can be achieved when the nilpotent matrix has maximum degree of nilpotency. If the degree of nilpotency is less than maximum, then the matrix must be “built” out of smaller elementary nilpotent blocks that become zero earlier.
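
For instance, with ones down the first sub-diagonal of each elementary block, a \(5 \times 5\) matrix in triangular-block nilpotent form could be built out of elementary nilpotent blocks of sizes \(3\) and \(2\text{:}\)

\begin{equation*} N = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}\text{.} \end{equation*}

For this \(N\) we have \(N^3 = \zerovec\) but \(N^2 \neq \zerovec\text{,}\) so its degree of nilpotency is \(3\text{,}\) less than the maximum possible value of \(5\text{.}\)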

Justifying the theory behind the pattern of similarity for triangular-block nilpotent form is beyond the scope of this book, but we can recognize the pattern by combining the patterns for block-diagonal form and elementary nilpotent form. A matrix is similar to one in block diagonal form when there exists a complete set of independent, invariant subspaces of \(\R^n\) (or \(\C^n\text{,}\) as appropriate). Each of those blocks will be in elementary nilpotent form when the corresponding subspace is cyclic with a spanning set that terminates in the zero vector, i.e. the last vector in each cyclic basis for the independent subspaces must be in the null space of the matrix. Since cyclic spaces are always invariant (Proposition 32.6.1), we can instead say that we need a complete set of independent, cyclic subspaces with each cyclic basis terminating in the null space of the matrix.

We can be even a little more specific. We need a cyclic basis for each of these subspaces, but we also need the collection of subspaces to be independent. By definition, this means that we need the collection of cyclic bases to be independent when taken all together. In particular, the collection of final null space vectors from each cyclic basis must be independent when taken all together.

Pattern 33.4.1. Similarity with a triangular-block nilpotent matrix.

For \(n \times n\) matrix \(A\) to be similar to a triangular-block nilpotent matrix \(N\) via similarity relation \(\inv{P} A P\text{,}\) the columns of the transition matrix \(P\) must form a collection of cyclic bases for a complete set of independent, \(A\)-cyclic subspaces of \(\R^n\) (or \(\C^n\text{,}\) as appropriate).

Furthermore, if the final vector is chosen from each of these \(A\)-cyclic bases, this collection of final vectors must be a linearly independent subset of the null space of \(A\text{.}\)

Just as in Discovery 33.4, the condition on the final null space vectors included in Pattern 33.4.1 actually guarantees that the collection of subspaces will be independent (Proposition 33.6.4), which reduces the conditions to be checked even further.

Subsection 33.4.2 Determining the form indirectly

In Subsection 32.4.5, we listed two important properties of elementary nilpotent form:

  • powers of a matrix in elementary nilpotent form become zero only when the exponent reaches the size of the matrix; and
  • the ranks of powers of a matrix in elementary nilpotent form decrease by one with each increase in the exponent, beginning at rank \(n - 1\) at exponent \(1\) and ending at rank \(0\) at exponent \(n\) (illustrated below).
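
For instance, for a \(4 \times 4\) matrix \(N\) in elementary nilpotent form we would observe

\begin{equation*} \rank N = 3, \quad \rank N^2 = 2, \quad \rank N^3 = 1, \quad \rank N^4 = 0 \text{.} \end{equation*}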

Each block in a triangular-block nilpotent matrix will exhibit the same properties, and each of these properties is preserved by similarity. So for a given nilpotent matrix \(A\text{,}\) we can determine exactly what form of triangular-block nilpotent matrix \(A\) is similar to simply by considering the degree of nilpotency of \(A\) and the ranks of its powers.

In Discovery 33.1, we carried out this task for a specific triangular-block nilpotent matrix, and then used our observations to imagine how we could reverse-engineer the form matrix from this kind of information.

The number of elementary blocks.

Suppose that \(A\) is an \(n \times n\) nilpotent matrix, and is similar to triangular-block nilpotent form matrix \(N\text{.}\) Our first observation is that the nullity of \(A\) tells us the number of blocks in \(N\text{.}\) The identity matrix, with its ones down the main diagonal, has full rank \(n\text{.}\) If \(N\) were actually in elementary nilpotent form, effectively consisting of one large elementary block with a line of ones down the first sub-diagonal, then \(\rank N\) would be one less than full, at \(n - 1\text{.}\) If \(N\) instead contains \(\ell\) blocks, then each of those blocks is deficient from full rank by one, and so \(\rank N\) would be \(n - \ell\text{.}\) The Rank-Nullity Theorem tells us that rank and nullity add up to the size of the matrix, so the nullity of \(N\) must be \(\ell\text{.}\) And since similar matrices \(A\) and \(N\) share the same nullity, the nullity of \(A\) tells us the number of elementary nilpotent blocks in \(N\text{.}\)
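
For instance, suppose a hypothetical \(7 \times 7\) nilpotent matrix \(A\) has \(\rank A = 4\text{.}\) Then the nullity of \(A\) is \(7 - 4 = 3\text{,}\) and so \(N\) must contain three elementary nilpotent blocks. (We will continue with this hypothetical example below.)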

The size of the largest block.

We may compute powers of a block-diagonal matrix by just computing powers of the blocks. And we know that powers of an elementary nilpotent block in \(N\) won't become zero until the exponent reaches the size of the block. Since the smaller blocks become zero at smaller exponents, \(N\) itself will become zero precisely when its largest blocks do. And since similar matrices \(A\) and \(N\) share the same degree of nilpotency (Proposition 31.5.2), we can say that the degree of nilpotency of \(A\) determines the size of the largest block in \(N\text{.}\)
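
Continuing our hypothetical \(7 \times 7\) example, if we were to find that \(A^3 = \zerovec\) but \(A^2 \neq \zerovec\text{,}\) then the degree of nilpotency of \(A\) is \(3\text{,}\) and so the largest block in \(N\) is \(3 \times 3\text{.}\)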

The number of blocks of largest size.

It is certainly possible for \(N\) to have multiple blocks of the same size. Now that we know how to determine the size of the largest block, we can ask: how many blocks of that size does \(N\) have? Suppose the largest size of block in \(N\) is \(k \times k\text{.}\) Then from our discussion above, we will see \(N^k = \zerovec\) but \(N^{k-1} \neq \zerovec\text{.}\) In \(N^{k - 1}\text{,}\) all blocks of size smaller than \(k\) have already been taken to zero by the exponent \(k - 1\text{.}\) So what appears in the blocks of largest size to make \(N^{k - 1} \neq \zerovec\) true? Backing the exponent off by one from the degree of nilpotency of the largest blocks will cause a single one to appear in the lower left entry of each block of largest size. We can use these solitary ones to count the blocks of largest size, but once again we transfer the property by which we count from \(N\) to \(A\) via similarity: if \(k\) is the size of the largest blocks in \(N\text{,}\) then \(\rank A^{k-1}\) tells us the number of blocks of largest size.
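
In our hypothetical \(7 \times 7\) example, where the largest block is \(3 \times 3\text{,}\) a computation of \(\rank A^2 = 1\) would tell us that \(N\) contains exactly one block of size \(3\text{.}\)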

The number and sizes of smaller blocks.

We can continue the reasoning from our analysis of the blocks of largest size. And just as in that analysis, we can think in terms of the blocks of \(N\) but compute in terms of \(A\text{.}\) Again let \(k\) represent the size of the largest blocks in \(N\text{,}\) and let \(m\) represent the number of these blocks. We then have \(\rank A^k = 0\) and \(\rank A^{k - 1} = m\text{.}\) If we back the exponent off one more to \(N^{k - 2}\text{,}\) the line of sub-diagonal ones from \(N\) will start reappearing in each block of size \(k\text{,}\) and we expect the rank (of both \(N^{k-2}\) and \(A^{k-2}\)) to increase by one in each of those blocks. And this continues with every step that we decrease the exponent. That is, we expect

\begin{equation*} \rank A^{k - 2} = 2m, \quad \rank A^{k-3} = 3m, \quad \rank A^{k-4} = 4m, \dotsc \text{.} \end{equation*}

However, this pattern will be broken as soon as the block of next largest size in \(N\) “reappears” with a solitary one in its lower left corner. So if \(\rank A^{k-j}\) is larger than the pattern above predicts, we must have a new block of smaller size reappearing in \(N^{k-j}\text{.}\) Since boosting the exponent back up to \(k - j + 1\) would make that block disappear again, that exponent value must be the size of that next largest block. And since we will get a new solitary block-corner \(1\) in \(N^{k-j}\) in each such block, the amount that \(\rank A^{k-j}\) is larger than the pattern above would have predicted tells the number of blocks of that size.

From here, we can make a new “expected” pattern of rank increases in \(A\) based on the sizes and numbers of blocks in \(N\) that we already know about, and any new jump in rank to a value larger than the pattern predicts tells the existence, size, and number of occurrences of new blocks in \(N\text{.}\)
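
We can now finish our hypothetical \(7 \times 7\) example. With \(k = 3\) and \(m = 1\text{,}\) the pattern predicts \(\rank A^{k-2} = \rank A = 2\text{,}\) but we already know that \(\rank A = 4\text{,}\) which is larger than predicted by \(2\text{.}\) So the pattern breaks at exponent \(k - j = 1\) (that is, at \(j = 2\)), telling us that \(N\) contains \(4 - 2 = 2\) blocks of size \(k - j + 1 = 2\text{.}\) The block sizes \(3, 2, 2\) add up to the full size \(7\) of the matrix, so the form of \(N\) is now completely determined.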

See Example 33.5.1 for an example of using this kind of indirect information gathering.
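
For readers who would like to experiment, here is a brief computational sketch of this indirect procedure using the Python library SymPy. (The code is only illustrative and is not part of the development of this chapter; the helper names are our own.) It relies on the rank pattern above, from which it follows that \(\rank A^{j-1} - \rank A^j\) counts the blocks of size at least \(j\text{.}\)

    import sympy as sp

    def elem_nilpotent(m):
        # m x m elementary nilpotent block: ones down the first sub-diagonal
        return sp.Matrix(m, m, lambda i, j: 1 if i == j + 1 else 0)

    def nilpotent_block_sizes(A):
        # Recover the block sizes in the triangular-block nilpotent form of
        # a nilpotent matrix A from the ranks of its powers alone.
        n = A.rows
        ranks = [n]                # rank of A**0 is n
        power = sp.eye(n)
        while ranks[-1] > 0:       # terminates since A is nilpotent
            power = power * A
            ranks.append(power.rank())
        k = len(ranks) - 1         # degree of nilpotency = size of largest block
        sizes = []
        for j in range(1, k + 1):
            # rank A**(j-1) - rank A**j counts the blocks of size at least j
            at_least_j = ranks[j - 1] - ranks[j]
            at_least_next = ranks[j] - ranks[j + 1] if j < k else 0
            sizes.extend([j] * (at_least_j - at_least_next))
        return sorted(sizes, reverse=True)

    # The hypothetical 7 x 7 example above: blocks of sizes 3, 2, 2.
    N = sp.diag(elem_nilpotent(3), elem_nilpotent(2), elem_nilpotent(2))
    print(nilpotent_block_sizes(N))    # prints [3, 2, 2]

Note that the function never looks at the entries of \(A\) directly, only at the ranks of its powers, in keeping with the indirect spirit of this subsection.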

Subsection 33.4.3 Procedures for any nilpotent matrix

For the sake of completeness, here are two different procedures that will put any nilpotent matrix into triangular-block nilpotent form.

The first procedure takes a “bottom-up” approach, where we obtain cyclic subspaces for each block of a certain size, one block size at a time, from largest blocks to smallest.

The procedure relies on the fact that null spaces of powers of a matrix are nested. That is, if we write \(\null(M)\) to mean the null space of any square matrix \(M\text{,}\) then we have

\begin{equation*} \null(A) \subseteq \null(A^2) \subseteq \null(A^3) \subseteq \dotsb \subseteq \null(A^k) \subseteq \dotsb \text{.} \end{equation*}

This nesting occurs because if \(\uvec{x}\) satisfies \(A^k \uvec{x} = \zerovec\text{,}\) then

\begin{equation*} A^{k+1} \uvec{x} = A (A^k \uvec{x}) = A \zerovec = \zerovec \end{equation*}

as well.

As all of these null spaces are subspaces of a finite-dimensional vector space, there has to be a point where the sequence of null spaces becomes constant. That is, there exists a \(k\) so that in fact \(\null (A^m) = \null (A^k)\) for all \(m \ge k\text{.}\) In the case that \(A\) is nilpotent, this occurs at the degree of nilpotency, since if \(A^k = \zerovec\text{,}\) then

\begin{equation*} \null (A^k) = \null (\zerovec) = \R^n \end{equation*}

(or \(\C^n\text{,}\) if we are working with complex matrices).

Now, suppose \(A\) is similar to triangular-block nilpotent form matrix \(N\text{.}\) Each nilpotent block \(N_j\) in \(N\) of size \(m_j \times m_j\) corresponds to an \(A\)-cyclic subspace of dimension \(m_j\text{.}\) Such a subspace needs to be generated by some vector \(\uvec{w}_j\) so that \(A^{m_j} \uvec{w}_j = \zerovec\) but \(A^{m_j-1} \uvec{w}_j \neq \zerovec\text{.}\) (See Theorem 32.6.3.) That is, \(\uvec{w}_j\) needs to be in the null space of \(A^{m_j}\) but not in the null space of \(A^{m_j-1}\text{.}\)
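
In computational terms, and continuing the SymPy sketch from Subsection 33.4.2 (again only as an illustration, with our own helper name), the cyclic basis generated by such a vector \(\uvec{w}_j\) supplies \(m_j\) columns of the transition matrix:

    def cyclic_basis_columns(A, w, m):
        # Columns w, A w, ..., A**(m-1) w of the cyclic basis generated by w,
        # valid when A**m w = 0 but A**(m-1) w != 0; note that the final
        # column lies in the null space of A, as Pattern 33.4.1 requires.
        cols = [w]
        for _ in range(m - 1):
            cols.append(A * cols[-1])
        return sp.Matrix.hstack(*cols)

For instance, for the \(7 \times 7\) matrix \(N\) of the earlier sketch, cyclic_basis_columns(N, sp.eye(7)[:, 0], 3) produces the first three standard basis vectors as columns.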

The second procedure takes a “top-down” approach, where first we assemble a collection of (likely not independent) cyclic subspaces whose cyclic bases, when taken all together, at least span \(\R^n\) (or \(\C^n\text{,}\) as appropriate). Then we whittle this collection down until we have an independent collection that still spans the full space (\(\R^n\) or \(\C^n\)).

Remark 33.4.4.

Unless \(A\) is the zero matrix, there is no way the collection of \(A\)-cyclic spaces generated by the standard basis vectors obtained in the first step of Procedure 33.4.3 can be independent, because the dimensions of these spaces will add up to more than \(n\text{.}\) The point of this procedure is to take this collection as a starting point, and reduce it down to an independent collection by discarding dependent spaces, just as we might reduce a dependent spanning set down to a basis for a space by discarding dependent vectors. The main criterion for deciding which spaces to discard is dependence among the final null space vectors of the \(A\)-cyclic bases, along with a preference for larger \(A\)-cyclic spaces over smaller ones.