
Section 29.4 Concepts

Subsection 29.4.1 Scalar-triangular form

A matrix in scalar-triangular form is reasonably simple: its rank, determinant, characteristic polynomial, and eigenvalues are immediately evident, and it is halfway to being row reduced.

The name of the form is significant beyond just describing the shape of the entries. A scalar-triangular matrix can be decomposed into a scalar matrix and a “purely” triangular matrix, both additively and multiplicatively.

First, an additive example:

\begin{equation*} \left[\begin{array}{ccccr} 3 \amp 4 \amp 1 \amp 4 \amp 1 \\ 0 \amp 3 \amp 5 \amp 11 \amp 1 \\ 0 \amp 0 \amp 3 \amp 6 \amp -10 \\ 0 \amp 0 \amp 0 \amp 3 \amp 7 \\ 0 \amp 0 \amp 0 \amp 0 \amp 3 \end{array}\right] = 3I + \left[\begin{array}{ccccr} 0 \amp 4 \amp 1 \amp 4 \amp 1 \\ 0 \amp 0 \amp 5 \amp 11 \amp 1 \\ 0 \amp 0 \amp 0 \amp 6 \amp -10 \\ 0 \amp 0 \amp 0 \amp 0 \amp 7 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \end{array}\right]\text{.} \end{equation*}

The second matrix in the decomposition on the right is called nilpotent. We will need this kind of decomposition later in our study of matrix forms.
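
For instance, writing \(N\) for the strictly triangular matrix on the right of this decomposition, each further multiplication by \(N\) pushes its nonzero entries at least one more diagonal above the main one, so that

\begin{equation*} N^5 = 0 \text{,} \end{equation*}

which is exactly the sense in which \(N\) is nilpotent: some positive power of it is the zero matrix.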

Now, a multiplicative example:

\begin{equation*} \left[\begin{array}{ccccr} 3 \amp 4 \amp 1 \amp 4 \amp 1 \\ 0 \amp 3 \amp 5 \amp 11 \amp 1 \\ 0 \amp 0 \amp 3 \amp 6 \amp -10 \\ 0 \amp 0 \amp 0 \amp 3 \amp 7 \\ 0 \amp 0 \amp 0 \amp 0 \amp 3 \end{array}\right] = (3I) \cdot \begin{bmatrix} 1 \amp 4/3 \amp 1/3 \amp 4/3 \amp 1/3 \\ 0 \amp 1 \amp 5/3 \amp 11/3 \amp 1/3 \\ 0 \amp 0 \amp 1 \amp 2 \amp -10/3 \\ 0 \amp 0 \amp 0 \amp 1 \amp 7/3 \\ 0 \amp 0 \amp 0 \amp 0 \amp 1 \end{bmatrix}\text{.} \end{equation*}

The second matrix in the decomposition on the right is called unipotent. Note that this multiplicative decomposition is not possible when the scalar-triangular matrix has scalar \(0\) down the main diagonal — this observation just says that a nilpotent matrix does not have a unipotent part.
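
In general, when the repeated diagonal entry \(\lambda\) is nonzero, the additive and multiplicative decompositions are related by factoring \(\lambda\) out of every entry:

\begin{equation*} \lambda I + N = (\lambda I) \left( I + \tfrac{1}{\lambda} N \right) \text{,} \end{equation*}

where \(N\) is the strictly triangular part, and the second factor (with \(1\) repeated down its diagonal) is the unipotent part.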

In this chapter, we have been (and will continue) concentrating on scalar-triangular form where the “triangular part” is upper triangular. There is no particular reason to prefer upper triangular over lower triangular — it is basically a technical preference to make the analysis cleaner. However, in a future chapter we will have reason to switch to lower triangular forms, again to make the subsequent analysis work out more naturally. And Discovery 29.5 demonstrates that the choice of upper or lower triangular form is ultimately irrelevant, as it is easy to switch between the two using a particular transition matrix.

Subsection 29.4.2 Generalized eigenvectors

Suppose \(\lambda\) is an eigenvalue of an \(n \times n\) matrix \(A\text{,}\) and \(\uvec{x}\) is a corresponding eigenvector in \(\R^n\text{.}\) By definition, this means that

\begin{equation*} A \uvec{x} = \lambda \uvec{x}\text{.} \end{equation*}

As in our analysis of eigenvalues and eigenvectors in Chapter 24, it is more convenient to rearrange this vector equation into the homogeneous condition

\begin{equation*} (\lambda I - A) \uvec{x} = \zerovec\text{,} \end{equation*}

so that once we have determined the eigenvalues from the characteristic equation, we can determine eigenvectors by row reducing. Furthermore, this homogeneous point of view immediately tells us that the collection of eigenvectors for a particular eigenvalue (along with the zero vector) forms a subspace of \(\R^n\text{,}\) since the eigenspace of \(A\) corresponding to \(\lambda\) is the same as the null space of the matrix \(\lambda I - A\text{.}\)

We saw in Discovery 29.2 (and will see again in our re-analysis of that discovery activity in Subsection 29.4.3 below) that the scalar-triangular similarity pattern leads to consideration of the null spaces of powers of the matrix \(\lambda I - A\text{.}\) We have named the null space of \((\lambda I - A)^k\) the generalized eigensubspace of degree \(k\) for \(A\text{,}\) where the usual eigenspace of \(A\) is the same as the generalized eigensubspace of degree \(1\text{.}\)

These subspaces form a nested sequence

\begin{equation*} E_\lambda^1(A) \subseteq E_\lambda^2(A) \subseteq E_\lambda^3(A) \subseteq \dotsb \text{,} \end{equation*}

since a vector in the null space of \((\lambda I - A)^k\) must also lie in the null space of \((\lambda I - A)^{k+1}\text{.}\) These eigensubspaces cannot keep growing in dimension indefinitely, though, since they are all contained in \(\R^n\text{,}\) so at some point we will have

\begin{equation*} E_\lambda^K(A) = E_\lambda^{K+1}(A) = E_\lambda^{K+2}(A) = \dotsb \text{.} \end{equation*}

This largest eigensubspace is called the generalized eigenspace for \(A\) corresponding to \(\lambda\text{.}\)
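
To verify the nesting claimed above: if \(\uvec{x}\) lies in \(E_\lambda^k(A)\text{,}\) so that \((\lambda I - A)^k \uvec{x} = \zerovec\text{,}\) then also

\begin{equation*} (\lambda I - A)^{k+1} \uvec{x} = (\lambda I - A) \bigl( (\lambda I - A)^k \uvec{x} \bigr) = (\lambda I - A) \zerovec = \zerovec \text{,} \end{equation*}

so that \(\uvec{x}\) lies in \(E_\lambda^{k+1}(A)\) as well.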

We have seen in Chapter 25 that the algebraic multiplicity of an eigenvalue acts as an upper bound for the geometric multiplicity (Theorem 25.6.8). We will see that the dimension of the generalized eigenspace of an eigenvalue realizes this upper bound (Theorem 29.6.1 in Section 29.6), so that the generalized eigensubspaces somehow measure by how much an eigenvalue is “defective” from the ideal of having equal algebraic and geometric multiplicities.

Subsection 29.4.3 The similarity pattern of scalar-triangular form

Rather than work in the abstract, let's go back over Discovery 29.2, in which we were considering a matrix \(A\) in the same similarity class as a scalar-triangular matrix

\begin{equation*} B = \begin{bmatrix} 3 \amp \ast \amp \ast \amp \ast \amp \ast \\ 0 \amp 3 \amp \ast \amp \ast \amp \ast \\ 0 \amp 0 \amp 3 \amp \ast \amp \ast \\ 0 \amp 0 \amp 0 \amp 3 \amp \ast \\ 0 \amp 0 \amp 0 \amp 0 \amp 3 \end{bmatrix}\text{,} \end{equation*}

where the \(\ast\) entries are irrelevant to the form. Our first observation is that, like \(B\text{,}\) matrix \(A\) must also have the single eigenvalue \(\lambda = 3\text{,}\) of algebraic multiplicity \(5\text{.}\)
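
Indeed, similar matrices always share the same characteristic polynomial, and since \(B\) is triangular with \(3\) repeated down its main diagonal, that common characteristic polynomial is

\begin{equation*} \det (\lambda I - B) = (\lambda - 3)^5 \text{.} \end{equation*}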

As usual, write \(\uvec{p}_j\) for the columns of the required transition matrix \(P\text{.}\) In this case, \(P\) must be \(5 \times 5\text{.}\) Using the general similarity pattern described in Subsection 26.3.2, the first column of \(B\) tells us that

\begin{equation*} A \uvec{p}_1 = 3 \uvec{p}_1 \text{,} \end{equation*}

which is the eigenvalue-eigenvector pattern. This should not be surprising, since the first column of \(B\) has the same form as the first column of a diagonal matrix, and we also encountered the eigenvalue-eigenvector pattern in our study of diagonal form.

After row reducing \(3 I - A\) as part of Task 29.2.a, we found that the dimension of the eigenspace \(E_3(A)\) was \(2\text{,}\) so in fact two linearly independent eigenvectors were available. Could we take both \(\uvec{p}_1\) and \(\uvec{p}_2\) to be eigenvectors? The second column of \(B\) says that we need

\begin{equation*} A\uvec{p}_2 = \ast \uvec{p}_1 + 3 \uvec{p}_2 \text{,} \end{equation*}

but since we don't care what that \(\ast\) value ends up as, taking it to be zero so that the condition becomes the

\begin{equation*} A \uvec{p}_2 = 3 \uvec{p}_2 \end{equation*}

eigenvalue-eigenvector pattern is fine.

Now, it is definitely not possible in this example to take \(\uvec{p}_3\) to be an eigenvector, because we need \(P\) to be invertible, hence we need its columns to be linearly independent (Theorem 21.5.5). And with dimension \(2\text{,}\) the eigenspace \(E_3(A)\) can only provide two linearly independent eigenvectors.

So we go back to the form matrix \(B\text{.}\) The third column of \(B\) gives us the condition

\begin{equation*} A \uvec{p}_3 = \ast \uvec{p}_1 + \ast \uvec{p}_2 + 3 \uvec{p}_3 \text{.} \end{equation*}

As in Task 29.2.d.ii, in analogy with how we solve for eigenvectors we can rearrange this condition on \(\uvec{p}_3\) to

\begin{equation*} (3 I - A) \uvec{p}_3 = - (\ast \uvec{p}_1) - (\ast \uvec{p}_2) \text{.} \end{equation*}

We would much rather have a homogeneous system to solve instead of this nonhomogeneous one. But remember where we obtained \(\uvec{p}_1\) and \(\uvec{p}_2\) — they form a basis for the eigenspace \(E_3(A)\text{!}\) And the rearranged condition above says that \((3 I - A) \uvec{p}_3\) must be in the span of those two eigenvectors. In other words, while \(\uvec{p}_3\) cannot be taken to be an eigenvector, \((3 I - A) \uvec{p}_3\) must be an eigenvector. That is, we must have

\begin{equation*} (3 I - A) \bigl( (3 I - A) \uvec{p}_3 \bigr) = \zerovec \text{,} \end{equation*}

a homogeneous condition we can solve by row reducing \((3 I - A)^2\text{.}\) In the new language of generalized eigenvectors, we need \(\uvec{p}_3\) to be in \(E_3^2(A)\text{,}\) the generalized eigensubspace of degree \(2\) for \(A\) corresponding to \(\lambda = 3\text{.}\)

Some care must be taken in choosing \(\uvec{p}_3\text{,}\) though, as we need it to be linearly independent of \(\uvec{p}_1\) and \(\uvec{p}_2\text{.}\) As the eigenspace \(E_3(A)\) is nested inside the generalized eigensubspace \(E_3^2(A)\) (see Subsection 29.4.2), when solving \((3I - A)^2 \uvec{x} = \zerovec\) some of our null space basis vectors may end up being ordinary eigenvectors rather than new generalized eigenvectors. So we should construct a basis for this new null space that begins with our two already chosen eigenvectors, and then extend that \(E_3(A)\) basis to a full basis for \(E_3^2(A)\text{.}\)

Now onto \(\uvec{p}_4\text{.}\) Just as we also took \(\uvec{p}_2\) to be an eigenvector, can we take \(\uvec{p}_4\) to be a generalized eigenvector of the second degree (i.e. from \(E_3^2(A)\))? When we solved for \(\uvec{p}_3\) in Task 29.2.d.iv, we found that the dimension of \(E_3^2(A)\) was \(4\text{,}\) so there was room for one more linearly independent vector beyond the three we already had (\(\uvec{p}_1,\uvec{p}_2,\uvec{p}_3\)). The condition from the fourth column of \(B\) is

\begin{equation*} A \uvec{p}_4 = \ast \uvec{p}_1 + \ast \uvec{p}_2 + \ast \uvec{p}_3 + 3 \uvec{p}_4 \text{.} \end{equation*}

Choosing \(\uvec{p}_4\) from \(E_3^2(A)\) just forces the \(\ast\) coefficient on the \(\uvec{p}_3\) term to be zero, allowing us to rearrange to

\begin{equation*} (3I - A) \uvec{p}_4 = - (\ast \uvec{p}_1) - (\ast \uvec{p}_2) \text{,} \end{equation*}

the same sort of condition that led us to take \(\uvec{p}_3\) from \(E_3^2(A)\text{.}\)

Finally, we turn to \(\uvec{p}_5\text{.}\) In this example, since the dimension of \(E_3^2(A)\) was only \(4\text{,}\) we will need to look elsewhere for our fifth vector. It is reasonable at this point to guess that we will need to take \(\uvec{p}_5\) from the null space of \((3 I - A)^3\text{,}\) i.e. from the next generalized eigensubspace \(E_3^3(A)\text{.}\) And that is exactly what we will need to do.

The fifth column of \(B\) gives us the condition

\begin{equation*} A \uvec{p}_5 = \ast \uvec{p}_1 + \ast \uvec{p}_2 + \ast \uvec{p}_3 + \ast \uvec{p}_4 + 3 \uvec{p}_5 \text{.} \end{equation*}

If we rearrange to

\begin{equation*} (3 I - A) \uvec{p}_5 = - (\ast \uvec{p}_1) - (\ast \uvec{p}_2) - (\ast \uvec{p}_3) - (\ast \uvec{p}_4) \text{,} \end{equation*}

we see that \((3 I - A) \uvec{p}_5\) needs to be in the span of \(\{\uvec{p}_1,\uvec{p}_2,\uvec{p}_3,\uvec{p}_4\}\text{,}\) which is precisely \(E_3^2(A)\) in this example. That is, \((3 I - A) \uvec{p}_5\) must be in the null space of \((3 I - A)^2\text{,}\) which is the same as saying that \(\uvec{p}_5\) must be in the null space of \((3 I - A)^3\text{.}\)
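
Stated as a single homogeneous condition on \(\uvec{p}_5\text{,}\) this is

\begin{equation*} (3 I - A)^3 \uvec{p}_5 = (3 I - A)^2 \bigl( (3 I - A) \uvec{p}_5 \bigr) = \zerovec \text{.} \end{equation*}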

Just as we needed to be careful about how we chose \(\uvec{p}_3\) from \(E_3^2(A)\) to avoid dependence with \(E_3^1(A)\text{,}\) we also need to be careful about how we choose \(\uvec{p}_5\) from \(E_3^3(A)\) to avoid dependence with \(E_3^2(A)\text{.}\) So when we solve the homogeneous system \((3 I - A)^3 \uvec{x} = \zerovec\text{,}\) we should make sure to take our existing basis for the null space of \((3 I - A)^2\) as a starting point, and extend from there.

Subsection 29.4.4 Scalar-triangularization procedure

We'll now use the pattern of the example from Discovery 29.2 analyzed in the previous subsection to create a scalar-triangularization procedure (Procedure 29.4.1): for an \(n \times n\) matrix \(A\) whose only eigenvalue is \(\lambda_0\text{,}\) compute a basis for the eigenspace \(E_{\lambda_0}^1(A)\text{,}\) extend it to a basis for \(E_{\lambda_0}^2(A)\text{,}\) and continue extending through the successive generalized eigensubspaces until \(n\) linearly independent vectors have been collected; taking these vectors, in that order, as the columns of a transition matrix \(P\) puts \(\inv{P} A P\) into scalar-triangular form.
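
As a computational aside, here is a minimal SymPy sketch of the procedure just described; the function name scalar_triangularize and its interface are illustrative assumptions, and the sketch presumes that \(A\) has a single eigenvalue lam0 whose generalized eigensubspaces eventually fill out all of \(\R^n\text{,}\) as in the examples of this chapter.

    from sympy import Matrix, eye

    def scalar_triangularize(A, lam0):
        # Sketch only: assumes lam0 is the lone eigenvalue of the square matrix A.
        n = A.rows
        M = lam0 * eye(n) - A              # the matrix whose powers we row reduce
        cols = []                          # chosen columns of P, in order of degree
        for k in range(1, n + 1):          # degree-k generalized eigenvectors
            if len(cols) == n:
                break
            for v in (M**k).nullspace():   # basis of the degree-k eigensubspace
                candidate = cols + [v]
                # keep v only if it is independent of the columns chosen so far
                if Matrix.hstack(*candidate).rank() == len(candidate):
                    cols = candidate
        P = Matrix.hstack(*cols)
        return P, P.inv() * A * P          # the second matrix is scalar-triangular

    # For example, with a matrix A similar to the form matrix B above:
    #   P, B_form = scalar_triangularize(A, 3)

Each pass through the outer loop extends the basis gathered so far by new linearly independent vectors from the next generalized eigensubspace, exactly as in the analysis of \(\uvec{p}_3\) and \(\uvec{p}_5\) above.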

Remark 29.4.2.
  1. Procedure 29.4.1 is actually a little more prescriptive than it needs to be. To ensure a scalar-triangular form, all that is really required is that the first column of \(P\) come from \(E_{\lambda_0}^1(A)\text{,}\) the second column come from \(E_{\lambda_0}^2(A)\text{,}\) and so on. (And still requiring linear independence of the final set of columns.) But since each generalized eigensubspace is contained in the next, taking two vectors from one generalized eigensubspace could be viewed as taking one from the current eigensubspace and one from the next. So in practice we might as well take as many new linearly independent generalized eigenvectors as possible from each new null space computation.
  2. Once again, it is not necessary to compute \(\inv{P}\) to determine the scalar-triangular form matrix \(\inv{P} A P\text{.}\) One could use row reduction to compute \(\inv{P} A P\text{,}\) as in Subsection 26.4.2. But also one could go back to the pattern of similarity from Subsection 26.3.2.

    We know that the eigenvalue \(\lambda_0\) will appear repeated down the diagonal of the form matrix. So the \(\nth[j]\) column of \(\inv{P}AP\) will look like

    \begin{equation*} \begin{bmatrix} c_1 \\ \vdots \\ c_{j-1} \\ \lambda_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \text{,} \end{equation*}

    where the \(c_i\) are unknown entries above the diagonal that we'd like to determine. From our general similarity pattern, we must have

    \begin{equation*} A\uvec{p}_j = c_1 \uvec{p}_1 + \dotsb + c_{j-1} \uvec{p}_{j-1} + \lambda_0 \uvec{p}_j \text{.} \end{equation*}

    Rearranging this a little differently than usual, we get

    \begin{equation*} (A - \lambda_0 I) \uvec{p}_j = c_1 \uvec{p}_1 + \dotsb + c_{j-1} \uvec{p}_{j-1} \text{.} \end{equation*}

    Therefore, the entries above the diagonal in the \(\nth[j]\) column of a scalar-triangular form matrix \(\inv{P}AP\) are precisely the coefficients needed to express \((A - \lambda_0 I) \uvec{p}_j\) as a linear combination of the previous columns \(\uvec{p}_1,\dotsc,\uvec{p}_{j-1}\). These coefficients can be determined by row reducing

    \begin{equation*} \left[\begin{array}{cccc|c} \uvec{p}_1 \amp \uvec{p}_2 \amp \cdots \amp \uvec{p}_{j-1} \amp (A - \lambda_0 I) \uvec{p}_j \end{array}\right] \text{.} \end{equation*}

    Though if you're going to do this much row reducing, it might be easier to just reduce

    \begin{equation*} \left[\begin{array}{c|c} P \amp AP \end{array}\right] \quad\rowredarrow\quad \left[\begin{array}{c|c} I \amp \inv{P}AP \end{array}\right]\text{.} \end{equation*}
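
    This row reduction is also easy to carry out with software; the following small SymPy sketch (the function name form_matrix_by_row_reduction is an illustrative assumption) recovers \(\inv{P}AP\) from \(P\) and \(AP\) without computing \(\inv{P}\) directly.

        from sympy import Matrix

        def form_matrix_by_row_reduction(A, P):
            # Row reduce [ P | AP ]; since P is invertible this yields [ I | P^{-1}AP ].
            augmented = P.row_join(A * P)
            rref, _ = augmented.rref()
            n = P.cols
            return rref[:, n:]             # the right-hand block is the form matrix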