
Section 28.4 Concepts

Subsection 28.4.1 Operations with block matrices

Any matrix can be subdivided into submatrices or blocks. We usually indicate submatrices of a matrix by marking them off with horizontal and vertical lines. For example, the matrix on the left, subdivided as shown on the right,

\begin{align*} \amp \left[\begin{array}{rrrrr} 1 \amp 0 \amp 1 \amp 5 \amp 5 \\ 4 \amp -1 \amp 1 \amp 5 \amp 5 \\ 2 \amp 2 \amp 0 \amp 5 \amp 5 \\ 3 \amp 3 \amp 3 \amp 1 \amp 0 \\ 3 \amp 3 \amp 3 \amp -4 \amp 9 \end{array}\right] \amp \amp \longrightarrow \amp \left[\begin{array}{@{}crc|rc@{}} 1 \amp 0 \amp 1 \amp 5 \amp 5 \\ 4 \amp -1 \amp 1 \amp 5 \amp 5 \\ 2 \amp 2 \amp 0 \amp 5 \amp 5 \\ \hline 3 \amp 3 \amp 3 \amp 1 \amp 0 \\ 3 \amp 3 \amp 3 \amp -4 \amp 9 \end{array}\right] \end{align*}

has blocks

\begin{equation*} \left[\begin{array}{crc} 1 \amp 0 \amp 1 \\ 4 \amp -1 \amp 1 \\ 2 \amp 2 \amp 0 \end{array}\right] \text{,} \qquad \begin{bmatrix} 5 \amp 5 \\ 5 \amp 5 \\ 5 \amp 5 \end{bmatrix} \text{,} \qquad \begin{bmatrix} 3 \amp 3 \amp 3 \\ 3 \amp 3 \amp 3 \end{bmatrix} \text{,} \qquad \left[\begin{array}{rc} 1 \amp 0 \\ -4 \amp 9 \end{array}\right] \text{.} \end{equation*}

Notationally, we can write a subdivision into blocks as above as

\begin{equation*} \begin{bmatrix} A \amp B \\ C \amp D \end{bmatrix}\text{,} \end{equation*}

where now the letters inside the matrix represent blocks of entries instead of individual entries, and we drop the division lines. (Of course, a matrix can also be subdivided into more or fewer than four blocks.)
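
As a quick illustration (added here, not part of the original text), the following NumPy sketch reassembles the subdivided \(5 \times 5\) matrix above from its four blocks; np.block does the stitching.

    import numpy as np

    # The four blocks of the subdivided 5x5 matrix from above.
    A = np.array([[1, 0, 1],
                  [4, -1, 1],
                  [2, 2, 0]])
    B = np.array([[5, 5],
                  [5, 5],
                  [5, 5]])
    C = np.array([[3, 3, 3],
                  [3, 3, 3]])
    D = np.array([[1, 0],
                  [-4, 9]])

    # np.block stitches the blocks back into the full matrix.
    M = np.block([[A, B],
                  [C, D]])
    print(M.shape)   # (5, 5)
    print(M)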

Adding and scalar multiplying matrices that have been subdivided into blocks works exactly as you'd expect:

\begin{align*} \begin{bmatrix} A_1 \amp B_1 \\ C_1 \amp D_1 \end{bmatrix} + \begin{bmatrix} A_2 \amp B_2 \\ C_2 \amp D_2 \end{bmatrix} \amp = \begin{bmatrix} A_1 + A_2 \amp B_1 + B_2 \\ C_1 + C_2 \amp D_1 + D_2 \end{bmatrix} \text{,} \amp k \begin{bmatrix} A \amp B \\ C \amp D \end{bmatrix} \amp = \begin{bmatrix} k A \amp k B \\ k C \amp k D \end{bmatrix} \text{,} \end{align*}

where addition by blocks requires not only that both matrices have the same size, but also that corresponding blocks have the same size.
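
Here is a small numerical check of these blockwise rules (an illustration added here, not from the original text), using a pair of randomly generated \(5 \times 5\) matrices subdivided the same way.

    import numpy as np

    rng = np.random.default_rng(2)
    M1 = rng.integers(-3, 4, size=(5, 5)).astype(float)
    M2 = rng.integers(-3, 4, size=(5, 5)).astype(float)
    k = 3.0

    # Subdivide both matrices the same way (3 + 2 rows, 3 + 2 columns).
    def blocks(M):
        return M[:3, :3], M[:3, 3:], M[3:, :3], M[3:, 3:]

    A1, B1, C1, D1 = blocks(M1)
    A2, B2, C2, D2 = blocks(M2)

    # Addition and scalar multiplication work block by block.
    print(np.allclose(np.block([[A1 + A2, B1 + B2],
                                [C1 + C2, D1 + D2]]), M1 + M2))   # True
    print(np.allclose(np.block([[k * A1, k * B1],
                                [k * C1, k * D1]]), k * M1))      # True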

Recall how multiplication of \(2 \times 2\) matrices is defined:

\begin{equation*} \begin{bmatrix} a_1 \amp a_2 \\ a_3 \amp a_4 \end{bmatrix} \begin{bmatrix} b_1 \amp b_2 \\ b_3 \amp b_4 \end{bmatrix} = \begin{bmatrix} a_1 b_1 + a_2 b_3 \amp a_1 b_2 + a_2 b_4 \\ a_3 b_1 + a_4 b_3 \amp a_3 b_2 + a_4 b_4 \end{bmatrix}\text{.} \end{equation*}

If \(A\) and \(B\) are \(n \times n\) matrices that are each subdivided into four blocks so that the block in the upper-left corner of each is also square, and of the same size in both, then we can apply the same \(2 \times 2\) multiplication formula to the blocks of \(A\) and \(B\) to compute \(AB\text{.}\) That is, for

\begin{align*} A \amp= \begin{bmatrix} A_1 \amp A_2 \\ A_3 \amp A_4 \end{bmatrix} \text{,} \amp B \amp= \begin{bmatrix} B_1 \amp B_2 \\ B_3 \amp B_4 \end{bmatrix} \text{,} \end{align*}

where

  • \(A_1\) and \(B_1\) are \(m \times m\text{,}\)
  • \(A_2\) and \(B_2\) are \(m \times (n-m)\text{,}\)
  • \(A_3\) and \(B_3\) are \((n-m) \times m\text{,}\) and
  • \(A_4\) and \(B_4\) are \((n-m) \times (n-m)\text{,}\)

then we can compute the product \(AB\) as

\begin{align} \begin{bmatrix} A_1 \amp A_2 \\ A_3 \amp A_4 \end{bmatrix} \begin{bmatrix} B_1 \amp B_2 \\ B_3 \amp B_4 \end{bmatrix} = \begin{bmatrix} A_1 B_1 + A_2 B_3 \amp A_1 B_2 + A_2 B_4 \\ A_3 B_1 + A_4 B_3 \amp A_3 B_2 + A_4 B_4 \end{bmatrix}\text{.}\label{equation-block-diag-concepts-block-product}\tag{\(\star\)} \end{align}

We will carry out an example of using this formula in Subsection 28.5.1.
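
In the meantime, here is a hedged numerical check of formula (\(\star\)) (an illustration added here, not from the original text): subdivide two randomly generated \(n \times n\) matrices compatibly and compare the blockwise product with the ordinary product.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 5, 2   # A and B are n x n, the upper-left blocks are m x m

    A = rng.integers(-3, 4, size=(n, n)).astype(float)
    B = rng.integers(-3, 4, size=(n, n)).astype(float)

    # Subdivide both matrices with the same block structure.
    A1, A2, A3, A4 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B1, B2, B3, B4 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]

    # Right-hand side of formula (star), assembled block by block.
    blockwise = np.block([
        [A1 @ B1 + A2 @ B3, A1 @ B2 + A2 @ B4],
        [A3 @ B1 + A4 @ B3, A3 @ B2 + A4 @ B4],
    ])

    print(np.allclose(blockwise, A @ B))   # True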

Warning 28.4.1.

Make sure you maintain the proper order of multiplication in the block multiplication formula above — order of matrix multiplication matters!

Recall that the product of two diagonal matrices of the same size can be computed by just multiplying corresponding diagonal entries together:

\begin{equation*} \begin{bmatrix} x_1 \\ \amp x_2 \\ \amp \amp \ddots \\ \amp \amp \amp x_n \end{bmatrix} \begin{bmatrix} y_1 \\ \amp y_2 \\ \amp \amp \ddots \\ \amp \amp \amp y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 \\ \amp x_2 y_2 \\ \amp \amp \ddots \\ \amp \amp \amp x_n y_n \end{bmatrix}\text{.} \end{equation*}

This pattern holds because of the zero entries in a diagonal matrix. As you found in Discovery 28.7, the zero submatrices in a block-diagonal matrix cause the same multiplication pattern to hold again, as long as corresponding blocks have the same size:

\begin{equation*} \begin{bmatrix} A_1 \\ \amp A_2 \\ \amp \amp \ddots \\ \amp \amp \amp A_n \end{bmatrix} \begin{bmatrix} B_1 \\ \amp B_2 \\ \amp \amp \ddots \\ \amp \amp \amp B_n \end{bmatrix} = \begin{bmatrix} A_1 B_1 \\ \amp A_2 B_2 \\ \amp \amp \ddots \\ \amp \amp \amp A_n B_n \end{bmatrix}\text{.} \end{equation*}

Similar patterns will hold for powers and inverses of block-diagonal matrices, and we will record these patterns as part of Proposition 28.6.1 in Subsection 28.6.1.
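
The following sketch (added here for illustration; the helper block_diag below is just a minimal stand-in, and scipy.linalg.block_diag could be used instead) checks this block-diagonal multiplication pattern in the two-block case.

    import numpy as np

    def block_diag(*blocks):
        # Minimal block-diagonal assembly from a list of square or
        # rectangular blocks.
        rows = sum(b.shape[0] for b in blocks)
        cols = sum(b.shape[1] for b in blocks)
        out = np.zeros((rows, cols))
        r = c = 0
        for b in blocks:
            out[r:r + b.shape[0], c:c + b.shape[1]] = b
            r += b.shape[0]
            c += b.shape[1]
        return out

    A1 = np.array([[1., -1.], [3., 7.]])
    A2 = np.array([[4., -2.], [-2., 1.]])
    B1 = np.array([[0., 2.], [1., 5.]])
    B2 = np.array([[1., 1.], [0., 3.]])

    # Multiplying block-diagonal matrices multiplies corresponding blocks.
    lhs = block_diag(A1, A2) @ block_diag(B1, B2)
    rhs = block_diag(A1 @ B1, A2 @ B2)
    print(np.allclose(lhs, rhs))   # True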

Subsection 28.4.2 Properties of block-diagonal matrices

One motivation for studying similarity is to determine a “simplest” member of each similarity class, from which it is easy to obtain properties that are the same for every member of that class. So when we encounter a new matrix form, we should consider how the form makes it simpler to determine properties of matrices. And in the latter part of Discovery guide 28.2, you worked through some of those properties for block-diagonal form.

Again comparing to the diagonal case, we found in Discovery guide 28.2 that

  • for a diagonal matrix the determinant is the product of the diagonal entries, whereas for a block-diagonal matrix the determinant is the product of the determinants of the blocks on the diagonal,
  • for a diagonal matrix the characteristic polynomial is the product of the linear factors \(\lambda - d_j\text{,}\) where the \(d_j\) are the diagonal entries, whereas for a block-diagonal matrix the characteristic polynomial is the product of the characteristic polynomials of the blocks on the diagonal, and
  • for a diagonal matrix the eigenvalues are precisely the diagonal entries, whereas for a block-diagonal matrix the eigenvalues are those of the blocks on the diagonal.

It is worth pursuing the characteristic polynomial property of block-diagonal matrices a little further. An identity matrix (or any diagonal matrix, for that matter) can be split up into block-diagonal form any way you like. For example, if \(A\) is block-diagonal with an \(m_1 \times m_1\) block \(A_1\) in the upper left and an \(m_2 \times m_2\) block \(A_2\) in the lower right (so that \(n = m_1 + m_2\)), then we can compute \(\lambda I - A\) as

\begin{equation*} \lambda I_n - A = \lambda \begin{bmatrix} I_{m_1} \\ \amp I_{m_2} \end{bmatrix} - \begin{bmatrix} A_1 \\ \amp A_2 \end{bmatrix} = \begin{bmatrix} \lambda I_{m_1} - A_1 \\ \amp \lambda I_{m_2} - A_2 \end{bmatrix}\text{,} \end{equation*}

where the subscripts on the identity matrices indicate their sizes. To compute the characteristic polynomial, we take a determinant:

\begin{align*} c_A(\lambda) = \det (\lambda I_n - A) \amp = \det \begin{bmatrix} \lambda I_{m_1} - A_1 \\ \amp \lambda I_{m_2} - A_2 \end{bmatrix}\\ \amp = \det (\lambda I_{m_1} - A_1) \det (\lambda I_{m_2} - A_2)\\ \amp = c_{A_1}(\lambda) c_{A_2}(\lambda)\text{,} \end{align*}

using the pattern of determinants of block-diagonal matrices listed above. Since a root of \(c_A(\lambda)\) would have to be a root of one or the other (or both) of the block polynomials \(c_{A_1}(\lambda)\) and \(c_{A_2}(\lambda)\text{,}\) we see that the eigenvalues of \(A\) are precisely the eigenvalues of the blocks.
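
A brief numerical check of these block-diagonal patterns (an illustration added here, not from the original text): the determinant of a block-diagonal matrix is the product of the determinants of its blocks, and its eigenvalues are those of the blocks taken together.

    import numpy as np

    A1 = np.array([[1., -1.], [3., 7.]])
    A2 = np.array([[4., -2.], [-2., 1.]])

    # Assemble the block-diagonal matrix with A1 and A2 on the diagonal.
    Z = np.zeros((2, 2))
    A = np.block([[A1, Z], [Z, A2]])

    # det(A) equals the product of the determinants of the blocks.
    print(np.isclose(np.linalg.det(A),
                     np.linalg.det(A1) * np.linalg.det(A2)))   # True

    # The eigenvalues of A are the eigenvalues of the blocks, taken together.
    print(np.sort(np.linalg.eigvals(A)))
    print(np.sort(np.concatenate([np.linalg.eigvals(A1),
                                  np.linalg.eigvals(A2)])))    # same values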

Subsection 28.4.3 Invariant subspaces

In our analysis of Discovery 28.1, a pattern that emerged was requiring a vector \(\uvec{p}\) from a specific subspace of \(\R^n\) to be transformed by an \(n \times n\) matrix \(A\) back into the same subspace. This led to the concept of an \(A\)-invariant subspace.

As we will see in a future chapter, how a matrix transforms vectors in \(\R^n\) can be analyzed geometrically. In Discovery 28.2, we looked at geometric transformations of \(\R^3\) without tying them to matrices. For example, rotation around a line through the origin has two obvious invariant subspaces. First, vectors parallel to the axis of rotation stay fixed, and so stay within that line. Second, the plane through the origin and normal to the axis of rotation is also invariant, as vectors parallel to this plane will be rotated within the plane.

As is usually the case, a spanning set for a subspace tells all. In the case of invariant subspaces, we can determine whether a subspace is invariant by testing whether each of the spanning vectors is transformed by \(A\) to a vector that remains in the subspace. (See Proposition 28.6.2 in Subsection 28.6.2.)

For us, the most important example of an invariant subspace is an eigenspace of a matrix \(A\text{.}\) Since an eigenvector is transformed to a scalar multiple of itself, the transformed image remains in the eigenspace. This example will play a central role in our study of matrix forms.
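
To make the spanning-vector test concrete, here is a small NumPy sketch (added for illustration; the helper is_invariant is hypothetical, not from the text) that checks \(A\)-invariance by testing whether appending the transformed spanning vectors raises the rank of the spanning set. The example matrix rotates \(\R^3\) by \(90^\circ\) about the \(z\)-axis, matching the rotation example above.

    import numpy as np

    def is_invariant(A, W, tol=1e-10):
        # W has the spanning vectors as its columns; the span is A-invariant
        # exactly when appending the transformed vectors A @ W does not
        # increase the rank of the spanning set.
        r = np.linalg.matrix_rank(W, tol=tol)
        return np.linalg.matrix_rank(np.hstack([W, A @ W]), tol=tol) == r

    # Rotation by 90 degrees about the z-axis in R^3.
    A = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])

    xy_plane = np.array([[1., 0.], [0., 1.], [0., 0.]])   # span{e1, e2}
    z_axis = np.array([[0.], [0.], [1.]])                 # span{e3}
    tilted = np.array([[1., 0.], [0., 0.], [0., 1.]])     # span{e1, e3}

    print(is_invariant(A, xy_plane))  # True
    print(is_invariant(A, z_axis))    # True
    print(is_invariant(A, tilted))    # False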

Subsection 28.4.4 Independent subspaces

Subsubsection 28.4.4.1 Basic concept

When we form a transition matrix \(P\text{,}\) we require it to be invertible, so its columns must be linearly independent. In our analysis of Discovery 28.1, we found that each block in block-diagonal form corresponds to an invariant subspace, and we obtain independent columns for \(P\) from bases for the invariant subspaces involved. However, just taking independent vectors from different subspaces of \(\R^n\) and lumping them together does not guarantee that the whole collection of vectors will remain independent. This led to the concept of independent subspaces. This concept is a direct generalization of the concept of a linearly independent set of vectors, as an independent set of vectors is the same as a collection of independent one-dimensional subspaces. (See Proposition 28.6.9 in Subsection 28.6.3.)

Just any old collection of independent subspaces won't do, however. A transition matrix for \(\R^n\) has \(n\) columns to fill, which leads to the concept of a complete set of independent subspaces.

Once again, the most important example of independent subspaces for us will be the different eigenspaces of a particular matrix (Corollary 25.6.7). With this point of view, since the algebraic multiplicities of the different eigenvalues of a matrix must add up to the degree of the characteristic polynomial (hence to the size of the matrix), Statement 1 of Corollary 25.6.10 could be recast as saying that a matrix is diagonalizable precisely when its eigenspaces form a complete set of independent subspaces of \(\R^n\text{.}\)
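
As a hedged numerical illustration of this last point (added here, not part of the original text), the following sketch estimates the dimension of each eigenspace of a matrix as \(n\) minus the rank of \(\lambda I - A\text{,}\) and checks whether those dimensions add up to \(n\text{,}\) so that the eigenspaces form a complete set of independent subspaces and the matrix is diagonalizable.

    import numpy as np

    A = np.array([[4., 0., 1.],
                  [2., 3., 2.],
                  [1., 0., 4.]])
    n = A.shape[0]

    # Distinct eigenvalues (rounded to merge numerically repeated roots).
    eigenvalues = np.unique(np.round(np.linalg.eigvals(A), 6))

    # Dimension of each eigenspace = n - rank(lambda*I - A).
    dims = [n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol=1e-8)
            for lam in eigenvalues]

    # The eigenspaces are always independent; they form a *complete* set
    # (and A is diagonalizable) exactly when their dimensions add up to n.
    print(sum(dims) == n)   # True for this A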

Subsubsection 28.4.4.2 Independent subspaces in \(\R^3\)

It is useful to have a picture of independence of subspaces in \(\R^3\) to think about. We know that every (proper, nontrivial) subspace of \(\R^3\) is either a line through the origin or a plane through the origin, so we will consider combinations of those below.

Independence of lines.

Consider two nonparallel lines through the origin in \(\R^3\text{.}\) Every basis for one of these lines will consist of a single vector parallel to that line. Since the two lines are nonparallel, so too are the two basis vectors we have chosen, hence the two basis vectors will be independent when lumped together. So a pair of nonparallel lines will form a pair of independent subspaces, but this will not be a complete set of independent subspaces, since the combined dimensions of the two lines add up to only \(2\text{.}\)

The same analysis will hold true for three lines through the origin in \(\R^3\text{,}\) as long as the three lines do not all lie in a common plane (so, in particular, no two of them are parallel), and this time three independent lines will form a complete set, since their dimensions add up to \(3\text{.}\) But it will not work for four lines, since we know that any collection of four vectors in \(\R^3\) must be dependent (Lemma 18.5.7).

Independence of a plane and a line.

If we have a plane and a line in \(\R^3\text{,}\) both through the origin, and with the line not lying parallel to the plane, then the two spaces will be independent. This is because every basis for the line will consist of a single vector pointing up out of the plane, and so will be independent from any pair of vectors that span the plane (Proposition 18.5.6). And this will be a complete set of independent spaces, as the dimensions of a plane and a line add up to \(3\text{.}\)

However, this won't work for a plane and two lines, again because four vectors in \(\R^3\) must be dependent.

Also, if the line lies parallel to the plane then we have dependence, because a basis vector for the line would also lie in the plane, and so could be paired up with another vector to create a basis for the plane (Proposition 20.5.4). This would create dependence between the chosen bases for the plane and for the line, since they share a vector.

Dependence of two planes.

Two planes in \(\R^3\) will be dependent, regardless of whether they are parallel or not, since choosing a basis for each plane would lead to a total of four vectors (two for each plane), and four vectors in \(\R^3\) must be dependent. Geometrically, two nonparallel planes in \(\R^3\) must intersect in a line, and it is this line that creates the dependence: we can build a basis for each plane that contains a vector parallel to the line of intersection, and then our collection of four vectors (again, two basis vectors from each plane) will be dependent because each plane has contributed a vector parallel to the line, and these two vectors must be scalar multiples of each other.

Warning 28.4.2.

A pair of two-dimensional planes in \(\R^4\) will be dependent if they intersect in a line, but there is “room” in \(\R^4\) for a pair of two-dimensional planes to be independent! And a pair of independent planes would comprise a complete set, as their dimensions would add up to \(4\text{.}\)
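
The following sketch (an added illustration; the helper subspaces_independent is hypothetical) tests independence of subspaces by lumping chosen bases together and comparing the rank of the combined matrix with the total number of basis vectors. It confirms both the dependence of two planes in \(\R^3\) and the possibility of two independent planes in \(\R^4\text{.}\)

    import numpy as np

    def subspaces_independent(*bases, tol=1e-10):
        # Each basis is a matrix with basis vectors as columns. The
        # subspaces are independent exactly when lumping all of the basis
        # vectors together still gives a linearly independent set, i.e.
        # the rank of the combined matrix equals the total number of
        # basis vectors.
        combined = np.hstack(bases)
        return np.linalg.matrix_rank(combined, tol=tol) == combined.shape[1]

    # Two planes in R^3 are always dependent.
    P1 = np.array([[1., 0.], [0., 1.], [0., 0.]])   # xy-plane
    P2 = np.array([[1., 0.], [0., 0.], [0., 1.]])   # xz-plane
    print(subspaces_independent(P1, P2))            # False

    # But there is room in R^4 for two independent planes.
    Q1 = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])  # span{e1, e2}
    Q2 = np.array([[0., 0.], [0., 0.], [1., 0.], [0., 1.]])  # span{e3, e4}
    print(subspaces_independent(Q1, Q2))            # True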

Subsection 28.4.5 The similarity pattern of block-diagonal form

In Discovery 28.1, we explored the pattern of similarity for block-diagonal form. That is, we explored what the general pattern of similarity discussed in Subsection 26.3.2 becomes when an arbitrary matrix \(A\) is assumed to be similar to a matrix \(B\) that is in block-diagonal form. Using the example matrix

\begin{equation*} B = \left[\begin{array}{rrrr} 1 \amp -1 \amp 0 \amp 0 \\ 3 \amp 7 \amp 0 \amp 0 \\ 0 \amp 0 \amp 4 \amp -2 \\ 0 \amp 0 \amp -2 \amp 1 \end{array}\right] \end{equation*}

from that discovery activity, if we regard the transition matrix \(P\) in a similarity relation \(\inv{P} A P = B\) as a collection of columns

\begin{equation*} P = \begin{bmatrix} | \amp | \amp | \amp | \\ \uvec{p}_1 \amp \uvec{p}_2 \amp \uvec{p}_3 \amp \uvec{p}_4 \\ | \amp | \amp | \amp | \end{bmatrix}\text{,} \end{equation*}

we determined that

\begin{align*} A \uvec{p}_1 \amp= 1 \uvec{p}_1 + 3 \uvec{p}_2 + 0 \uvec{p}_3 + 0 \uvec{p}_4 \amp \amp\implies \amp A\uvec{p}_1 \amp= \uvec{p}_1 + 3 \uvec{p}_2 \text{,}\\ A \uvec{p}_2 \amp= (-1)\uvec{p}_1 + 7 \uvec{p}_2 + 0 \uvec{p}_3 + 0 \uvec{p}_4 \amp \amp\implies \amp A\uvec{p}_2 \amp= - \uvec{p}_1 + 7 \uvec{p}_2 \text{,}\\ A \uvec{p}_3 \amp= 0 \uvec{p}_1 + 0 \uvec{p}_2 + 4 \uvec{p}_3 + (-2) \uvec{p}_4 \amp \amp\implies \amp A\uvec{p}_3 \amp= 4 \uvec{p}_3 - 2 \uvec{p}_4 \text{,}\\ A \uvec{p}_4 \amp= 0 \uvec{p}_1 + 0 \uvec{p}_2 + (-2) \uvec{p}_3 + 1 \uvec{p}_4 \amp \amp\implies \amp A\uvec{p}_4 \amp= -2 \uvec{p}_3 + \uvec{p}_4 \text{,} \end{align*}

must all be true, where the coefficients in each linear combination on the left come from the corresponding column in the form matrix \(B\text{.}\)

What we notice from the simplified conditions on the right is that \(A\) transforms each of \(\uvec{p}_1\) and \(\uvec{p}_2\) into a linear combination of those two vectors, and similarly transforms each of \(\uvec{p}_3\) and \(\uvec{p}_4\) into a linear combination of those two vectors. In fact, we could make the same statement about linear combinations of \(\uvec{p}_1\) and \(\uvec{p}_2\text{,}\) and of \(\uvec{p}_3\) and \(\uvec{p}_4\text{.}\) In particular, for \(\uvec{w} = a \uvec{p}_1 + b \uvec{p}_2\text{,}\) we have

\begin{align*} A \uvec{w} \amp = A (a \uvec{p}_1 + b \uvec{p}_2) \\ \amp = a (A\uvec{p}_1) + b (A\uvec{p}_2) \\ \amp = a (\uvec{p}_1 + 3 \uvec{p}_2) + b (- \uvec{p}_1 + 7 \uvec{p}_2) \\ \amp = (a - b) \uvec{p}_1 + (3 a + 7 b) \uvec{p}_2 \text{,} \end{align*}

and we would find a similar pattern for how \(A\) transforms a linear combination of \(\uvec{p}_3\) and \(\uvec{p}_4\text{.}\) What we have uncovered are the following two patterns.

If we multiply any linear combination of \(\uvec{p}_1\) and \(\uvec{p}_2\) by \(A\text{,}\) the result will again be a linear combination of \(\uvec{p}_1\) and \(\uvec{p}_2\text{.}\)

If we multiply any linear combination of \(\uvec{p}_3\) and \(\uvec{p}_4\) by \(A\text{,}\) the result will again be a linear combination of \(\uvec{p}_3\) and \(\uvec{p}_4\text{.}\)

In other words, both subspaces \(W_1 = \Span \{ \uvec{p}_1, \uvec{p}_2 \}\) and \(W_2 = \Span \{ \uvec{p}_3, \uvec{p}_4 \}\) exhibit the following behaviour: if vector \(\uvec{w}\) in subspace \(W_j\) is transformed by \(A\text{,}\) then the result \(A \uvec{w}\) is again a vector in that subspace. This is precisely the condition that defines \(A\)-invariance of a subspace.

And this invariance condition tells us what to look for in a transition matrix \(P\) to put this particular matrix \(A\) into block-diagonal form. We would need to find a pair of two-dimensional, \(A\)-invariant subspaces of \(\R^4\text{.}\) If we had such a pair, we could determine a basis for each, giving us four vectors in total to fill in the columns of the \(4 \times 4\) matrix \(P\text{.}\) But there is one further wrinkle: we need all four columns of \(P\) to be linearly independent in order for \(P\) to be invertible, and obtaining them as basis vectors from different subspaces of \(\R^4\) only guarantees that the two pairs of vectors will be linearly independent separately. In other words, we need the subspaces \(W_1,W_2\) to be independent. Even more, we need them to be a complete set of independent subspaces, since we need their bases to total up to four vectors to fill the columns of the transition matrix \(P\text{.}\)
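
Here is a hedged numerical sketch of this similarity pattern (added here, not from the original text): starting from the form matrix \(B\) above and an arbitrarily chosen invertible \(P\text{,}\) it defines \(A = P B \inv{P}\) and verifies that \(A\) transforms each column of \(P\) according to the corresponding column of \(B\text{.}\)

    import numpy as np

    B = np.array([[1., -1., 0., 0.],
                  [3.,  7., 0., 0.],
                  [0.,  0., 4., -2.],
                  [0.,  0., -2., 1.]])

    # Any invertible P will do for this illustration; define A = P B P^{-1},
    # so that A is similar to the block-diagonal matrix B via P.
    rng = np.random.default_rng(1)
    P = rng.integers(-2, 3, size=(4, 4)).astype(float)
    while abs(np.linalg.det(P)) < 0.5:          # re-draw until P is invertible
        P = rng.integers(-2, 3, size=(4, 4)).astype(float)
    A = P @ B @ np.linalg.inv(P)

    p1, p2, p3, p4 = P.T   # the columns of P

    # The similarity pattern: A transforms each column of P according to
    # the corresponding column of B.
    print(np.allclose(A @ p1, 1 * p1 + 3 * p2))      # True
    print(np.allclose(A @ p2, -1 * p1 + 7 * p2))     # True
    print(np.allclose(A @ p3, 4 * p3 - 2 * p4))      # True
    print(np.allclose(A @ p4, -2 * p3 + 1 * p4))     # True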

Subsection 28.4.6 Block-diagonalization procedure

We'll now use the pattern of the example from Discovery 28.1 analyzed in the previous subsection to create a block-diagonalization procedure. In outline: determine a complete set of independent, \(A\)-invariant subspaces \(W_1, W_2, \dotsc, W_k\) of \(\R^n\text{;}\) choose a basis for each of these subspaces; and fill the columns of the transition matrix \(P\) with these basis vectors, grouped together subspace by subspace. Then \(P\) will be invertible, and \(\inv{P} A P\) will be in block-diagonal form, with one block for each invariant subspace \(W_j\text{,}\) of size equal to the dimension of that subspace.

Remark 28.4.4.
  1. For the moment, we do not specify how to find the invariant subspaces required by the procedure, or how many to look for, or whether we will even be able to find subspaces that fit all the criteria; we will tackle these questions in subsequent chapters. In the meantime, Example 28.5.4, Proposition 28.6.3, and Theorem 28.6.10 will provide a clue as to where we should start looking in those subsequent chapters. But Example 28.5.3 also provides an example of how geometry can be used to determine the invariant subspaces.
  2. As in the diagonalizable case, it is not necessary to compute \(\inv{P}\) to determine the block-diagonal form matrix \(\inv{P} A P\text{.}\) One could use row reduction to compute \(\inv{P} A P\text{,}\) as in Subsection 26.4.2. But one could also go back to the pattern of similarity from Subsection 26.3.2 (a brief computational sketch follows the list below):

    • the first column of the first block will be the coefficients that appear when \(A\uvec{p}_1\) is expressed as a linear combination of \(\uvec{p}_1,\uvec{p}_2,\dotsc,\uvec{p}_{d_1}\) (the chosen basis vectors for \(W_1\));
    • the second column of the first block will be the coefficients that appear when \(A\uvec{p}_2\) is expressed as a linear combination of \(\uvec{p}_1,\uvec{p}_2,\dotsc,\uvec{p}_{d_1}\text{;}\)
    • and so on for the remaining columns of the first block;
    • and similarly for the other blocks, where entries in the \(\nth[j]\) block are determined using linear combinations of the basis vectors for invariant subspace \(W_j\text{.}\)
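
As a short computational sketch of this second remark (added here for illustration; the particular \(B\) and \(P\) below are made-up examples), one can obtain all of the columns of the form matrix at once by solving the systems \(P \uvec{x} = A \uvec{p}_j\text{,}\) without ever forming \(\inv{P}\) explicitly; by invariance, the coefficients outside each block come out as zero automatically.

    import numpy as np

    # A 4x4 example: B is block-diagonal, P is an invertible transition
    # matrix (chosen arbitrarily for illustration), and A = P B P^{-1}
    # is constructed only so that we have something to block-diagonalize.
    B = np.array([[1., -1., 0., 0.],
                  [3.,  7., 0., 0.],
                  [0.,  0., 4., -2.],
                  [0.,  0., -2., 1.]])
    P = np.array([[1., 0., 1., 0.],
                  [0., 1., 1., 0.],
                  [0., 0., 1., 1.],
                  [1., 0., 0., 1.]])
    A = P @ B @ np.linalg.inv(P)

    # Column j of the form matrix holds the coefficients expressing
    # A p_j as a linear combination of the columns of P; solving
    # P X = A P column-by-column avoids computing inv(P) explicitly.
    form = np.linalg.solve(P, A @ P)
    print(np.allclose(form, B))   # True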