
Section 6.5 Theory

As mentioned, elementary matrices are precisely the connection we need between systems of equations and row operations on one hand and matrix multiplication and inverses on the other.

Subsection 6.5.1 Inverses of elementary matrices

Let's first record an important property of elementary matrices we encountered in Section 6.3.

Subsection 6.5.2 Inverses versus row operations

Now let's connect inverses to row reduction via elementary matrices.

We will show that each statement of the theorem implies the next.

Whenever Statement 1 is true, then so is Statement 2.

We have already seen in Proposition 5.5.10 that when \(A\) is invertible, every system with \(A\) as coefficient matrix has one unique solution.

Whenever Statement 2 is true, then so is Statement 3.

If Statement 2 is true about \(A\text{,}\) then every system \(A\uvec{x}=\uvec{b}\) with \(A\) as coefficient matrix has one unique solution. In particular, the homogeneous system \(A\uvec{x}=\zerovec\) (i.e. in the case that \(\uvec{b}=\zerovec\)) has one unique solution. But we know that a homogeneous system always has the trivial solution \(\uvec{x}=\zerovec\text{,}\) so that must be the one unique solution.
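
For a quick computational illustration of this step (a minimal sketch in Python with NumPy, using a sample invertible matrix of our own choosing; it is not part of the formal argument), solving the homogeneous system for an invertible coefficient matrix returns only the zero vector:

    import numpy as np

    # A sample invertible coefficient matrix.
    A = np.array([[2.0, 1.0],
                  [4.0, 3.0]])

    # Solve the homogeneous system A x = 0.
    x = np.linalg.solve(A, np.zeros(2))

    print(x)                      # [0. 0.]
    print(np.allclose(x, 0.0))    # True: the unique solution is the trivial one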

Whenever Statement 3 is true, then so is Statement 4.

We need to verify that there is at least one example of a system \(A\uvec{x}=\uvec{b}\) that has one unique solution. But we are already assuming that the homogeneous system \(A\uvec{x}=\zerovec\) has one unique solution, so the required example is provided by taking \(\uvec{b} = \zerovec\text{.}\)

Whenever Statement 4 is true, then so is Statement 5.

Suppose that Statement 4 is true, so that there is at least one example of a system \(A\uvec{x}=\uvec{b}\) that has one unique solution. Imagine trying to solve this system by row reducing the associated augmented matrix:

\begin{equation*} \left[\begin{array}{c|c} A \amp \uvec{b} \end{array}\right] \rowredarrow \left[\begin{array}{c|c} \RREF(A) \amp \uvec{b}' \end{array}\right], \end{equation*}

where \(\uvec{b}'\) is whatever appears in the “equals” column after all of our row operations. When we have arrived at the RREF of \(A\) in the coefficient matrix part on the left, and are ready to solve the simplified system, there should not be any free variables, because free variables would lead to parameters, and hence to infinitely many solutions, whereas we are assuming that this particular system has one unique solution. So every column in the RREF of \(A\) must have a leading one. By definition, the rank of \(A\) is equal to the number of leading ones in its RREF, and so for this \(A\) the rank is equal to the number of columns. But \(A\) is square, so the number of columns is the same as the size of \(A\text{.}\)
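
To see this rank count in a concrete computation (a Python/NumPy sketch with a sample \(2 \times 2\) system of our own choosing, for illustration only), a square system with one unique solution has a coefficient matrix of full rank:

    import numpy as np

    # A sample system A x = b with exactly one solution.
    A = np.array([[2.0, 1.0],
                  [4.0, 3.0]])
    b = np.array([3.0, 7.0])
    print(np.linalg.solve(A, b))                    # the unique solution [1. 1.]

    # No free variables means a leading one in every column of the RREF,
    # so the rank of A equals the number of columns, i.e. the size of A.
    print(np.linalg.matrix_rank(A) == A.shape[0])   # True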

Whenever Statement 5 is true, then so is Statement 6.

If \(A\) is square, so is its RREF, and both matrices have the same size. And if the rank of \(A\) is equal to its size, then every column in the RREF of \(A\) must have a leading one, and these leading ones must march down the diagonal of the RREF of \(A\text{.}\) In a RREF matrix, a column that contains a leading one must have every other entry equal to zero. Thus, the RREF of \(A\) must be the identity matrix.
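
The following sketch (in Python, using SymPy's exact rref routine, with sample matrices chosen purely for illustration) exhibits a full-rank matrix whose RREF is the identity alongside a rank-deficient one whose RREF is not:

    import sympy as sp

    # A sample 3 x 3 matrix of full rank.
    A = sp.Matrix([[1, 2, 3],
                   [0, 1, 4],
                   [5, 6, 0]])
    rrefA, pivots = A.rref()          # exact reduced row echelon form
    print(rrefA == sp.eye(3))         # True: a leading one in every column
    print(len(pivots))                # 3, the rank of A

    # A rank-deficient comparison matrix: its RREF is not the identity.
    B = sp.Matrix([[1, 2, 3],
                   [2, 4, 6],
                   [0, 1, 1]])
    print(B.rref()[0] == sp.eye(3))   # False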

Whenever Statement 6 is true, then so is Statement 7.

Suppose that Statement 6 is true, so that \(A\) can be reduced to the identity. That is, \(A\) can be reduced to \(I\) by some sequence of elementary row operations. Each of these operations has a corresponding elementary matrix, so there is some collection of elementary matrices \(E_1,E_2,\dotsc,E_{\ell-1},E_\ell\) so that

\begin{gather} E_\ell E_{\ell-1} \dotsm E_2 E_1 A = I.\label{equation-elem-matrices-theory-elem-row-reduce-to-I-eqn}\tag{\(\star\)} \end{gather}

Now, by Lemma 6.5.1, each of \(E_1,E_2,\dotsc,E_{\ell-1},E_\ell\) is invertible. Therefore, so is the product

\begin{equation*} E_\ell E_{\ell-1} \dotsm E_2 E_1, \end{equation*}

with inverse

\begin{equation*} \inv{E}_1\inv{E}_2\dotsm \inv{E}_{\ell-1}\inv{E}_\ell \end{equation*}

(Rule 4 of Proposition 5.5.5).

Using this inverse, we can isolate \(A\) in (\(\star\)) above:

\begin{align*} E_\ell \dotsm E_2 E_1 A \amp = I\\ \inv{(E_\ell \dotsm E_2 E_1)} (E_\ell \dotsm E_2 E_1) A \amp = \inv{(E_\ell \dotsm E_2 E_1)} I \phantom{\inv{(E_\ell \dotsm E_2 E_1)}}\\ I A \amp = \inv{(E_\ell \dotsm E_2 E_1)}\\ A \amp = \inv{E}_1\inv{E}_2\dotsm \inv{E}_\ell. \end{align*}

So, we have \(A\) expressed as a product of the inverses of a collection of elementary matrices. But by Lemma 6.5.1, each of these inverses is actually an elementary matrix as well, and so we really have \(A\) expressed as a product of a collection of elementary matrices, as desired.
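
Here is a small numerical illustration of this factorization (a Python/NumPy sketch with a sample \(2 \times 2\) matrix and one particular choice of reduction sequence; it is an illustration, not part of the proof):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [4.0, 3.0]])
    I = np.eye(2)

    # Elementary matrices for one reduction of A to I:
    E1 = np.array([[0.5, 0.0], [0.0, 1.0]])    # multiply row 1 by 1/2
    E2 = np.array([[1.0, 0.0], [-4.0, 1.0]])   # add (-4) times row 1 to row 2
    E3 = np.array([[1.0, -0.5], [0.0, 1.0]])   # add (-1/2) times row 2 to row 1

    # The product of the elementary matrices reduces A to I, as in the tagged
    # equation above.
    print(np.allclose(E3 @ E2 @ E1 @ A, I))    # True

    # Isolating A expresses it as a product of the inverses, each of which is
    # itself an elementary matrix.
    E1inv, E2inv, E3inv = map(np.linalg.inv, (E1, E2, E3))
    print(np.allclose(E1inv @ E2inv @ E3inv, A))   # True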

Whenever Statement 7 is true, then so is Statement 1.

If \(A\) is equal to a product of elementary matrices, then since each of those elementary matrices is invertible (Lemma 6.5.1), their product (and hence \(A\)) is also invertible (Rule 4 of Proposition 5.5.5).

Conclusion.

We now have a circle of logical deductions. Starting with the knowledge that any one of the seven statements is true for a particular matrix \(A\text{,}\) we can deduce from the logic above that the next statement is true for \(A\text{,}\) and then from that, that the next statement is true for \(A\text{,}\) and so on. When we get to the last statement, the logic above then requires that the first statement will also be true for \(A\text{,}\) and we can continue from there on to the second statement, and so on, until we are sure that all statements are true for \(A\text{.}\) Therefore, the seven statements are equivalent.

Remark 6.5.3.
  • In this theorem, the claim that these seven statements are equivalent for a particular matrix \(A\) means that if we know that any one of the statements is true for \(A\text{,}\) then it must be that all seven statements are true for \(A\text{.}\) For example, if we had a square matrix that we were able to row reduce to the identity, then the theorem tells us that that matrix must be invertible, that every linear system with that matrix as coefficient matrix has one unique solution, and so on. On the other hand, if we know that any one of the statements is false for a particular matrix \(A\text{,}\) then it must be that all seven statements are false for \(A\text{.}\) As soon as one statement is known to be false for a particular square matrix, it becomes impossible for any of the other statements to be true for that matrix, since knowing that this other statement is true implies that all seven statements are true for it, including the original statement that we already knew was false. And a statement cannot be both true and false for a particular matrix \(A\text{.}\)
  • It may seem unnecessary or even redundant to have all three of Statements 2–4 included in the list in Theorem 6.5.2, but these statements are definitely not the same. The equivalence of Statement 1 and Statement 2 tells us that when a matrix is invertible, every system corresponding to that coefficient matrix has one unique solution, and vice versa. But the reverse connection would be difficult to use in practice: would you want to check that every system with a particular square coefficient matrix has one unique solution in order to conclude that the matrix is invertible? There are infinitely many possible systems to check! The equivalence of Statement 4 and Statement 1 makes the reverse logic easier in practice: if you have just one example of a linear system with a square coefficient matrix that has one unique solution, then you can conclude that the matrix is invertible. Even better, the equivalence of Statement 3 and Statement 1 tells you that you can just check the corresponding homogeneous system as your one example of a system with that particular coefficient matrix that has only one unique solution. Furthermore, the equivalence of Statement 2 and Statement 4 tells you that once you know one example of a system with that particular coefficient matrix that has only one unique solution, then you can conclude without checking that every system with that coefficient matrix has only one unique solution.
  • In the proof of Theorem 6.5.2, the most important link is the one between Statement 6 and Statement 7, as this equivalence provides the link between row reducing and elementary matrices. In practice, the link between Statement 7 and Statement 1 is also important, as it helps us to compute the inverse of a matrix. But in further developing matrix theory, the most important link is the one between Statement 1 and Statement 3, as it will allow us to obtain further general properties of inverses. In particular, these statements will figure into the proofs of the propositions in the next subsection.

Subsection 6.5.3 More properties of inverses

Using our new connections between inverses and row operations, we can expand our knowledge about inverses in general.

We are assuming that we have square matrices \(A\) and \(B\) so that \(B A = I\text{.}\) We would first like to check that \(A\) is invertible. By Theorem 6.5.2, we can instead check that the homogeneous system \(A \uvec{x} = \zerovec\) has only the trivial solution. So suppose that \(\uvec{x}_0\) is a solution to this system, so that \(A \uvec{x}_0 = \zerovec\text{.}\) But then we can carry out two different simplifications of \(B A \uvec{x}_0\text{,}\) one using the assumption \(B A = I\) and one using the assumption \(A \uvec{x}_0 = \zerovec\text{:}\)

\begin{align*} B A \uvec{x}_0 \amp= (B A) \uvec{x}_0 \amp B A \uvec{x}_0 \amp= B (A \uvec{x}_0)\\ \amp= I\uvec{x}_0 \amp \amp= B \zerovec\\ \amp= \uvec{x}_0, \amp \amp= \zerovec. \end{align*}

Since both simplifications are correct, we have \(\uvec{x}_0 = \zerovec\text{.}\) So what we have discovered is that, because there exists a matrix \(B\) so that \(B A = I\text{,}\) whenever we think we have a solution \(\uvec{x}_0\) to the system \(A \uvec{x} = \zerovec\text{,}\) that solution turns out to be the trivial solution. Thus, \(A \uvec{x} = \zerovec\) must have only the trivial solution, and hence \(A\) is invertible (Theorem 6.5.2).

Now that we know that \(A\) is invertible, we can use its inverse to manipulate the equality \(B A = I\text{:}\)

\begin{align*} B A \amp= I\\ (B A) \inv{A} \amp= I \inv{A}\\ B (A \inv{A}) \amp= \inv{A}\\ B I \amp= \inv{A}\\ B \amp= \inv{A}. \end{align*}

So, we have that \(A\) is invertible and \(\inv{A} = B\text{,}\) as desired.

Remark 6.5.5.

Recall that by definition, to verify that a matrix \(B\) is the inverse of a matrix \(A\text{,}\) we would need to check that both \(BA=I\) and \(AB=I\) are true. We needed both orders of multiplication in the definition of inverse matrix because order of matrix multiplication matters, and we couldn't be sure that both \(BA\) and \(AB\) would produce the same result. Via the theory of elementary matrices, we now have the above proposition that allows us to check an inverse by only checking one order of multiplication: \(BA=I\text{.}\)

There is nothing special about \(BA=I\) versus \(AB=I\text{.}\) The previous and following propositions combine to tell us that we need to verify only one of \(BA=I\) or \(AB=I\) to check that \(B\) is the inverse of \(A\text{.}\)
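
A small Python/NumPy sketch of this convenience (with a sample matrix of our own choosing, and with \(B\) taken to be the numerically computed inverse so that \(BA=I\) holds by construction) shows the other order of multiplication coming along for free:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [4.0, 3.0]])

    # Any B with B A = I will do; here we take the numerically computed inverse.
    B = np.linalg.inv(A)

    print(np.allclose(B @ A, np.eye(2)))   # True: the one order we check
    print(np.allclose(A @ B, np.eye(2)))   # True: the other order comes for free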

Here, we are assuming that we have square matrices \(A\) and \(B\) so that \(AB=I\text{,}\) and we again would like to know that \(A\) is invertible and that \(B=\inv{A}\text{.}\) However, instead of appealing back to Theorem 6.5.2, we can use Proposition 6.5.4 with the roles of \(A\) and \(B\) reversed: since \(AB=I\text{,}\) Proposition 6.5.4 says that \(B\) must be invertible and that \(A = \inv{B}\text{.}\) But inverses are themselves invertible (Rule 1 of Proposition 5.5.5), so \(A\) is invertible with

\begin{equation*} \inv{A} = \inv{(\inv{B})} = B, \end{equation*}

as desired.

In Proposition 5.5.5, we learned that products and powers of invertible matrices are always invertible. It turns out that a product of matrices can only be invertible if the matrices making up the product are all invertible, and a power of a matrix can only be invertible if the base matrix is invertible.

Suppose that \(M N\) is invertible. Then it has an inverse; let's call it \(X\) instead of \(\inv{(M N)}\text{.}\) By definition, this means that

\begin{equation*} X (M N) = I \text{.} \end{equation*}

Using the associativity of matrix multiplication (Rule 1.e of Item 1), we may rewrite this as

\begin{equation*} (X M) N = I \text{.} \end{equation*}

Applying Proposition 6.5.4 with \(B = X M\) and \(A = N\text{,}\) we may conclude that \(N\) is invertible with inverse

\begin{equation*} \inv{N} = X M \text{.} \end{equation*}

Similarly, since \(X\) is the inverse of \(M N\text{,}\) we may write

\begin{equation*} (M N) X = I \end{equation*}

and rewrite

\begin{equation*} M (N X) = I \text{.} \end{equation*}

Applying Proposition 6.5.6 this time, we may conclude that \(M\) is invertible with inverse

\begin{equation*} \inv{M} = N X \text{.} \end{equation*}
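
The formulas \(\inv{N} = XM\) and \(\inv{M} = NX\) can be checked numerically; the following Python/NumPy sketch uses two sample invertible matrices of our own choosing:

    import numpy as np

    M = np.array([[1.0, 2.0], [3.0, 5.0]])   # a sample invertible matrix
    N = np.array([[2.0, 1.0], [4.0, 3.0]])   # another sample invertible matrix

    X = np.linalg.inv(M @ N)                 # the inverse of the product

    # As in the proof: X M inverts N, and N X inverts M.
    print(np.allclose((X @ M) @ N, np.eye(2)))   # True, so inv(N) = X M
    print(np.allclose(M @ (N @ X), np.eye(2)))   # True, so inv(M) = N X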

We leave the proof of this statement to you, the reader.

This is the special case of Statement 2 where each of \(M_1,M_2,\dotsc,M_{\ell-1},M_\ell\) is equal to \(M\text{.}\)

As in Proposition 5.5.7, we can turn the statements of Proposition 6.5.7 around to create new facts about singular (i.e. non-invertible) matrices. Note that the statements below are new statements about singular matrices, related but not equivalent to the statements in Proposition 5.5.7.

If the product \(MN\) were invertible, then Statement 1 of Proposition 6.5.7 says that each of \(M\) and \(N\) would have to be invertible. But we are assuming that at least one of them is not, so it is not possible for the product \(MN\) to be invertible.
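
As a numerical illustration (a Python/NumPy sketch with sample matrices of our own choosing), a product with a singular factor has rank less than its size, and so by Theorem 6.5.2 it cannot be invertible:

    import numpy as np

    M = np.array([[1.0, 2.0], [3.0, 5.0]])   # invertible
    N = np.array([[1.0, 2.0], [2.0, 4.0]])   # singular: row 2 is twice row 1

    # The product M N inherits the rank deficiency of N.
    print(np.linalg.matrix_rank(M @ N) < 2)   # True: M N is not invertible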

This proof is again similar to the one above for Statement 1, relying on Statement 3 of Proposition 6.5.7 instead. Alternatively, one could view this as the special case of Statement 2 of the current proposition, where each factor \(M_i\) is taken to be equal to \(M\text{.}\)

Finally, we can use the link between Statement 1 and Statement 6 of Theorem 6.5.2 to make Proposition 5.5.4 more precise.

We explored this in Discovery 6.6.

Start with the matrix \(A = \left[\begin{smallmatrix} a \amp b \\ c \amp d\end{smallmatrix}\right] \) and row reduce to see whether it is possible to get to the identity. But in the operations we choose, we need to be careful not to divide by zero, because the variable entries could take any values, including zero. So it will be necessary to break into cases, such as \(a=0\) versus \(a\neq 0\text{,}\) and the row reduction steps chosen will differ in the different cases. Ultimately, it will be possible to get the identity as the RREF of \(A\) precisely when \(ad-bc\neq 0\text{,}\) and it will be impossible when \(ad-bc=0\text{.}\) From here, we may appeal to the equivalence of Statement 1 and Statement 6 of Theorem 6.5.2.
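
The case analysis can also be carried out symbolically. In the Python/SymPy sketch below (illustrative only; the matrix named C is a candidate of our own choosing), multiplying \(A\) by the matrix obtained by swapping its diagonal entries and negating the other two produces \((ad-bc)I\), so that dividing by \(ad-bc\) inverts \(A\) exactly when \(ad-bc\neq 0\):

    import sympy as sp

    a, b, c, d = sp.symbols('a b c d')
    A = sp.Matrix([[a, b], [c, d]])

    # Candidate inverse, up to a scalar factor.
    C = sp.Matrix([[d, -b], [-c, a]])

    # A C works out to (a d - b c) times the identity.
    print(sp.simplify(A * C))    # Matrix([[a*d - b*c, 0], [0, a*d - b*c]])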

Subsection 6.5.4 Solution sets of row equivalent matrices

Elementary matrices also give us the tool we need to prove that row equivalent matrices represent systems with the same solution set. We first recorded the following as Theorem 2.5.5 in Subsection 2.5.2, but did not prove it there. We repeat the theorem here, and include a proof.

Consider systems \(A_1\uvec{x} = \uvec{b}_1\) and \(A_2\uvec{x} = \uvec{b}_2\text{,}\) whose augmented matrices

\begin{align*} A_1' \amp= \left[\begin{array}{c|c} A_1 \amp \uvec{b}_1 \end{array}\right], \amp A_2' \amp= \left[\begin{array}{c|c} A_2 \amp \uvec{b}_2 \end{array}\right] \end{align*}

are row equivalent. Then there exists a sequence of elementary row operations that can be applied to \(A_1'\) to produce \(A_2'\text{.}\) If we set \(E\) to be the product of all the elementary matrices corresponding to the operations in this sequence, then we have \(A_2' = EA_1'\text{.}\) Because of the way matrix multiplication acts on columns, we then have

\begin{equation*} \left[\begin{array}{c|c} A_2 \amp \uvec{b}_2 \end{array}\right] = E \left[\begin{array}{c|c} A_1 \amp \uvec{b}_1 \end{array}\right] = \left[\begin{array}{c|c} EA_1 \amp E\uvec{b}_1 \end{array}\right], \end{equation*}

and so we also have

\begin{align*} A_2 \amp= EA_1, \amp \uvec{b}_2 \amp= E\uvec{b}_1. \end{align*}

Furthermore, we know that every elementary matrix is invertible (Lemma 6.5.1), and that products of invertible matrices are invertible (Rule 4 of Proposition 5.5.5), so we conclude that \(E\) is invertible. Therefore, we also have

\begin{align*} A_1 \amp= \inv{E}A_2, \amp \uvec{b}_1 \amp= \inv{E}\uvec{b}_2. \end{align*}

We are now in a position to verify that a solution to one system is also a solution to the other system.

A solution to system \(A_1\uvec{x} = \uvec{b}_1 \) is also a solution to \(A_2\uvec{x} = \uvec{b}_2 \).

Suppose \(\uvec{x} = \uvec{x}_1\) solves system \(A_1\uvec{x} = \uvec{b}_1\text{,}\) so that \(A_1\uvec{x}_1 = \uvec{b}_1\) is true. Then,

\begin{equation*} A_2\uvec{x}_1 = (EA_1)\uvec{x}_1 = E(A_1\uvec{x}_1) = E\uvec{b}_1 = \uvec{b}_2. \end{equation*}

Thus, \(\uvec{x} = \uvec{x}_1\) is also a solution to system \(A_2\uvec{x} = \uvec{b}_2\text{.}\)

A solution to system \(A_2\uvec{x} = \uvec{b}_2\) is also a solution to \(A_1\uvec{x} = \uvec{b}_1\).

Suppose \(\uvec{x} = \uvec{x}_2\) solves system \(A_2\uvec{x} = \uvec{b}_2\text{,}\) so that \(A_2\uvec{x}_2 = \uvec{b}_2\) is true. Then,

\begin{equation*} A_1\uvec{x}_2 = (\inv{E}A_2)\uvec{x}_2 = \inv{E}(A_2\uvec{x}_2) = \inv{E}\uvec{b}_2 = \uvec{b}_1. \end{equation*}

Thus, \(\uvec{x} = \uvec{x}_2\) is also a solution to system \(A_1\uvec{x} = \uvec{b}_1\text{.}\)

Conclusion.

Since we have now shown that every solution of one system is a solution to the other system, both systems must have exactly the same solution set.
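
As a closing numerical illustration of the theorem (a Python/NumPy sketch with a sample system and one sample elementary row operation, both of our own choosing), a system and a row equivalent system obtained from it share the same unique solution:

    import numpy as np

    # A sample system A1 x = b1, and an elementary matrix E
    # (add 2 times row 1 to row 2) producing a row equivalent system A2 x = b2.
    A1 = np.array([[2.0, 1.0], [4.0, 3.0]])
    b1 = np.array([3.0, 7.0])
    E  = np.array([[1.0, 0.0], [2.0, 1.0]])

    A2 = E @ A1
    b2 = E @ b1

    x1 = np.linalg.solve(A1, b1)
    x2 = np.linalg.solve(A2, b2)

    print(np.allclose(x1, x2))    # True: both systems have the same solution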