Section 4.5 Theory
Subsection 4.5.1 Rules of matrix algebra
When we want to work algebraically with letters that represent matrices, most of the familiar rules from the algebra of numbers still hold. We collect many of these rules of matrix algebra in the list below. We will not prove that all of these rules are valid, but we will verify some of the rules to demonstrate the general pattern of their proofs. For some of the proofs we will be more rigorous than others, but in all of the proofs we want to verify that the matrix on the left-hand side is equal to the one on the right-hand side.
Proposition 4.5.1. Matrix algebra.
The following are valid rules of matrix algebra. In each statement, assume that \(A,B,C\) are arbitrary matrices and \(\zerovec\) is a zero matrix, all of appropriate sizes so that the matrix operations can be carried out. In particular, in any rule involving a matrix power, the matrices involved are assumed to be square. Also assume that \(k\) and \(m\) are scalars, and that \(p\) and \(q\) are positive integers.
1. Basic rules of addition and multiplication.
   a. \(\displaystyle B+A = A+B\)
   b. \(\displaystyle A + (B + C) = (A + B) + C\)
   c. \(\displaystyle A(B+C) = AB + AC\)
   d. \(\displaystyle (A+B)C = AC+BC\)
   e. \(\displaystyle A(BC) = (AB)C\)
2. Rules involving scalar multiplication.
   a. \(\displaystyle k(A+B) = kA + kB\)
   b. \(\displaystyle (k+m)A = kA + mA\)
   c. \(\displaystyle (kA)B = k(AB)\)
   d. \(\displaystyle A(kB) = k(AB)\)
   e. \(\displaystyle k(mA) = (km)A\)
   f. \(\displaystyle A - B = A + (-1)B\)
3. Rules involving a zero matrix.
   a. \(\displaystyle A + \zerovec = A\)
   b. \(\displaystyle A-A = \zerovec\)
   c. \(\displaystyle A\zerovec = \zerovec\)
   d. \(\displaystyle \zerovec B = \zerovec\)
   e. \(\displaystyle k\zerovec = \zerovec\)
4. Rules involving matrix powers.
   a. \(\displaystyle A^p A^q = A^{p+q}\)
   b. \(\displaystyle (A^p)^q = A^{pq}\)
   c. \(\displaystyle (kA)^p = k^p A^p\)
   d. \(\displaystyle \zerovec^p = \zerovec\)
5. Rules involving the transpose.
   a. \(\displaystyle \utrans{(\utrans{A})} = A\)
   b. \(\displaystyle \utrans{(A+B)} = \utrans{A} + \utrans{B}\)
   c. \(\displaystyle \utrans{(kA)} = k\utrans{A}\)
   d. \(\displaystyle \utrans{(AB)} = \utrans{B}\utrans{A}\)
   e. \(\displaystyle \utrans{(A^p)} = (\utrans{A})^p\)
   f. \(\displaystyle \utrans{\zerovec} = \zerovec\)
Proof of Rule 1.b.
First, it’s important to remember what equality of matrices means, so that we know what we should be verifying: two matrices are equal when they have the same size and the same entries. And to be sure, while the formulas on the left- and right-hand sides of the rule under consideration each involve three matrices, the formulas themselves each represent a single matrix.
Let’s also make sure we understand the difference between the left- and right-hand sides. On the left, the brackets tell us that we should add \(B\) and \(C\) first, and then add \(A\) to that result. The brackets on the right tell us that we should add \(A\) and \(B\) first, and then add \(C\) to that result. Next, let’s compare sizes. To be able to add \(A,B,C\text{,}\) they must all be the same size, and then the result of adding them (in any combination) will also be that common size. So the left- and right-hand results will be the same size of matrix. Finally, let’s make sure each entry in the left-hand result is the same as the corresponding entry in the right-hand result. Since we don’t actually know what the entries are or even how many entries there are, we cannot verify this entry by entry. So we work in general instead: consider, in terms of the entries of \(A,B,C\text{,}\) what the \(\nth[(i,j)]\) entry of each side must be, where \(i,j\) is a pair of row and column indices. For this, you might want to review the conventions on referring to matrix entries described in Subsection 4.3.1. On the left, we have
\begin{equation*}
[B+C]_{ij} = b_{ij} + c_{ij},
\end{equation*}
and so
\begin{equation*}
\bigl[A+(B+C)\bigr]_{ij} = [A]_{ij} + [B+C]_{ij} = a_{ij} + (b_{ij} + c_{ij}).
\end{equation*}
A similar process on the right gives us
\begin{equation*}
\bigl[(A+B)+C\bigr]_{ij} = [A+B]_{ij} + [C]_{ij} = (a_{ij} + b_{ij}) + c_{ij}.
\end{equation*}
Since we know from high-school algebra that addition of ordinary numbers satisfies the associativity rule
\begin{equation*}
a+(b+c)=(a+b)+c\text{,}
\end{equation*}
we can see that the \(\nth[(i,j)]\) entries of the matrices represented by the formulas on the left- and right-hand sides of this rule will always match.
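If you would like to see Rule 1.b in action on concrete numbers, here is a minimal computational sketch in Python (assuming the NumPy library is available; the matrices are arbitrary illustrative choices, not taken from the text):

```python
# A quick numerical spot-check of Rule 1.b, A + (B + C) = (A + B) + C.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, -1], [0, 2]])
C = np.array([[-3, 7], [1, 1]])

lhs = A + (B + C)   # add B and C first, then add A
rhs = (A + B) + C   # add A and B first, then add C

print(np.array_equal(lhs, rhs))  # True: same size and same entries
```

Of course, a single numerical check like this is not a proof; it only illustrates, for one choice of matrices, the pattern that the entry-by-entry argument above verifies in general.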
Proof of Rule 2.c.
First, since scalar multiplication does not change the size of a matrix, if \(A\) and \(B\) are compatible sizes for multiplication, then so are \(kA\) and \(B\text{,}\) and the sizes of \((kA)B\) and \(k(AB)\) will be the same. Next, consider the \(\nth[(i,j)]\) entry of each side. Write \(\uvec{a}_i\) for the \(\nth[i]\) row of \(A\) and \(\uvec{b}_j\) for the \(\nth[j]\) column of \(B\text{.}\) Using the row-times-column pattern of matrix multiplication, and noticing that the \(\nth[i]\) row of \(kA\) is just \(k\uvec{a}_i\text{,}\) we have
\begin{align*}
[\text{LHS}]_{ij} \amp= \bigl[(kA)B\bigr]_{ij} = (k\uvec{a}_i)\uvec{b}_j, \amp
[\text{RHS}]_{ij} \amp= \bigl[k(AB)\bigr]_{ij} = k(\uvec{a}_i\uvec{b}_j).
\end{align*}
So these two entries will be equal if the rule
\begin{equation*}
(k\uvec{a})\uvec{b} = k(\uvec{a}\uvec{b})
\end{equation*}
is always true for \(1\times n\) row vector \(\uvec{a}\) and \(n\times 1\) column vector \(\uvec{b}\text{.}\) In this new rule, both sides are size \(1\times 1\text{,}\) and indeed we have
\begin{align*}
\text{New LHS} \amp= (k\uvec{a})\uvec{b}\\
\amp= \left(k\begin{bmatrix}a_1 \amp a_2 \amp \cdots \amp a_n\end{bmatrix}\right)
\begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix}\\
\amp= \begin{bmatrix}ka_1 \amp ka_2 \amp \cdots \amp ka_n\end{bmatrix}
\begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix}\\
\amp= \begin{bmatrix}(ka_1) b_1 + (ka_2) b_2 + \dotsb + (ka_n) b_n\end{bmatrix},\\
\end{align*}
and
\begin{align*}
\text{New RHS} \amp= k(\uvec{a}\uvec{b})\\
\amp= k\left( \begin{bmatrix}a_1 \amp a_2 \amp \cdots \amp a_n\end{bmatrix}
\begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix} \right)\\
\amp= k\begin{bmatrix}a_1 b_1 + a_2 b_2 + \dotsb + a_n b_n\end{bmatrix}\\
\amp= \begin{bmatrix}k (a_1 b_1) + k(a_2 b_2) + \dotsb + k(a_n b_n)\end{bmatrix}.
\end{align*}
We can now clearly see that the two sides will be equal using the associativity rule \((ka)b = k(ab)\) for numbers from high-school algebra.
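As with Rule 1.b, a concrete check of Rule 2.c can be carried out numerically. The following sketch (Python with NumPy; the scalar and the compatibly sized matrices are arbitrary illustrative choices) compares the two sides of the rule:

```python
# A numerical spot-check of Rule 2.c, (kA)B = k(AB).
import numpy as np

k = 3
A = np.array([[1, 2, 0], [4, -1, 2]])    # 2x3
B = np.array([[1, 5], [2, 0], [-3, 1]])  # 3x2, so the product AB is defined

lhs = (k * A) @ B   # scale A first, then multiply by B
rhs = k * (A @ B)   # multiply first, then scale the result

print(np.array_equal(lhs, rhs))  # True
```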
Proof of Rule 4.a.
We can prove this rule in the same manner as the corresponding rule for powers of numbers from high-school algebra, without worrying about individual entries of the matrices on either side of the equality. On the left we are separately multiplying together \(p\) factors of \(A\) and \(q\) factors of \(A\text{,}\) and then multiplying those two results together. Rule 1.e says that we can multiply all of these factors of \(A\) together in any combinations and get the same result. Since there are \(p + q\) factors of \(A\) all together, the result will be the same as \(A^{p + q}\text{.}\)
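Here too, a small numerical sketch can illustrate the rule for one arbitrary square matrix and one arbitrary choice of exponents (Python with NumPy; the values below are illustrative only):

```python
# A numerical spot-check of Rule 4.a, A^p A^q = A^(p+q).
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[1, 1], [0, 2]])
p, q = 2, 3

lhs = matrix_power(A, p) @ matrix_power(A, q)  # multiply A^p by A^q
rhs = matrix_power(A, p + q)                   # compute A^(p+q) directly

print(np.array_equal(lhs, rhs))  # True
```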
Remark 4.5.2.
- Algebra rules are not handed down from on high; they represent patterns where two different sequences of computations always produce the same result. For example, we can see that
  \begin{equation*}
  2 (3 + 5) = 2 \cdot 3 + 2 \cdot 5,
  \end{equation*}
  not from algebra but from computation:
  \begin{align*}
  \text{LHS} \amp= 2 (3 + 5) = 2 (8) = 16, \amp
  \text{RHS} \amp= 2 \cdot 3 + 2 \cdot 5 = 6 + 10 = 16.
  \end{align*}
  This example of different computations yielding the same result did not depend on the numbers \(2,3,5\) but on the pattern of the sequences of computations, and we capture this pattern algebraically in terms of letters as the distributive rule \(a (b + c) = a b + a c\text{.}\) The algebra rules above capture similar universal patterns of different sequences of matrix operations that always produce the same result.
- In the rules, the letters \(A,B,C\) are placeholders for any arbitrary matrices. When we use these rules, we might need to apply them where a whole formula of letters that computes to a single matrix takes the place of one of \(A,B,C\text{.}\) For example, see the first step of the FOIL example below.
- In the preamble to the proposition, we stated that most of the familiar rules from the algebra of numbers still hold for matrices. But there is one important rule that does not hold! Remember that order of matrix multiplication matters: \(AB\) and \(BA\) are not equal in general.
- As you read the rules, think about the point of the rule. For example, consider Rule 1.b. Matrix addition is defined as an operation between two matrices. If we write something like \(A + B + C\text{,}\) it is ambiguous what is meant. Does it mean that \(A + B\) should be performed first, and then \(C\) added to that result? Or should \(B + C\) be performed first, and then \(A\) added to that result? Mathematical notation is about communication of mathematical ideas, patterns, and computations. Ambiguity in communication is bad. To resolve the ambiguity in writing \(A + B + C\text{,}\) we would require brackets to communicate which order of successive additions is meant. But the point of Rule 1.b is that it doesn’t matter — either meaning will yield the same end result. Rule 1.e establishes a similar pattern for matrix multiplication.
- Also, as you read the rules, try to think of the pattern each one is expressing in words. For example, for Rule 3.a, reading out “A plus zero equals A” is a lot less clear than interpreting the rule as “adding the zero matrix to any matrix has no effect.”
- Rule 1.c and Rule 1.d are not redundant because order of matrix multiplication matters. In particular, it’s important to be careful when using these rules to factor a common multiple out of a sum. For example, \(A X + B X\) cannot be factored as \(X (A + B)\text{,}\) because then \(X\) is multiplying on the left when originally it was multiplying both \(A\) and \(B\) on the right. The correct factorization is \(A X + B X = (A + B) X\text{.}\) Even worse, \(A X + X B\) cannot be factored at all.
- There are two things to note about the rules involving the transpose. First, in Rule 5.f, the zero matrices on either side of the equality are not necessarily of the same size (unless they are both square). Second, notice how a transpose of a product reverses the order of multiplication in Rule 5.d. This happens because in the product \(A B\) we are multiplying rows of \(A\) against columns of \(B\text{.}\) If we were to compute the product of transposes \(\utrans{A} \utrans{B}\text{,}\) we would be multiplying rows of \(\utrans{A}\) (i.e. columns of \(A\)) against columns of \(\utrans{B}\) (i.e. rows of \(B\)), which is not the same computation. We need to reverse the order to \(\utrans{B}\utrans{A}\) so that rows of \(\utrans{B}\) (i.e. columns of \(B\)) multiply against columns of \(\utrans{A}\) (i.e. rows of \(A\)), similarly to \(A B\text{.}\) (See the numerical sketch following this remark.)
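Two of the cautionary points above, that the order of matrix multiplication matters and that the transpose of a product reverses the order (Rule 5.d), can be observed numerically. Here is a minimal sketch (Python with NumPy; the matrices are arbitrary illustrative choices):

```python
# Order of multiplication matters, and transposing a product reverses it.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(np.array_equal(A @ B, B @ A))          # False: AB and BA differ here
print(np.array_equal((A @ B).T, B.T @ A.T))  # True:  (AB)^T = B^T A^T
print(np.array_equal((A @ B).T, A.T @ B.T))  # False: keeping the order fails
```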
Example 4.5.3. Using the rules.
Here is an example of using some of the basic rules to justify a slightly more involved rule like FOIL. Assume \(A,B,Y,Z\) are square matrices of the same size.
\begin{align*}
(A+B)(Y+Z)
\amp= A(Y+Z) + B(Y+Z) \amp
\amp\text{(i)}\\
\amp= (AY + AZ) + (BY + BZ) \amp
\amp\text{(ii)}\\
\amp= AY + AZ + BY + BZ \amp
\amp\text{(iii)}
\end{align*}
Here are the justifications for the numbered steps, using the algebra rules in Proposition 4.5.1.
- (i) Rule 1.d, where the single matrix computed by the formula \(Y+Z\) plays the role of \(C\text{.}\)
- (ii) Rule 1.c, applied to each of the two products.
- (iii) Rule 1.b, which tells us that the grouping of successive additions does not affect the result, so the brackets may be omitted.
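For one arbitrary choice of \(2\times 2\) matrices (illustrative values only), the FOIL pattern can also be checked numerically with a short sketch in Python with NumPy:

```python
# A numerical spot-check of the FOIL expansion (A+B)(Y+Z) = AY + AZ + BY + BZ.
import numpy as np

A = np.array([[1, 2], [0, 1]])
B = np.array([[3, 0], [1, 1]])
Y = np.array([[2, 1], [1, 2]])
Z = np.array([[0, 4], [5, 0]])

lhs = (A + B) @ (Y + Z)
rhs = A @ Y + A @ Z + B @ Y + B @ Z

print(np.array_equal(lhs, rhs))  # True
```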
Subsection 4.5.2 Linear systems as matrix equations
In the examples expressing system solutions in terms of column vectors in Subsection 4.4.4, we noticed a pattern: the general solution of a consistent system can always be expressed as a constant part plus a variable part, where the constant part is a particular solution to the system (corresponding to setting all parameters to \(0\)) and the variable part involves the parameters. This is true even for
- a system with one unique solution (as in Example 4.4.8), in which case we consider the variable part to be zero; and
- a homogeneous system (as in Example 4.4.10), in which case we consider the constant part to be zero (i.e. the trivial solution).
We saw that the pattern goes a bit deeper when we compared the solutions to Example 4.4.9 and Example 4.4.11. These two systems had the same coefficient matrix, but one was nonhomogeneous and the other was homogeneous, and their solutions have exactly the same variable part. This pattern will always emerge for a consistent system.
Lemma 4.5.4. Homogeneous/nonhomogeneous solution set patterns.
If \(\uvec{x}_1\) is a particular solution to system \(A\uvec{x} = \uvec{b}\text{,}\) then every other solution to this system can be expressed as the sum of \(\uvec{x}_1\) and some solution to the corresponding homogeneous system \(A\uvec{x} = \zerovec\text{.}\)
Proof.
We have solution \(\uvec{x}_1\) to system \(A\uvec{x}=\uvec{b}\text{.}\) By definition, this means that the matrix equation defining the system is valid when we substitute \(\uvec{x}=\uvec{x}_1\text{.}\) That is, we know that \(A\uvec{x}_1 = \uvec{b}\text{.}\) Suppose we have another solution \(\uvec{x}_2\text{.}\) Again, this means that \(A\uvec{x}_2=\uvec{b}\) is also true. We would like to show that \(\uvec{x}_2\) is equal to the sum of \(\uvec{x}_1\) and some solution to the homogeneous system \(A\uvec{x}=\zerovec\text{.}\) Set \(\uvec{x}_0 = \uvec{x}_2 - \uvec{x}_1\text{.}\) We claim that \(\uvec{x} = \uvec{x}_0\) is a solution to \(A\uvec{x}=\zerovec\text{.}\) Let’s verify:
\begin{equation*}
\text{LHS}
= A\uvec{x}_0
= A(\uvec{x}_2-\uvec{x}_1)
= A\uvec{x}_2 - A\uvec{x}_1
= \uvec{b} - \uvec{b}
= \zerovec
= \text{RHS}.
\end{equation*}
So \(\uvec{x}_0\) is a solution to the homogeneous system. Furthermore,
\begin{equation*}
\uvec{x}_1 + \uvec{x}_0
= \uvec{x}_1 + (\uvec{x}_2 - \uvec{x}_1)
= (\uvec{x}_1 - \uvec{x}_1) + \uvec{x}_2
= \zerovec + \uvec{x}_2
= \uvec{x}_2.
\end{equation*}
Thus, \(\uvec{x}_2\) is equal to the sum of \(\uvec{x}_1\) and a solution to the homogeneous system (i.e. \(\uvec{x}_0\)), as desired.
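If you would like to experiment with this pattern, here is a minimal numerical sketch (Python with NumPy). The system and the two solutions below are arbitrary illustrative choices, not taken from the examples cited above:

```python
# Illustrating Lemma 4.5.4: two solutions to Ax = b differ by a solution
# to the homogeneous system Ax = 0.
import numpy as np

A = np.array([[1, 1], [2, 2]])
b = np.array([3, 6])

x1 = np.array([3, 0])   # one particular solution of Ax = b
x2 = np.array([1, 2])   # another solution of Ax = b
x0 = x2 - x1            # their difference

print(np.array_equal(A @ x1, b))                       # True
print(np.array_equal(A @ x2, b))                       # True
print(np.array_equal(A @ x0, np.zeros(2, dtype=int)))  # True: x0 solves Ax = 0
print(np.array_equal(x1 + x0, x2))                     # True: x2 = x1 + x0
```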
We can also use the matrix algebra viewpoint of linear systems to definitively answer Question 1.3.4.
Theorem 4.5.5. None, one, or infinite solutions.
There are exactly three possibilities for the number of solutions to a linear system: no solution, one unique solution, or an infinite number of solutions.
Proof.
We have seen in examples that it is possible for a system to have no solution, and that it is also possible for a system to have one unique solution. We will argue that an infinite number of solutions is the only remaining possibility. If we are not in one of the first two cases, then our system must be consistent and must have more than one solution. That is, there must be at least two different solutions. Pick two different solutions, label them \(\uvec{x}_1\) and \(\uvec{x}_2\text{,}\) and set \(\uvec{x}_0 = \uvec{x}_2 - \uvec{x}_1\text{.}\) The same algebra as in the proof of Lemma 4.5.4 verifies that \(\uvec{x}_0\) is a solution to the homogeneous system \(A\uvec{x}=\zerovec\text{.}\) Let \(t\) be a parameter. We claim that for every possible value of the parameter \(t\text{,}\) \(\uvec{x}_1 + t\uvec{x}_0\) is a solution to \(A\uvec{x}=\uvec{b}\text{.}\) Let’s verify:
\begin{equation*}
\text{LHS}
= A (\uvec{x}_1 + t \uvec{x}_0)
= A \uvec{x}_1 + t A \uvec{x}_0
= \uvec{b} + t \zerovec
= \uvec{b} + \zerovec
= \uvec{b}
= \text{RHS}.
\end{equation*}
If \(\uvec{x}_0\) were secretly the zero vector, then \(\uvec{x}_1 + t \uvec{x}_0\) would always equal \(\uvec{x}_1\) no matter the value of \(t\text{.}\) But since \(\uvec{x}_1\) and \(\uvec{x}_2\) are different solutions to \(A \uvec{x} = \uvec{b}\text{,}\) we have \(\uvec{x}_0 = \uvec{x}_2 - \uvec{x}_1 \neq \zerovec\text{,}\) and so different values of \(t\) produce different column vectors \(\uvec{x}_1 + t \uvec{x}_0\text{.}\) Each of these column vectors is a solution to \(A \uvec{x} = \uvec{b}\text{,}\) as verified above, and so since there are infinitely many possible values for \(t\text{,}\) there are infinitely many different possibilities for \(\uvec{x}_1 + t \uvec{x}_0\text{,}\) and hence infinitely many solutions to \(A \uvec{x} = \uvec{b}\text{.}\)
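Continuing the numerical sketch from Lemma 4.5.4 (same illustrative system and solutions; Python with NumPy), we can watch this proof in action: once a nonzero homogeneous solution \(\uvec{x}_0\) is in hand, each value of the parameter \(t\) produces a distinct solution to the nonhomogeneous system.

```python
# Illustrating the proof of Theorem 4.5.5: x1 + t*x0 solves Ax = b for every t.
import numpy as np

A = np.array([[1, 1], [2, 2]])
b = np.array([3, 6])
x1 = np.array([3, 0])    # particular solution of Ax = b
x0 = np.array([-2, 2])   # nonzero solution of the homogeneous system Ax = 0

for t in [0, 1, -1, 2, 10]:          # a few sample parameter values
    x = x1 + t * x0
    print(t, x, np.array_equal(A @ x, b))  # always True, each x different
```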