Section 4.3 Concepts
Subsection 4.3.1 Matrix entries
Matrices are big, unwieldy things, so we often use a letter as a placeholder for a matrix, just as we might use a letter to represent a number in algebra. We usually use uppercase letters for matrices, as in Discovery guide 4.1. (Though sometimes we use a boldface lowercase letter to represent a column or row vector, as in Discovery 4.5.) When we want to refer to a specific entry in a matrix, we identify it by two indices: its row number and its column number, in that order. For example, the \nth[(2,1)] entry of matrix A of Discovery 4.2 is -1\text{.} When we have a matrix represented by an uppercase letter and want to also use letters to represent its entries, we usually use the lowercase version of the same letter, with the row and column indices in subscript. For example, for the matrix A of Discovery 4.2, the \nth[(2,1)] entry is a_{21} = -1\text{.} Sometimes we might write [A]_{ij} to refer to the \nth[(i,j)] entry of matrix A\text{,} particularly when instead of a single letter inside the square brackets, we have a formula of letters.
Subsection 4.3.2 Matrix dimensions
Matrices have an obvious notion of size, but we need two numbers to describe it: the number of rows and the number of columns. Again, by convention we always list number of rows first. For example, matrix A of Discovery 4.2 is size 2 \times 3\text{,} meaning it has 2 rows and 3 columns. For a square matrix, the two numbers describing the size of A are equal, so we might just say that a square matrix A has size n to mean it is n\times n\text{.}
Subsection 4.3.3 Matrix equality
In Discovery 4.2, you explored what it means for two matrices to be equal. In algebra involving numbers, we write a=b when variables a and b represent the same number. That is, a and b are equal when they represent the same piece of information. Similarly, two “variable” matrices are equal when they represent the same information. In particular, two matrices are equal when they have the same numbers in corresponding entries. But size is also important here: in Discovery 4.2, matrix D can never be equal to matrix A no matter what value we choose for variable x\text{,} because A will always contain more information than D in its extra third column. So even before we compare entries, we require equal matrices to have the same size.
Subsection 4.3.4 Basic matrix operations
In Discovery 4.3, you probably decided that addition and subtraction of matrices should be carried out in the obvious ways: we should just add or subtract corresponding entries. See Example 4.4.1 and Example 4.4.2. For matrices that have different sizes, it may be tempting to “fill out” the smaller matrix with zeros so that it can be added to the larger. But this would add more information to the smaller matrix that it's not supposed to have, creating a different matrix prior to the addition. So we should resist this temptation; we will only ever add or subtract matrices that have the same size, and addition/subtraction of matrices of different sizes will remain undefined.
When we multiply a number a by 2 to get 2a\text{,} we are doubling the value of a\text{.} In other words, we are scaling a by a scale factor (or scalar) of 2\text{.} Similarly, we can use a scalar to “scale” a matrix by multiplying every entry in the matrix by that number. If A is a matrix and k is a scalar (i.e. a number), then kA is the scalar multiple of A by k\text{.} See Example 4.4.3.
Subsection 4.3.5 The zero matrix
The number zero plays a special role with respect to addition of numbers: it is the only number that has no effect when it is added to another number. For addition of matrices of a particular size, there is only one kind of matrix that has the same effect: a matrix filled with all zeros. We call such a matrix the zero matrix, and write \zerovec to represent it.
Remark 4.3.1.
There are many zero matrices, one of every possible size of matrix. However, we still often say the zero matrix, because we are usually referring to the zero matrix of a particular size.
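The entrywise operations of the last few subsections can be sketched in code. Here is a minimal illustration in Python, representing a matrix as a list of rows; the helper names are invented for this sketch, and entries are indexed from 0 here while the text indexes rows and columns from 1.

```python
def matrices_equal(A, B):
    """Equal sizes first, then equal corresponding entries."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        return False
    return all(A[i][j] == B[i][j]
               for i in range(len(A)) for j in range(len(A[0])))

def matrix_add(A, B):
    """Add corresponding entries; only defined for equal sizes."""
    assert len(A) == len(B) and len(A[0]) == len(B[0])
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scalar_multiple(k, A):
    """Scale A by k: multiply every entry of A by the scalar k."""
    return [[k * entry for entry in row] for row in A]

def zero_matrix(m, n):
    """The m x n matrix of all zeros."""
    return [[0] * n for _ in range(m)]

A = [[1, 0, 3], [-2, 1, 0]]   # a 2 x 3 matrix
# Adding the 2 x 3 zero matrix has no effect on A.
assert matrices_equal(matrix_add(A, zero_matrix(2, 3)), A)
```

Note that `matrices_equal` checks sizes before comparing any entries, mirroring the convention above that matrices of different sizes are never equal.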
Subsection 4.3.6 Linear systems as matrix equations
Consider the system in Task b of Discovery 4.5:
\begin{align}
\left\{\begin{array}{rcrcrcr}
x_1 \amp - \amp 3 x_2 \amp - \amp x_3 \amp = \amp -4 \text{,} \\
-2 x_1 \amp + \amp 7 x_2 \amp + \amp 2 x_3 \amp = \amp 9 \text{.}
\end{array}\right.\label{equation-matrix-ops-concepts-system-convert-matrix-eqn}\tag{\(\star\)}
\end{align}
We would like to replace these two equations by a single matrix equation, which is easy enough to do:
\begin{gather}
\left[\begin{array}{c}
x_1 - 3 x_2 - x_3 \\
-2 x_1 + 7 x_2 + 2 x_3
\end{array}\right]
=
\left[\begin{array}{r} -4 \\ 9 \end{array}\right]\text{.}\label{equation-matrix-ops-concepts-system-as-equal-cols}\tag{\(\star\star\)}
\end{gather}
Note that both of these column matrices are 2 \times 1 matrices: even though the entries of the left-hand matrix seem to contain a lot of numbers, each row has only a single entry, because these formulas are calculation recipes that compute a single number out of several numbers, some known and some unknown.
To make such a matrix equation more resemble the basic linear equation pattern of
\begin{equation*}
\text{coefficient} \times \text{unknown} = \text{constant} \text{,}
\end{equation*}
we collect all the system coefficients into a coefficient matrix, all the variables into the (column) vector of unknowns, and all the right-hand constants into the (column) vector of constants:
\begin{align*}
A \amp = \left[\begin{array}{rrr}
1 \amp -3 \amp -1 \\
-2 \amp 7 \amp 2
\end{array}\right],
\amp
\uvec{x} \amp = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \amp
\uvec{b} \amp = \left[\begin{array}{r} -4 \\ 9 \end{array}\right],
\end{align*}
respectively.
Remark 4.3.2.
It may seem more natural to write the vector of unknowns as a row vector instead of a column vector, but it is preferable mathematically to have all of the vectors involved be (roughly) the same kind of vector (even though they are often not exactly the same kind of vector, since they might not have the same size).
Comparing the matrix equation (\(\star\star\)) with this notation, we see how the product of the coefficient matrix and the vector of unknowns should be defined: we want
\begin{equation*}
A \uvec{x} =
\left[\begin{array}{c}
x_1 - 3 x_2 - x_3 \\
-2 x_1 + 7 x_2 + 2 x_3
\end{array}\right]\text{,}
\end{equation*}
or
\begin{equation*}
\left[\begin{array}{rrr}
1 \amp -3 \amp -1 \\
-2 \amp 7 \amp 2
\end{array}\right]
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\left[\begin{array}{c}
x_1 - 3 x_2 - x_3 \\
-2 x_1 + 7 x_2 + 2 x_3
\end{array}\right]\text{.}
\end{equation*}
We can now see how a matrix times a column should proceed: multiply the entries of the first row of the matrix against the corresponding entries in the column, add these products, and put the result in the first entry of the result column matrix. Then multiply the second row of the matrix against the column in the same fashion and put the result in the second entry of the result column matrix. And so on, if the matrix has more than two rows. See Subsection 4.3.7 below for a more detailed description of this process.
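The row-times-column procedure just described can be sketched in Python (the helper name is invented for this illustration; the matrix and the particular solution used below are for the example system (\(\star\))):

```python
def matrix_times_column(A, x):
    """Multiply matrix A (a list of rows) by column vector x (a flat list):
    each entry of the result is one row of A times the column x."""
    assert all(len(row) == len(x) for row in A)
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# The coefficient matrix of the example system.
A = [[1, -3, -1],
     [-2, 7, 2]]
x = [-1, 1, 0]   # one particular solution of the system
print(matrix_times_column(A, x))   # [-4, 9], the vector of constants
```

Substituting a solution for the vector of unknowns reproduces the vector of constants, exactly as the equation A \uvec{x} = \uvec{b} demands.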
With the matrix product A \uvec{x} defined in this way, the single matrix equation A \uvec{x} = \uvec{b} now contains all the same information as the multiple linear equations of the original system.
Subsection 4.3.7 Matrix multiplication
We can extend this row-times-column calculation procedure to define multiplication of two matrices (instead of just a matrix and a column vector) by thinking of the second matrix as a collection of columns,
\begin{align}
B = \begin{bmatrix}
| \amp | \amp \amp |\\
\uvec{b}_1 \amp \uvec{b}_2 \amp \cdots \amp \uvec{b}_\ell\\
| \amp | \amp \amp |
\end{bmatrix}
\quad\implies\quad
AB = \begin{bmatrix}
| \amp | \amp \amp |\\
A\uvec{b}_1 \amp A\uvec{b}_2 \amp \cdots \amp A\uvec{b}_\ell\\
| \amp | \amp \amp |
\end{bmatrix}.\label{equation-matrix-ops-concepts-matrix-mult-by-cols}\tag{\(\star\star\star\)}
\end{align}
This matrix-times-columns way of defining matrix multiplication will be very useful later. But right now, let's drill down to individual entries of the result AB\text{.}
Let's first consider the case of a 1\times n row vector \uvec{a} times an n\times 1 column vector \uvec{b}\text{.} In this case,
\begin{align}
\uvec{a}\uvec{b}
= \begin{bmatrix}
a_1 \amp a_2 \amp \cdots \amp a_n
\end{bmatrix}
\begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix}
= \begin{bmatrix}a_1 b_1 + a_2 b_2 + \dotsb + a_n b_n\end{bmatrix}.\label{equation-matrix-ops-concepts-row-times-column}\tag{\(\dagger\)}
\end{align}
Notice that the result is a 1\times 1 matrix containing just a single entry.
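This row-times-column calculation is simple enough to sketch directly (the helper name is invented for this illustration):

```python
def row_times_column(a, b):
    """Row vector times column vector, as in (dagger): multiply
    corresponding entries and add up the products."""
    assert len(a) == len(b)
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

print(row_times_column([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
```

Here the sketch returns a bare number rather than a 1 \times 1 matrix, matching the convention described next.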
Now let's consider a matrix A times a column \uvec{b}\text{,} where we consider A as being made of row vectors. Then,
\begin{equation*}
A\uvec{b}
= \begin{bmatrix}
\leftrightlinesubstitute \amp \uvec{a}_1 \amp \leftrightlinesubstitute\\
\leftrightlinesubstitute \amp \uvec{a}_2 \amp \leftrightlinesubstitute\\
\amp\vdots\\
\leftrightlinesubstitute \amp \uvec{a}_m \amp \leftrightlinesubstitute\\
\end{bmatrix}\uvec{b}
= \begin{bmatrix}\uvec{a}_1\uvec{b} \\ \uvec{a}_2\uvec{b} \\ \vdots \\ \uvec{a}_m\uvec{b}\end{bmatrix},
\end{equation*}
where each entry \uvec{a}_i \uvec{b} in the result on the right is calculated by the row-times-column pattern from (\dagger). However, we do not actually have a 1\times 1 matrix in each entry, but instead place the number that would be the sole entry in \uvec{a}_i\uvec{b}\text{.}
Finally, we can extend this to the case of matrix A times matrix B\text{,} by
\begin{equation*}
AB =
\begin{bmatrix}
\leftrightlinesubstitute \amp \uvec{a}_1 \amp \leftrightlinesubstitute \\
\leftrightlinesubstitute \amp \uvec{a}_2 \amp \leftrightlinesubstitute\\
\amp\vdots\\
\leftrightlinesubstitute \amp \uvec{a}_m \amp \leftrightlinesubstitute\\
\end{bmatrix}
\begin{bmatrix}
| \amp | \amp \amp |\\
\uvec{b}_1 \amp \uvec{b}_2 \amp \cdots \amp \uvec{b}_\ell\\
| \amp | \amp \amp |
\end{bmatrix}
=
\begin{bmatrix}
\uvec{a}_1\uvec{b}_1 \amp \uvec{a}_1\uvec{b}_2 \amp \cdots \amp \uvec{a}_1\uvec{b}_\ell\\
\uvec{a}_2\uvec{b}_1 \amp \uvec{a}_2\uvec{b}_2 \amp \cdots \amp \uvec{a}_2\uvec{b}_\ell\\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
\uvec{a}_m\uvec{b}_1 \amp \uvec{a}_m\uvec{b}_2 \amp \cdots \amp \uvec{a}_m\uvec{b}_\ell\\
\end{bmatrix}.
\end{equation*}
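Putting the pieces together, the full entrywise procedure can be sketched as follows (a minimal illustration; the function name is invented, and the dimension check anticipates the size pattern stated below):

```python
def matrix_multiply(A, B):
    """The (i,j) entry of AB is (row i of A) times (column j of B).
    Only defined when the 'inside' dimensions match."""
    m, n = len(A), len(A[0])
    k, l = len(B), len(B[0])
    if n != k:
        raise ValueError("AB undefined: A has %d columns but B has %d rows" % (n, k))
    return [[sum(A[i][p] * B[p][j] for p in range(n)) for j in range(l)]
            for i in range(m)]

A = [[1, -3, -1],
     [-2, 7, 2]]      # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]         # 3 x 2
print(matrix_multiply(A, B))   # the 2 x 2 matrix [[-1, -2], [2, 5]]
```

Note that the result of a 2 \times 3 matrix times a 3 \times 2 matrix is 2 \times 2: the inside dimensions must match, and the outside dimensions give the size of the product.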
Pattern.
The \nth[(i,j)] entry of matrix product AB is the result of a row-times-column calculation, as in (\dagger), using the \nth[i] row of A and the \nth[j] column of B\text{.}
Pattern.
If A is m \times n and B is k \times \ell\text{,} we can only compute A B if n and k are the same; otherwise, we say that the product A B is undefined. In the case that n and k are the same, the product AB has size m \times \ell\text{.} An easy way to remember this is that if we want to multiply
\begin{equation*}
m \times n \quad\text{ times }\quad k \times \ell\text{,}
\end{equation*}
it will only work if the “inside” dimensions n and k match, and the result will have the “outside” dimensions m \times \ell\text{.}
Warning 4.3.3.
When manipulating algebraic expressions where the letters represent matrices, be careful not to inadvertently use the algebra rule B A = A B\text{,} because it is not true for matrices.
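A small concrete pair of matrices makes the warning vivid; here is an illustrative check in Python (the 2 \times 2 multiplication helper is invented for this sketch):

```python
def mult2(A, B):
    """Multiply two 2 x 2 matrices by the row-times-column rule."""
    return [[sum(A[i][p] * B[p][j] for p in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]
print(mult2(A, B))   # [[2, 1], [1, 1]]
print(mult2(B, A))   # [[1, 1], [1, 2]]
```

Since AB and BA differ even for this simple pair, the rule BA = AB cannot be assumed for matrices.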
Subsection 4.3.8 Matrix powers
As you probably decided in Discovery 4.9, we define powers of matrices in the usual way: A^2 means “A times A\text{,}” A^3 means “A times A times A\text{,}” and so on.
Warning 4.3.4.
- To compute A^2\text{,} you need to carry out the computation A A using the “row times column” definition of matrix multiplication. Just squaring every entry of A will not give you the correct result! And similarly for A^3\text{,} A^4\text{,} etc.: you need to carry out all the iterated multiplications. See Subsection 4.4.2 for example calculations.
- As in the second pattern discussed in Subsection 4.3.7, we can only compute the product A^2 = A A if the number of columns of A is equal to the number of rows of A\text{.} That is, matrix powers are only defined for square matrices.
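The first warning can be checked directly; here is an illustrative sketch in Python (the multiplication helper is invented for this sketch):

```python
def mult2(A, B):
    """Multiply two 2 x 2 matrices by the row-times-column rule."""
    return [[sum(A[i][p] * B[p][j] for p in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2],
     [3, 4]]
A_squared = mult2(A, A)                        # the true A^2 = A A
entrywise = [[x * x for x in row] for row in A]  # squaring each entry
print(A_squared)   # [[7, 10], [15, 22]]
print(entrywise)   # [[1, 4], [9, 16]] -- not the same!
```

The two results disagree in every entry, so squaring entrywise is never a shortcut for computing A^2\text{.}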