Section 4.3 Concepts

Subsection 4.3.1 Matrix entries

Matrices are big, unwieldy things, so we often use a letter as a placeholder for a matrix, just as we might use a letter to represent a number in algebra. We usually use uppercase letters for matrices, as in Discovery guide 4.1. (Though sometimes we use a boldface lowercase letter to represent a column or row vector, as in Discovery 4.5.) When we want to refer to a specific entry in a matrix, we identify it by two indices: its row number and its column number, in that order. For example, the \(\nth[(2,1)]\) entry of matrix \(A\) of Discovery 4.2 is \(-1\text{.}\) When we have a matrix represented by an uppercase letter and want to also use letters to represent its entries, we usually use the lowercase version of the same letter, with the row and column indices in subscript. For example, for the matrix \(A\) of Discovery 4.2, the \(\nth[(2,1)]\) entry is \(a_{21} = -1\text{.}\) Sometimes we might write \([A]_{ij}\) to refer to the \(\nth[(i,j)]\) entry of matrix \(A\text{,}\) particularly when instead of a single letter inside the square brackets, we have a formula of letters.

Subsection 4.3.2 Matrix dimensions

Matrices have an obvious notion of size, but we need two numbers to describe it: the number of rows and the number of columns. Again, by convention we always list the number of rows first. For example, matrix \(A\) of Discovery 4.2 has size \(2 \times 3\text{,}\) meaning it has \(2\) rows and \(3\) columns. For a square matrix, these two numbers are equal, so we might just say that a square matrix \(A\) has size \(n\) to mean it is \(n \times n\text{.}\)
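The row-then-column conventions for entries and sizes translate directly into code. Here is a minimal Python sketch (our illustration, not part of the text's development), using the coefficient matrix that will appear in Subsection 4.3.6 below; note that Python indexes from \(0\text{,}\) while matrix indices conventionally start at \(1\text{.}\)

```python
# A matrix stored as a list of rows; this is the coefficient matrix
# that appears in Subsection 4.3.6 below.
A = [
    [ 1, -3, -1],
    [-2,  7,  2],
]

m = len(A)     # number of rows: 2
n = len(A[0])  # number of columns: 3
print(f"A has size {m} x {n}")

# The (2,1) entry a_21 lies in row 2, column 1; Python lists are
# 0-indexed, so we subtract 1 from each index.
print(A[2 - 1][1 - 1])  # -2
```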

Subsection 4.3.3 Matrix equality

In Discovery 4.2, you explored what it means for two matrices to be equal. In algebra involving numbers, we write \(a=b\) when variables \(a\) and \(b\) represent the same number. That is, \(a\) and \(b\) are equal when they represent the same piece of information. Similarly, two “variable” matrices are equal when they represent the same information. In particular, two matrices are equal when they have the same numbers in corresponding entries. But size is also important here: in Discovery 4.2, matrix \(D\) can never be equal to matrix \(A\) no matter what value we choose for variable \(x\text{,}\) because \(A\) will always contain more information than \(D\) in its extra third column. So even before we compare entries, we require equal matrices to have the same size.
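As a quick illustration of this two-step comparison (first sizes, then corresponding entries), here is a small hypothetical Python helper:

```python
def matrices_equal(A, B):
    """Matrices are equal when they have the same size and the
    same numbers in corresponding entries."""
    # Sizes must agree before entries are even compared.
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        return False
    return all(A[i][j] == B[i][j]
               for i in range(len(A))
               for j in range(len(A[0])))

print(matrices_equal([[1, 2]], [[1, 2]]))     # True
print(matrices_equal([[1, 2]], [[1, 2, 0]]))  # False: different sizes
```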

Subsection 4.3.4 Basic matrix operations

In Discovery 4.3, you probably decided that addition and subtraction of matrices should be carried out in the obvious ways: we should just add or subtract corresponding entries. See Example 4.4.1 and Example 4.4.2.

For matrices that have different sizes, it may be tempting to “fill out” the smaller matrix with zeros so that it can be added to the larger. But this would give the smaller matrix information it is not supposed to have, turning it into a different matrix before the addition is even carried out. So we should resist this temptation: we will only ever add or subtract matrices that have the same size, and addition/subtraction of matrices of different sizes will remain undefined.

When we multiply a number \(a\) by \(2\) to get \(2a\text{,}\) we are doubling the value of \(a\text{.}\) In other words, we are scaling \(a\) by a scale factor (or scalar) of \(2\text{.}\) Similarly, we can use a scalar to “scale” a matrix by multiplying every entry in the matrix by that number. If \(A\) is a matrix and \(k\) is a scalar (i.e. a number), then \(kA\) is the scalar multiple of \(A\) by \(k\text{.}\) See Example 4.4.3.
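In code, these entrywise definitions become one comprehension each. A minimal Python sketch (the function names are ours, for illustration):

```python
def mat_add(A, B):
    """Entrywise sum; defined only when A and B have the same size."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("addition of different sizes is undefined")
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scalar_mult(k, A):
    """Scale A by multiplying every entry by the scalar k."""
    return [[k * entry for entry in row] for row in A]

A = [[1, -3], [0, 2]]
B = [[4,  1], [5, -2]]
print(mat_add(A, B))      # [[5, -2], [5, 0]]
print(scalar_mult(2, A))  # [[2, -6], [0, 4]]
```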

Subsection 4.3.5 The zero matrix

The number zero plays a special role with respect to addition of numbers: it is the only number that has no effect when it is added to another number. For addition of matrices of a particular size, there is only one kind of matrix that has the same effect: a matrix filled with all zeros. We call such a matrix the zero matrix, and write \(\zerovec\) to represent it.

Remark 4.3.1.

There are many zero matrices, one of every possible size of matrix. However, we still often say the zero matrix, because we are usually referring to the zero matrix of a particular size.

The zero matrix will allow us to do the matrix version of the algebra in the preamble to Discovery 4.4, since subtracting a matrix from itself will obviously result in the zero matrix. For more properties of the zero matrix, see Proposition 4.5.1 in Subsection 4.5.1.
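For example, subtracting a particular \(2 \times 2\) matrix from itself, and adding the \(2 \times 2\) zero matrix to it:

\begin{equation*} \begin{bmatrix} 1 \amp -3 \\ 0 \amp 2 \end{bmatrix} - \begin{bmatrix} 1 \amp -3 \\ 0 \amp 2 \end{bmatrix} = \begin{bmatrix} 0 \amp 0 \\ 0 \amp 0 \end{bmatrix} = \zerovec, \qquad \begin{bmatrix} 1 \amp -3 \\ 0 \amp 2 \end{bmatrix} + \zerovec = \begin{bmatrix} 1 \amp -3 \\ 0 \amp 2 \end{bmatrix}\text{.} \end{equation*}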

Subsection 4.3.6 Linear systems as matrix equations

Consider the system in Task b of Discovery 4.5:

\begin{align} \left\{\begin{array}{rcrcrcr} x_1 \amp - \amp 3 x_2 \amp - \amp x_3 \amp = \amp -4 \text{,} \\ -2 x_1 \amp + \amp 7 x_2 \amp + \amp 2 x_3 \amp = \amp 9 \text{.} \end{array}\right.\label{equation-matrix-ops-concepts-system-convert-matrix-eqn}\tag{\(\star\)} \end{align}

We would like to replace these two equations by a single matrix equation, which is easy enough to do:

\begin{gather} \left[\begin{array}{c} x_1 - 3 x_2 - x_3 \\ -2 x_1 + 7 x_2 + 2 x_3 \end{array}\right] = \left[\begin{array}{r} -4 \\ 9 \end{array}\right]\text{.}\label{equation-matrix-ops-concepts-system-as-equal-cols}\tag{\(\star\star\)} \end{gather}

Note that both of these column matrices are \(2 \times 1\) matrices — even though the entries of the left-hand matrix seem to contain a lot of numbers, each row has only a single entry because these formulas are calculation recipes that compute a single number out of several numbers, some known and some unknown.

To make such a matrix equation more resemble the basic linear equation pattern of

\begin{equation*} \text{coefficient} \times \text{unknown} = \text{constant} \text{,} \end{equation*}

we collect all the system coefficients into a coefficient matrix, all the variables into the (column) vector of unknowns, and all the right-hand constants into the (column) vector of constants:

\begin{align*} A \amp = \left[\begin{array}{rrr} 1 \amp -3 \amp -1 \\ -2 \amp 7 \amp 2 \end{array}\right], \amp \uvec{x} \amp = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \amp \uvec{b} \amp = \left[\begin{array}{r} -4 \\ 9 \end{array}\right], \end{align*}

respectively.

Remark 4.3.2.

It may seem more natural to write the vector of unknowns as a row vector instead of a column vector, but it is preferable mathematically to have all of the vectors involved be (roughly) the same kind of vector (even though they are often not exactly the same kind of vector, since they might not have the same size).

We would like to express the system in (\(\star\)) as one matrix equation \(A \uvec{x} = \uvec{b}\text{,}\) and to do this we need to decide how \(A\) times \(\uvec{x}\) should work. But we already know how to represent the system as a single matrix equation (see (\(\star\star\))), so we should have

\begin{equation*} A \uvec{x} = \left[\begin{array}{c} x_1 - 3 x_2 - x_3 \\ -2 x_1 + 7 x_2 + 2 x_3 \end{array}\right]\text{,} \end{equation*}

or

\begin{equation*} \left[\begin{array}{rrr} 1 \amp -3 \amp -1 \\ -2 \amp 7 \amp 2 \end{array}\right] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \left[\begin{array}{c} x_1 - 3 x_2 - x_3 \\ -2 x_1 + 7 x_2 + 2 x_3 \end{array}\right]\text{.} \end{equation*}

We can now see how a matrix times a column should proceed: multiply the entries of the first row of the matrix against the corresponding entries in the column, add these products, and put the result in the first entry of the result column matrix. Then multiply the second row of the matrix against the column in the same fashion and put the result in the second entry of the result column matrix. And so on, if the matrix has more than two rows. See Subsection 4.3.7 below for a more detailed description of this process.

With the matrix product \(A \uvec{x}\) defined in this way, the single matrix equation \(A \uvec{x} = \uvec{b}\) now contains all the same information as the multiple linear equations of the original system.
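To see this definition in action, here is a minimal Python sketch of the row-times-column rule (our illustration). The values \(x_1 = -1\text{,}\) \(x_2 = 1\text{,}\) \(x_3 = 0\) used below are a particular solution we supply for the demonstration; substituting them into (\(\star\)) confirms both equations.

```python
def matvec(A, x):
    """Multiply matrix A by column vector x: entry i of the result is
    row i of A times x, multiplied entry by entry and summed."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[ 1, -3, -1],     # coefficient matrix of the system
     [-2,  7,  2]]
x = [-1, 1, 0]         # a particular solution, supplied for illustration

print(matvec(A, x))    # [-4, 9], the vector of constants b
```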

Subsection 4.3.7 Matrix multiplication

We can extend this row-times-column calculation procedure to define multiplication of two matrices (instead of just a matrix and a column vector) by thinking of the second matrix as a collection of columns,

\begin{align} B = \begin{bmatrix} | \amp | \amp \amp |\\ \uvec{b}_1 \amp \uvec{b}_2 \amp \cdots \amp \uvec{b}_\ell\\ | \amp | \amp \amp | \end{bmatrix} \quad\implies\quad AB = \begin{bmatrix} | \amp | \amp \amp |\\ A\uvec{b}_1 \amp A\uvec{b}_2 \amp \cdots \amp A\uvec{b}_\ell\\ | \amp | \amp \amp | \end{bmatrix}.\label{equation-matrix-ops-concepts-matrix-mult-by-cols}\tag{\(\star\star\star\)} \end{align}

This matrix-times-columns way of defining matrix multiplication will be very useful later. But right now, let's drill down to individual entries of the result \(AB\text{.}\)
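Here is a small Python sketch of this matrix-times-columns point of view (function names are ours): compute \(A \uvec{b}_j\) for each column \(\uvec{b}_j\) of \(B\text{,}\) then reassemble those results as the columns of \(A B\text{.}\)

```python
def matvec(A, x):
    # Row-times-column rule for a matrix times a column vector.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul_by_columns(A, B):
    """Compute AB one column at a time: column j of AB is A times
    column j of B, as in (***)."""
    cols = [matvec(A, [row[j] for row in B]) for j in range(len(B[0]))]
    # The results are columns; regroup them into rows.
    return [list(row) for row in zip(*cols)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_by_columns(A, B))  # [[19, 22], [43, 50]]
```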

Let's first consider the case of a \(1\times n\) row vector \(\uvec{a}\) times an \(n\times 1\) column vector \(\uvec{b}\text{.}\) In this case,

\begin{align} \uvec{a}\uvec{b} = \begin{bmatrix} a_1 \amp a_2 \amp \cdots \amp a_n \end{bmatrix} \begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix} = \begin{bmatrix}a_1 b_1 + a_2 b_2 + \dotsb + a_n b_n\end{bmatrix}.\label{equation-matrix-ops-concepts-row-times-column}\tag{\(\dagger\)} \end{align}

Notice that the result is a \(1\times 1\) matrix containing just a single entry.

Now let's consider a matrix \(A\) times a column \(\uvec{b}\text{,}\) where we consider \(A\) as being made of row vectors. Then,

\begin{equation*} A\uvec{b} = \begin{bmatrix} \leftrightlinesubstitute \amp \uvec{a}_1 \amp \leftrightlinesubstitute\\ \leftrightlinesubstitute \amp \uvec{a}_2 \amp \leftrightlinesubstitute\\ \amp\vdots\\ \leftrightlinesubstitute \amp \uvec{a}_m \amp \leftrightlinesubstitute\\ \end{bmatrix}\uvec{b} = \begin{bmatrix}\uvec{a}_1\uvec{b} \\ \uvec{a}_2\uvec{b} \\ \vdots \\ \uvec{a}_m\uvec{b}\end{bmatrix}, \end{equation*}

where each entry \(\uvec{a}_i \uvec{b}\) in the result on the right is calculated by the row-times-column pattern from (\(\dagger\)). However, each entry does not actually contain a \(1\times 1\) matrix; instead, in each entry we place the number that would be the sole entry of \(\uvec{a}_i\uvec{b}\text{.}\)

Finally, we can extend this to the case of matrix \(A\) times matrix \(B\text{,}\) by

\begin{equation*} AB = \begin{bmatrix} \leftrightlinesubstitute \amp \uvec{a}_1 \amp \leftrightlinesubstitute \\ \leftrightlinesubstitute \amp \uvec{a}_2 \amp \leftrightlinesubstitute\\ \amp\vdots\\ \leftrightlinesubstitute \amp \uvec{a}_m \amp \leftrightlinesubstitute\\ \end{bmatrix} \begin{bmatrix} | \amp | \amp \amp |\\ \uvec{b}_1 \amp \uvec{b}_2 \amp \cdots \amp \uvec{b}_\ell\\ | \amp | \amp \amp | \end{bmatrix} = \begin{bmatrix} \uvec{a}_1\uvec{b}_1 \amp \uvec{a}_1\uvec{b}_2 \amp \cdots \amp \uvec{a}_1\uvec{b}_\ell\\ \uvec{a}_2\uvec{b}_1 \amp \uvec{a}_2\uvec{b}_2 \amp \cdots \amp \uvec{a}_2\uvec{b}_\ell\\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ \uvec{a}_m\uvec{b}_1 \amp \uvec{a}_m\uvec{b}_2 \amp \cdots \amp \uvec{a}_m\uvec{b}_\ell\\ \end{bmatrix}. \end{equation*}
Pattern.

The \(\nth[(i,j)]\) entry of matrix product \(AB\) is the result of a row-times-column calculation, as in (\(\dagger\)), using the \(\nth[i]\) row of \(A\) and the \(\nth[j]\) column of \(B\text{.}\)

In order for each row-times-column calculation to work, we need the number of entries in a row of \(A\) to match up with the number of entries in a column of \(B\text{.}\) (Just as in the definition of matrix addition, we do not “fill out” a matrix with extra entries if these numbers do not match.) But the number of entries in a row of \(A\) is the number of columns of \(A\text{,}\) and the number of entries in a column of \(B\) is the number of rows of \(B\text{.}\)

Pattern.

If \(A\) is \(m \times n\) and \(B\) is \(k \times \ell\text{,}\) we can only compute \(A B\) if \(n\) and \(k\) are the same; otherwise, we say that the product \(A B\) is undefined. In the case that \(n\) and \(k\) are the same, the product \(AB\) has size \(m \times \ell\text{.}\)

An easy way to remember this is that if we want to multiply

\begin{equation*} m \times n \quad\text{ times }\quad k \times \ell\text{,} \end{equation*}

it will only work if the “inside” dimensions \(n\) and \(k\) match, and the result will have the “outside” dimensions \(m \times \ell\text{.}\)
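Putting these patterns together, here is a minimal Python sketch of the whole procedure, including the inside-dimensions check (an illustration, not a library routine):

```python
def matmul(A, B):
    """Compute AB by the row-times-column pattern.
    A is m x n and B is k x ell; AB is defined only when n == k,
    and then the result has size m x ell."""
    m, n = len(A), len(A[0])
    k, ell = len(B), len(B[0])
    if n != k:  # the "inside" dimensions must match
        raise ValueError(f"product undefined: {m}x{n} times {k}x{ell}")
    # Entry (i, j) of AB: row i of A times column j of B.
    return [[sum(A[i][p] * B[p][j] for p in range(n)) for j in range(ell)]
            for i in range(m)]

A = [[ 1, -3, -1],
     [-2,  7,  2]]               # 2 x 3
B = [[1, 0], [0, 1], [2, -1]]    # 3 x 2
print(matmul(A, B))              # 2 x 2 result: [[-1, -2], [2, 5]]
```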

In Discovery 4.7, you found that one of the familiar rules of algebra is not true for matrix algebra: the order in which matrices are multiplied cannot be freely reversed, because the two orders of multiplication might yield different results. In fact, for non-square matrices, often one of the two orders of multiplication is not even defined.

Warning 4.3.3.

When manipulating algebraic expressions where the letters represent matrices, be careful not to inadvertently use the algebra rule \(B A = A B\text{,}\) because it is not true for matrices.
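A quick concrete check with a particular pair of \(2 \times 2\) matrices:

\begin{equation*} A = \begin{bmatrix} 0 \amp 1 \\ 0 \amp 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \amp 0 \\ 1 \amp 0 \end{bmatrix} \qquad\implies\qquad A B = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 0 \end{bmatrix}, \quad B A = \begin{bmatrix} 0 \amp 0 \\ 0 \amp 1 \end{bmatrix}, \end{equation*}

so \(A B \ne B A\) here.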

Subsection 4.3.8 Matrix powers

As you probably decided in Discovery 4.9, we define powers of matrices in the usual way: \(A^2\) means “\(A\) times \(A\text{,}\)” \(A^3\) means “\(A\) times \(A\) times \(A\text{,}\)” and so on.

Warning 4.3.4.
  • To compute \(A^2\text{,}\) you need to carry out the computation \(A A\) using the “row times column” definition of matrix multiplication. Just squaring every entry of \(A\) will not give you the correct result (see the worked example following this warning)! And similarly for \(A^3\text{,}\) \(A^4\text{,}\) and so on: you need to carry out all the iterated multiplications. See Subsection 4.4.2 for example calculations.
  • As in the second pattern discussed in Subsection 4.3.7, we can only compute the product \(A^2 = A A\) if the number of columns of \(A\) is equal to the number of rows of \(A\text{.}\) That is, matrix powers are only defined for square matrices.
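For example, using the first point above with a particular \(2 \times 2\) matrix:

\begin{equation*} \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix}^2 = \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} = \begin{bmatrix} 1 \amp 2 \\ 0 \amp 1 \end{bmatrix}\text{,} \end{equation*}

which is not the matrix obtained by simply squaring each entry.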

The fact that reversing the order of matrix multiplication can produce a different result adds some extra wrinkles to the algebra of matrix powers. In Discovery 4.9.b and Discovery 4.9.c, we need to be careful about order of multiplication. By definition, \((A B)^2\) means \((A B) (A B)\text{,}\) but we cannot simplify this to \(A^2 B^2 = (A A) (B B)\text{,}\) because order of multiplication matters, and so we cannot in general change the order of multiplication of the inner \(B\) and \(A\text{.}\) Similarly, when using FOIL to expand \((A + B)^2 = (A + B) (A + B)\) (which is valid matrix algebra, see Subsection 4.5.1), for the O part of FOIL we get \(A B\) and for the I part we get \(B A\text{,}\) but these cannot be combined into \(2 A B\) in general because order matters for matrix multiplication.

Subsection 4.3.9 Transpose

There is one more matrix operation that we did not explore in Discovery guide 4.1: the transpose of a matrix. To compute the transpose of a particular matrix \(A\text{,}\) take the entries of the first row of \(A\) and write them as the entries of the first column in a new matrix. Then take the entries of the second row of \(A\) and write them as the entries of the second column in the new matrix. And so on. The resulting new matrix is called the transpose of \(A\), and we write \(\utrans{A}\) to mean this new matrix obtained from the old matrix \(A\text{.}\) See Subsection 4.4.5 for examples of computing transposes.

It is not possible at this stage to explain why we might want to use such an operation. If we are thinking of matrices as coefficient or augmented matrices of linear systems, why would we want all the coefficients in a particular equation in a system to become the coefficients attached to a particular variable in a new system? However, the transpose is such a simple operation that it is useful to include its properties in our development at this early stage.

Here are some things to notice about the operation of transpose as you look at the examples in Subsection 4.4.5. First, since we are taking rows of \(A\) and making them columns in \(\utrans{A}\text{,}\) the number of columns of \(\utrans{A}\) must be the number of rows of \(A\text{.}\) Also, the number of entries in a row of \(A\) becomes the number of entries in a column of \(\utrans{A}\text{,}\) so the same must be true about the number of rows of \(\utrans{A}\) versus the number of columns of \(A\text{.}\) That is, if \(A\) is size \(m\times n\text{,}\) then \(\utrans{A}\) is size \(n\times m\text{.}\) Second, instead of turning rows of \(A\) into columns of \(\utrans{A}\text{,}\) notice that we could take the columns of \(A\) and use them as rows in a new matrix, and the result would be the same as \(\utrans{A}\text{.}\) This symmetry means that if we compute the transpose of \(\utrans{A}\text{,}\) we will be back at \(A\text{.}\)
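For instance, transposing the \(2 \times 3\) coefficient matrix \(A\) from Subsection 4.3.6 produces a \(3 \times 2\) matrix, and transposing again recovers \(A\text{:}\)

\begin{equation*} A = \left[\begin{array}{rrr} 1 \amp -3 \amp -1 \\ -2 \amp 7 \amp 2 \end{array}\right] \qquad\implies\qquad \utrans{A} = \left[\begin{array}{rr} 1 \amp -2 \\ -3 \amp 7 \\ -1 \amp 2 \end{array}\right], \qquad \utrans{(\utrans{A})} = A\text{.} \end{equation*}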