
Section 38.1 Concepts

Subsection 38.1.1 Orthogonal projection

In a finite-dimensional inner product space \(V\text{,}\) a subspace \(U\) and its orthogonal complement \(\orthogcmp{U}\) form a complete set of independent subspaces (Corollary 37.5.19). So every vector \(\uvec{v}\) in \(V\) can be decomposed uniquely into a sum of two vectors, one in \(U\) and one in \(\orthogcmp{U}\) (Proposition 28.6.8):

\begin{equation*} \uvec{v} = \uvec{u} + \uvec{u}'\text{,} \end{equation*}

with \(\uvec{u}\) in \(U\) and \(\uvec{u}'\) in \(\orthogcmp{U}\text{.}\)

Diagram illustrating an orthogonal projection in \(\R^3\text{.}\)

The vector \(\uvec{u}\) in the decomposition is called the orthogonal projection of \(\uvec{v}\) onto \(U\), and we'll write

\begin{equation*} \proj_U \uvec{v} \end{equation*}

to mean this vector. Note that by the symmetry of orthogonal complements (Corollary 37.5.20), we could also say that the complementary vector \(\uvec{u}'\) in \(\orthogcmp{U}\) is

\begin{equation*} \proj_{\orthogcmp{U}} \uvec{v} \text{,} \end{equation*}

the orthogonal projection of \(\uvec{v}\) onto \(\orthogcmp{U}\text{.}\)

As a pair, the vectors \(\uvec{u}\) and \(\uvec{u}'\) are sometimes called the component of \(\uvec{v}\) parallel to \(U\) and the component of \(\uvec{v}\) orthogonal to \(U\), respectively.

The Expansion theorem, combined with Proposition 37.5.18, tells us how to compute orthogonal projections. An orthogonal basis

\begin{equation*} \basisfont{B}_U = \{ \uvec{e}_1, \dotsc, \uvec{e}_\ell \} \end{equation*}

for subspace \(U\) can be enlarged to an orthogonal basis

\begin{equation*} \basisfont{B}_V = \{ \uvec{e}_1, \dotsc, \uvec{e}_\ell, \uvec{e}_{\ell + 1}, \dotsc, \uvec{e}_n \} \end{equation*}

for the whole inner product space \(V\) (Corollary 37.5.10). But the extra vectors in the enlarged basis then form a basis for \(\orthogcmp{U}\text{:}\)

\begin{equation*} \basisfont{B}_{\orthogcmp{U}} = \{ \uvec{e}_{\ell + 1}, \dotsc, \uvec{e}_n \} \text{.} \end{equation*}

So if we expand \(\uvec{v}\) relative to \(\basisfont{B}_V\text{,}\) we will simultaneously obtain expansions for the components parallel and orthogonal to \(U\) relative to \(\basisfont{B}_U\) and \(\basisfont{B}_{\orthogcmp{U}}\text{,}\) respectively:

\begin{equation*} \uvec{v} = \underbrace{ \frac{\inprod{\uvec{v}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_\ell}}{\norm{\uvec{e}_\ell}^2} \, \uvec{e}_\ell }_{\proj_U \uvec{v}} + \underbrace{ \frac{\inprod{\uvec{v}}{\uvec{e}_{\ell + 1}}}{\norm{\uvec{e}_{\ell + 1}}^2} \, \uvec{e}_{\ell + 1} + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_n}}{\norm{\uvec{e}_n}^2} \, \uvec{e}_n }_{\proj_{\orthogcmp{U}} \uvec{v}}\text{.} \end{equation*}

Note that we don't actually need to know the extra vectors in \(\basisfont{B}_V\) that form \(\basisfont{B}_{\orthogcmp{U}}\) in order to compute either \(\proj_U \uvec{v}\) or \(\proj_{\orthogcmp{U}} \uvec{v}\text{.}\) All we need is the orthogonal basis \(\basisfont{B}_U\) (computed using the Gram-Schmidt orthogonalization process, if necessary), and then

\begin{gather} \proj_U \uvec{v} = \frac{\inprod{\uvec{v}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_\ell}}{\norm{\uvec{e}_\ell}^2} \, \uvec{e}_\ell \text{,}\label{equation-projection-concepts-formula}\tag{\(\star\)} \end{gather}
\begin{gather} \proj_{\orthogcmp{U}} \uvec{v} = \uvec{v} - \proj_U \uvec{v}\text{.}\label{equation-projection-concepts-complement-formula}\tag{\(\star\star\)} \end{gather}

See Example 38.3.1 in Subsection 38.3.1 for an example.
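
As a concrete numerical illustration (our own sketch, not part of the text), the following NumPy snippet evaluates formulas (\(\star\)) and (\(\star\star\)) for the standard inner product on \(\R^n\text{.}\) The function name proj_onto_subspace and the example vectors are ours.

```python
# Sketch of formulas (*) and (**), assuming the standard inner product on R^n.
import numpy as np

def proj_onto_subspace(v, orthogonal_basis):
    """Orthogonal projection of v onto span(orthogonal_basis).

    Assumes the basis vectors are mutually orthogonal (not necessarily
    unit length), as formula (*) requires.
    """
    v = np.asarray(v, dtype=float)
    p = np.zeros_like(v)
    for e in orthogonal_basis:
        e = np.asarray(e, dtype=float)
        p += (v @ e) / (e @ e) * e          # <v, e> / ||e||^2 * e
    return p

# Example: U = span{(1, 1, 0), (1, -1, 0)} in R^3 (an orthogonal basis).
B_U = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
v = np.array([3.0, 4.0, 5.0])

proj_U_v = proj_onto_subspace(v, B_U)   # component parallel to U, formula (*)
proj_U_perp_v = v - proj_U_v            # component orthogonal to U, formula (**)
print(proj_U_v, proj_U_perp_v)          # [3. 4. 0.] [0. 0. 5.]
```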

Orthogonal projection onto a vector in \(\R^n\).

In Chapter 14, we defined the orthogonal projection of one vector \(\uvec{v}\) onto another \(\uvec{a}\) in \(\R^n\) as

\begin{equation*} \proj_{\uvec{a}} \uvec{v} = \frac{\udotprod{v}{a}}{\unorm{a}^2}\, \uvec{a} \text{.} \end{equation*}

This is consistent with our current definition of orthogonal projection, in the case of a one-dimensional subspace \(U\) of \(\R^n\text{,}\) and the standard inner product on \(\R^n\text{.}\) In that case, every basis

\begin{equation*} \basisfont{B}_U = \{ \uvec{a} \} \end{equation*}

for \(U\) can be considered an orthogonal basis, and the formula

\begin{equation*} \proj_U \uvec{v} = \frac{\uvecinprod{v}{a}}{\unorm{a}^2} \, \uvec{a} \end{equation*}

agrees with our previous formula for \(\proj_{\uvec{a}} \uvec{v}\text{.}\)
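
As a quick numerical check (again ours, not the text's), the one-dimensional case of formula (\(\star\)) can be computed directly:

```python
# One-dimensional case of formula (*): proj_a v = (v . a / ||a||^2) a.
import numpy as np

a = np.array([2.0, 1.0, 2.0])
v = np.array([1.0, 0.0, 4.0])

proj_a_v = (v @ a) / (a @ a) * a
print(proj_a_v)   # [20/9, 10/9, 20/9]
```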

Subsection 38.1.2 Gram-Schmidt process versus orthogonal projection

The steps in the Gram-Schmidt orthogonalization process create a sequence of orthogonal bases

\begin{equation*} \basisfont{B}_{U_1}, \basisfont{B}_{U_2}, \dotsc, \basisfont{B}_{U_n} \end{equation*}

for a nested sequence of subspaces

\begin{equation*} U_1 \subseteq U_2 \subseteq \dotsb \subseteq U_n = V \text{,} \end{equation*}

where \(\dim U_j = j\) for each index \(j\text{.}\) To enlarge basis \(\basisfont{B}_{U_j}\) to basis \(\basisfont{B}_{U_{j+1}}\text{,}\) the process appends the vector

\begin{equation*} \uvec{e}_{j+1} = \uvec{v}_{j+1} - \left[ \frac{\inprod{\uvec{v}_{j+1}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}_{j+1}}{\uvec{e}_j}}{\norm{\uvec{e}_j}^2} \, \uvec{e}_j \right]\text{,} \end{equation*}

where

\begin{equation*} \basisfont{B}_{U_j} = \{ \uvec{e}_1,\dotsc,\uvec{e}_j \} \end{equation*}

is the collection of orthogonal vectors created to that point in the process, and \(\uvec{v}_{j+1}\) is the \(\nth[(j+1)]\) vector in the initial basis used as the starting point in the process.

Comparing the expression in the brackets being subtracted from \(\uvec{v}_{j+1}\) with formulas (\(\star\)) and (\(\star\star\)), we see that

\begin{equation*} \uvec{e}_{j+1} = \uvec{v}_{j+1} - \proj_{U_j} \uvec{v}_{j+1} = \proj_{\orthogcmp{U}_j} \uvec{v}_{j+1} \text{.} \end{equation*}
Figure 38.1.1. Visualization of the result of the third step of the Gram-Schmidt orthogonalization process as an orthogonal projection.
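
The following sketch (ours, assuming NumPy and the standard inner product on \(\R^n\)) implements the Gram-Schmidt step exactly as described above: each new vector is what remains of \(\uvec{v}_{j+1}\) after subtracting its projection onto the span of the vectors produced so far.

```python
# Gram-Schmidt as repeated orthogonal projection, standard inner product on R^n.
import numpy as np

def gram_schmidt(vectors):
    """Turn a linearly independent list of vectors into an orthogonal list."""
    orthogonal = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        # Subtract proj_{U_j} v, leaving e_{j+1} = proj_{U_j-perp} v.
        for e in orthogonal:
            v = v - (v @ e) / (e @ e) * e
        orthogonal.append(v)
    return orthogonal

E = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
print(np.round(E, 3))
# Pairwise inner products of distinct vectors in E are (numerically) zero.
```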

Subsection 38.1.3 Best approximation and distance between vector and subspace

As in \(\R^n\text{,}\) we can measure distance between vectors by measuring the norm of a difference vector:

\begin{equation*} \dist(\uvec{u},\uvec{v}) = \norm{\uvec{u} - \uvec{v}} \text{.} \end{equation*}
Figure 38.1.2. Visualization of the distance between vectors in an inner product space.

We originally developed the concept of orthogonal projection in Chapter 14 to help us answer an approximation question (Question 14.3.2), and it will do the same in an abstract inner product space. If \(U\) is a subspace of a finite-dimensional inner product space, and \(\uvec{v}\) is a vector not in \(U\text{,}\) then amongst all vectors \(\uvec{u}\) in \(U\) the distance

\begin{equation*} \dist (\uvec{v},\uvec{u}) = \norm{\uvec{v} - \uvec{u}} \end{equation*}

will be minimized at

\begin{equation*} \uvec{u} = \proj_U \uvec{v} \text{.} \end{equation*}

(See Theorem 38.4.4.)

For this reason, the vector \(\proj_U \uvec{v}\) is called the best approximation to \(\uvec{v}\) in \(U\).

Diagram illustrating an orthogonal projection in \(\R^3\) as a best approximation.

Because of this, orthogonal projection can be used to solve approximation and optimization problems. See Example 38.3.2 in Subsection 38.3.2 for an example.

And since there is a unique smallest distance \(\dist (\uvec{v},\uvec{u}) \) amongst all \(\uvec{u}\) in \(U\text{,}\) achieved when \(\uvec{u} = \proj_U \uvec{v}\text{,}\) we can define this as the distance between \(\uvec{v}\) and \(U\):

\begin{equation*} \dist (\uvec{v},U) = \dist (\uvec{v},\proj_U \uvec{v}) = \norm{\uvec{v} - \proj_U \uvec{v}} = \norm{\proj_{\orthogcmp{U}} \uvec{v}} \text{.} \end{equation*}
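
A minimal sketch (ours) of this distance formula, for the standard inner product on \(\R^n\) and an orthogonal basis of \(U\text{:}\)

```python
# dist(v, U) = || v - proj_U v ||, assuming an orthogonal basis for U.
import numpy as np

def dist_to_subspace(v, orthogonal_basis):
    """Distance from v to U = span(orthogonal_basis), standard inner product."""
    v = np.asarray(v, dtype=float)
    p = np.zeros_like(v)
    for e in orthogonal_basis:
        e = np.asarray(e, dtype=float)
        p += (v @ e) / (e @ e) * e          # build proj_U v term by term
    return np.linalg.norm(v - p)

# U = the xy-plane in R^3: the distance from (3, 4, 5) to U is 5.
print(dist_to_subspace([3, 4, 5], [[1, 0, 0], [0, 1, 0]]))   # 5.0
```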

Suppose \(\basisfont{B} = \{\uvec{e}_1, \uvec{e}_2, \dotsc, \uvec{e}_n\}\) is an orthonormal basis for our inner product space. (However, we do not assume here that the first so many vectors in this basis form a basis for the subspace \(U\text{.}\)) We can then express the vector \(\uvec{v}\) and any vector \(\uvec{u}\) in \(U\) as coordinate vectors relative to this basis:

\begin{align*} \rmatrixOf{\uvec{v}}{B} \amp = (v_1,v_2,\dotsc,v_n) \text{,} \amp \rmatrixOf{\uvec{u}}{B} \amp = (u_1,u_2,\dotsc,u_n) \text{.} \end{align*}

Applying Statement 2 of Proposition 37.5.6, we have

\begin{equation*} \bbrac{\dist (\uvec{v},\uvec{u})}^2 = {\norm{\uvec{v} - \uvec{u}}}^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + \dotsb + (v_n - u_n)^2 \text{.} \end{equation*}

Since we are talking about non-negative quantities, minimizing \(\dist (\uvec{v},\uvec{u})\) is the same as minimizing the square of that distance, and so we can say that setting \(\uvec{u} = \proj_U \uvec{v}\) minimizes the sum-of-squares expression on the right above. We could interpret this sum-of-squares formula as a measure of the “error” when we “approximate” \(\uvec{v}\) by a vector \(\uvec{u}\) in \(U\text{,}\) and so \(\uvec{u} = \proj_U \uvec{v}\) could be called the least squares approximation to \(\uvec{v}\) within \(U\text{.}\)

Subsection 38.1.4 Least squares solutions to a linear system

Here is a particular kind of linear approximation problem that can be solved with the help of orthogonal projection.

Question 38.1.3.

Suppose \(A\) is an \(m \times n\) coefficient matrix and \(\uvec{b}_0\) is a column vector in \(\R^m\) so that the linear system \(A \uvec{x} = \uvec{b}_0\) is inconsistent.

What vector \(\uvec{x}_0\) in \(\R^n\) comes closest to being a solution?

In other words, for what vector \(\uvec{x}_0\) in \(\R^n\) will \(A \uvec{x}_0\) be as close as possible to \(\uvec{b}_0\text{?}\)

We can pursue this question by “pushing it forward” through multiplication by \(A\text{,}\) so that it is a question about vectors in \(\R^m\text{,}\) where \(\uvec{b}_0\) and each \(A \uvec{x}\) live, instead of a question about vectors in \(\R^n\text{,}\) where each \(\uvec{x}\) lives (and where the answer \(\uvec{x}_0\) lives, if it exists).

We have seen before that a matrix-times-column product can be expanded as a linear combination of the columns of the matrix:

\begin{equation*} A \uvec{x} = x_1 \uvec{a}_1 + x_2 \uvec{a}_2 + \dotsb + x_n \uvec{a}_n \text{,} \end{equation*}

where the \(\uvec{a}_j\) are the columns of \(A\text{.}\) (For example, see Subsection 22.3.2.) It is this pattern that led us to the definition of column space of a matrix as the span of the columns of the matrix. Furthermore, we have seen that system \(A \uvec{x} = \uvec{b}\) is consistent precisely when the vector of constants \(\uvec{b}\) lies in the column space of \(A\text{.}\) (See Subsection 21.3.1.) Turning the above matrix-times-column pattern around, we also see that the column space of \(A\) is made up of all possible products \(A \uvec{x}\) for \(\uvec{x}\) in \(\R^n\text{.}\) (But note that it is possible for different vectors \(\uvec{x}\) in \(\R^n\) to produce the same vector \(\uvec{b}\) in the column space of \(A\text{,}\) since we know that a consistent system \(A \uvec{x} = \uvec{b}\) could have an infinite number of solutions.)

We have assumed that our system is inconsistent, so that \(\uvec{b}_0\) must not lie in the column space of \(A\text{,}\) whereas every result \(A \uvec{x}\) lies in the column space of \(A\text{.}\) We can now re-frame Question 38.1.3 in a way that will allow us to apply orthogonal projection: what vector(s) \(\uvec{x}\) will produce an “output” vector \(A \uvec{x}\) in the column space of \(A\) that is closest to \(\uvec{b}_0\text{?}\) The answer is: any vector \(\uvec{x}_0\) so that

\begin{gather} A \uvec{x}_0 = \proj_U \uvec{b}_0 \text{,}\label{equation-projection-least-sq-sol}\tag{\(\dagger\)} \end{gather}

where \(U\) is the column space of \(A\) in \(\R^m\text{.}\)

Diagram illustrating an orthogonal projection in \(\R^3\text{.}\)

One way to solve this problem is to use Procedure 21.3.2 to determine a basis for the column space of matrix \(A\text{,}\) apply the Gram-Schmidt orthogonalization process to produce an orthogonal basis for that column space \(U\text{,}\) and then use that basis to compute \(\proj_U \uvec{b}_0\text{.}\) Finally, we could solve the system \(A \uvec{x} = \proj_U \uvec{b}_0\text{,}\) which must be consistent because \(\proj_U \uvec{b}_0\) lies in \(U\text{,}\) the column space of \(A\text{.}\)
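
Here is a rough sketch (ours, not the text's worked example) of that indirect procedure in NumPy. We use a QR factorization as a stand-in for the Gram-Schmidt step, and a small \(3 \times 2\) system invented for illustration.

```python
# Indirect route: project b0 onto the column space of A, then solve A x = proj.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b0 = np.array([6.0, 0.0, 0.0])          # not in the column space of A

# Orthonormal basis for the column space via (reduced) QR.
Q, _ = np.linalg.qr(A)
proj_b0 = Q @ (Q.T @ b0)                # proj_U b0, where U = column space of A

# A x = proj_b0 is consistent, so lstsq returns an exact solution x0.
x0, *_ = np.linalg.lstsq(A, proj_b0, rcond=None)
print(proj_b0, x0)                      # [4. 2. 2.] [2. 2.]
```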

But by applying some theory, we can develop a more direct procedure. Take pairing \(\inprod{\blank}{\blank}\) to mean the standard inner product on \(\R^m\) (or, later, on \(\R^n\)). Now, the vector

\begin{equation*} \uvec{b}_0' = \proj_{\orthogcmp{U}} \uvec{b}_0 = \uvec{b}_0 - \proj_{U} \uvec{b}_0 \end{equation*}

lies in \(\orthogcmp{U}\text{,}\) and so is orthogonal to every vector in \(U\text{,}\) the column space of \(A\text{.}\) So for every \(\uvec{x}\) in \(\R^n\text{,}\) we have

\begin{align*} 0 \amp = \inprod{\uvec{b}_0'}{A \uvec{x}}\\ \amp = \utrans{(A \uvec{x})} (\uvec{b}_0') \\ \amp = \utrans{\uvec{x}} (\utrans{A} \uvec{b}_0') \\ \amp = \inprod{\utrans{A} \uvec{b}_0'}{\uvec{x}} \text{.} \end{align*}

As the above calculation holds for every \(\uvec{x}\) in \(\R^n\text{,}\) the result says that \(\utrans{A} \uvec{b}_0'\) lies in \(\orthogcmp{(\R^n)}\text{.}\) However, in an inner product space, only the zero vector is orthogonal to every vector (Statement 2 of Proposition 37.5.15). Hence

\begin{equation*} \zerovec = \utrans{A} \uvec{b}_0' = \utrans{A} (\uvec{b}_0 - \proj_{U} \uvec{b}_0) = \utrans{A} (\uvec{b}_0 - A \uvec{x}_0) \end{equation*}

(using (\(\dagger\))). Re-arranging, we have

\begin{equation*} \utrans{A} A \uvec{x}_0 = \utrans{A} \uvec{b}_0 \text{.} \end{equation*}

In other words, the special \(\uvec{x}_0\) that we are looking for is a solution to

\begin{equation*} \utrans{A} A \uvec{x} = \utrans{A} \uvec{b}_0 \text{,} \end{equation*}

called the normal system associated to \(A \uvec{x} = \uvec{b}_0\). Solutions to this system are referred to as least-squares solutions for the original inconsistent system \(A \uvec{x} = \uvec{b}_0\text{.}\)
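
For comparison, the same invented system from the sketch above can be handled directly through the normal system; this is again our own illustration, not the text's example.

```python
# Direct route: solve the normal system A^T A x = A^T b0.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b0 = np.array([6.0, 0.0, 0.0])

x0 = np.linalg.solve(A.T @ A, A.T @ b0)   # valid when A^T A is invertible
print(x0)                                 # [2. 2.], matching the projection route
```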

Because \(\proj_{U} \uvec{b}_0\) lies in the column space of \(A\text{,}\) there is always some \(\uvec{x}_0\) that satisfies (\(\dagger\)), hence there is always a solution to the normal system. If \(A\) is square and invertible, then so is \(\utrans{A}A\text{,}\) and in this case there is a unique solution

\begin{gather} \uvec{x}_0 = \inv{(\utrans{A} A)} \utrans{A} \uvec{b}_0 \text{.}\label{equation-projection-least-sq-pseudo-inverse-sol}\tag{\(\dagger\dagger\)} \end{gather}

But even if \(A\) is not square, the coefficient matrix of the normal system, \(\utrans{A} A\text{,}\) is always square: we have assumed \(A\) is \(m \times n\text{,}\) so \(\utrans{A} A\) is \(n \times n\text{.}\) If this matrix is invertible, then again there is a unique solution as above. In this case,

\begin{equation*} \inv{(\utrans{A} A)} \utrans{A} \end{equation*}

is called the pseudo-inverse of \(A\text{,}\) by analogy between (\(\dagger\dagger\)) and the fact that a square, invertible coefficient matrix affords the solution

\begin{equation*} A \uvec{x} = \uvec{b} \quad\implies\quad \uvec{x} = \inv{A} \uvec{b} \text{.} \end{equation*}

But recall that in this analysis, we have not assumed that \(A\) is square, hence the pseudo in pseudo-inverse.
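
A final sketch (ours): when \(\utrans{A} A\) is invertible, computing the pseudo-inverse explicitly gives the same least-squares solution for the small invented system used above, and NumPy's built-in pinv agrees in this full-column-rank case.

```python
# Pseudo-inverse (A^T A)^{-1} A^T applied to b0, compared with np.linalg.pinv.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b0 = np.array([6.0, 0.0, 0.0])

pseudo_inverse = np.linalg.inv(A.T @ A) @ A.T
print(pseudo_inverse @ b0)                 # [2. 2.]
print(np.linalg.pinv(A) @ b0)              # same result
```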

See Example 38.3.3 in Subsection 38.3.3 for an example of computing a least-squares solution.