
Section 38.1 Concepts

Subsection 38.1.1 Orthogonal projection

In a finite-dimensional inner product space V\text{,} a subspace U and its orthogonal complement \orthogcmp{U} form a complete set of independent subspaces (Corollary 37.5.19). So every vector \uvec{v} in V can be decomposed uniquely into a sum of two vectors, one in U and one in \orthogcmp{U} (Proposition 28.6.8):

\begin{equation*} \uvec{v} = \uvec{u} + \uvec{u}' \text{,} \end{equation*}

with \uvec{u} in U and \uvec{u}' in \orthogcmp{U}\text{.}

Diagram illustrating an orthogonal projection in \(\R^3\text{.}\)

The vector \uvec{u} in the decomposition is called the orthogonal projection of \uvec{v} onto U\text{,} and we'll write

\begin{equation*} \proj_U \uvec{v} \end{equation*}

to mean this vector. Note that by the symmetry of orthogonal complements (Corollary 37.5.20), we could also say that the complementary vector \uvec{u}' in \orthogcmp{U} is

\begin{equation*} \proj_{\orthogcmp{U}} \uvec{v} \text{,} \end{equation*}

the orthogonal projection of \uvec{v} onto \orthogcmp{U}\text{.}

As a pair, the vectors \uvec{u} and \uvec{u}' are sometimes called the component of \uvec{v} parallel to U and the component of \uvec{v} orthogonal to U\text{,} respectively.

The Expansion theorem, combined with Proposition 37.5.18, tells us how to compute orthogonal projections. An orthogonal basis

\begin{equation*} B_U = \{ \uvec{e}_1, \dotsc, \uvec{e}_\ell \} \end{equation*}

for subspace U can be enlarged to an orthogonal basis

\begin{equation*} B_V = \{ \uvec{e}_1, \dotsc, \uvec{e}_\ell, \uvec{e}_{\ell+1}, \dotsc, \uvec{e}_n \} \end{equation*}

for the whole inner product space V (Corollary 37.5.10). But the extra vectors in the enlarged basis then form a basis for \orthogcmp{U}\text{:}

\begin{equation*} B_{\orthogcmp{U}} = \{ \uvec{e}_{\ell+1}, \dotsc, \uvec{e}_n \} \text{.} \end{equation*}

So if we expand \uvec{v} relative to B_V\text{,} we will simultaneously obtain expansions for the components parallel and orthogonal to U relative to B_U and B_{\orthogcmp{U}}\text{,} respectively:

\begin{equation*} \uvec{v} = \underbrace{ \frac{\inprod{\uvec{v}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_\ell}}{\norm{\uvec{e}_\ell}^2} \, \uvec{e}_\ell }_{\proj_U \uvec{v}} + \underbrace{ \frac{\inprod{\uvec{v}}{\uvec{e}_{\ell+1}}}{\norm{\uvec{e}_{\ell+1}}^2} \, \uvec{e}_{\ell+1} + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_n}}{\norm{\uvec{e}_n}^2} \, \uvec{e}_n }_{\proj_{\orthogcmp{U}} \uvec{v}} \text{.} \end{equation*}

Note that we don't actually need to know the extra vectors in B_V that form B_{\orthogcmp{U}} in order to compute either \proj_U \uvec{v} or \proj_{\orthogcmp{U}} \uvec{v}\text{.} All we need is the orthogonal basis B_U (computed using the Gram-Schmidt orthogonalization process, if necessary), and then

\begin{gather} \proj_U \uvec{v} = \frac{\inprod{\uvec{v}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_\ell}}{\norm{\uvec{e}_\ell}^2} \, \uvec{e}_\ell \text{,}\tag{\(\star\)}\\ \proj_{\orthogcmp{U}} \uvec{v} = \uvec{v} - \proj_U \uvec{v} \text{.}\tag{\(\star\star\)} \end{gather}

See Example 38.3.1 in Subsection 38.3.1 for an example.
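For readers who like to check such computations numerically, here is a minimal sketch in Python with NumPy, assuming the standard inner product on \R^4; the particular subspace, the vectors, and the helper name proj are our own illustrative choices, not part of the text's examples.

import numpy as np

def proj(v, orthogonal_basis):
    # Orthogonal projection of v onto the span of the given vectors,
    # assuming they are mutually orthogonal (standard dot product).
    return sum((v @ e) / (e @ e) * e for e in orthogonal_basis)

# An orthogonal basis B_U for a two-dimensional subspace U of R^4.
e1 = np.array([1.0, 1.0, 0.0, 0.0])
e2 = np.array([1.0, -1.0, 1.0, 0.0])
assert abs(e1 @ e2) < 1e-12   # the basis really is orthogonal

v = np.array([3.0, 1.0, 2.0, 5.0])

u = proj(v, [e1, e2])   # component of v parallel to U
u_perp = v - u          # component of v orthogonal to U

# The orthogonal component is orthogonal to every basis vector of U.
print(u, u_perp, e1 @ u_perp, e2 @ u_perp)

The decomposition \uvec{v} = \uvec{u} + \uvec{u}' can then be read off directly from the printed output.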

Orthogonal projection onto a vector in \R^n.

In Chapter 14, we defined the orthogonal projection of one vector \uvec{v} onto another vector \uvec{a} in \R^n as

\begin{equation*} \proj_{\uvec{a}} \uvec{v} = \frac{\uvec{v} \cdot \uvec{a}}{\norm{\uvec{a}}^2} \, \uvec{a} \text{.} \end{equation*}

This is consistent with our current definition of orthogonal projection, in the case of a one-dimensional subspace U of \R^n and the standard inner product on \R^n\text{.} In that case, every basis

\begin{equation*} B_U = \{ \uvec{a} \} \end{equation*}

for U can be considered an orthogonal basis, and the formula

\begin{equation*} \proj_U \uvec{v} = \frac{\inprod{\uvec{v}}{\uvec{a}}}{\norm{\uvec{a}}^2} \, \uvec{a} \end{equation*}

agrees with our previous formula for \proj_{\uvec{a}} \uvec{v}\text{.}
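In code, this one-dimensional case reduces to a single dot-product formula; the following short Python/NumPy sketch (with vectors chosen purely for illustration) computes \proj_{\uvec{a}} \uvec{v} directly.

import numpy as np

a = np.array([1.0, 2.0, 2.0])
v = np.array([4.0, 0.0, 3.0])

# proj_a v = (v . a / ||a||^2) a, the one-dimensional case of proj_U v.
proj_a_v = (v @ a) / (a @ a) * a
print(proj_a_v)   # [10/9, 20/9, 20/9]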

Subsection 38.1.2 Gram-Schmidt process versus orthogonal projection

The steps in the Gram-Schmidt orthogonalization process create a sequence of orthogonal bases

\begin{equation*} B_{U_1}, B_{U_2}, \dotsc, B_{U_n} \end{equation*}

for a nested sequence of subspaces

\begin{equation*} U_1 \subseteq U_2 \subseteq \dotsb \subseteq U_n = V \text{,} \end{equation*}

where \dim U_j = j for each index j. To enlarge basis B_{U_j} to basis B_{U_{j+1}}\text{,} the process appends the vector

\begin{equation*} \uvec{e}_{j+1} = \uvec{v}_{j+1} - \left[ \frac{\inprod{\uvec{v}_{j+1}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}_{j+1}}{\uvec{e}_j}}{\norm{\uvec{e}_j}^2} \, \uvec{e}_j \right] \text{,} \end{equation*}

where

\begin{equation*} B_{U_j} = \{ \uvec{e}_1, \dotsc, \uvec{e}_j \} \end{equation*}

is the collection of orthogonal vectors created to that point in the process, and \uvec{v}_{j+1} is the (j+1)th vector in the initial basis used as the starting point in the process.

Comparing the expression in the brackets being subtracted from \uvec{v}_{j+1} with formulas (\star) and (\star\star), we see that

\begin{equation*} \uvec{e}_{j+1} = \uvec{v}_{j+1} - \proj_{U_j} \uvec{v}_{j+1} = \proj_{\orthogcmp{U_j}} \uvec{v}_{j+1} \text{.} \end{equation*}
Diagram illustrating an orthogonal projection in \(\R^3\text{.}\)
Figure 38.1.1. Visualization of the result of the third step of the Gram-Schmidt orthogonalization process as an orthogonal projection.
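To connect the two points of view computationally, here is a minimal Gram-Schmidt sketch in Python/NumPy in which each new vector \uvec{e}_{j+1} is obtained by subtracting a projection onto the span of the vectors produced so far; the starting basis below is an arbitrary choice for illustration.

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize a list of linearly independent vectors in R^n
    # (standard inner product).  Each new vector is
    # e_{j+1} = v_{j+1} - proj_{U_j} v_{j+1}, where U_j is spanned by
    # the orthogonal vectors produced so far.
    orthogonal = []
    for v in vectors:
        e = v - sum((v @ q) / (q @ q) * q for q in orthogonal)
        orthogonal.append(e)
    return orthogonal

# A (non-orthogonal) basis of R^3, chosen arbitrarily.
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])
v3 = np.array([0.0, 1.0, 1.0])

e1, e2, e3 = gram_schmidt([v1, v2, v3])
print(e1 @ e2, e1 @ e3, e2 @ e3)   # all (numerically) zero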

Subsection 38.1.3 Best approximation and distance between vector and subspace

As in \R^n\text{,} we can measure the distance between vectors by measuring the norm of a difference vector:

\begin{equation*} \dist (\uvec{u},\uvec{v}) = \norm{\uvec{u} - \uvec{v}} \text{.} \end{equation*}
Diagram illustrating distance in an inner product space.
Figure 38.1.2. Visualization of the distance between vectors in an inner product space.

We originally developed the concept of orthogonal projection in Chapter 14 to help us answer an approximation question (Question 14.3.2), and it will do the same in an abstract inner product space. If U is a subspace of a finite-dimensional inner product space, and \uvec{v} is a vector not in U\text{,} then amongst all vectors \uvec{u} in U the distance

\begin{equation*} \dist (\uvec{v},\uvec{u}) = \norm{\uvec{v} - \uvec{u}} \end{equation*}

will be minimized at

\begin{equation*} \uvec{u} = \proj_U \uvec{v} \text{.} \end{equation*}

(See Theorem 38.4.4.)

For this reason, the vector \proj_U \uvec{v} is called the best approximation to \uvec{v} in U\text{.}

Diagram illustrating an orthogonal projection in \(\R^3\) as a best approximation.

Because of this, orthogonal projection can be used to solve approximation and optimization problems. See Example 38.3.2 in Subsection 38.3.2 for an example.

And since there is one unique smallest distance \dist (\uvec{v},\uvec{u}) amongst all \uvec{u} in U\text{,} achieved when \uvec{u} = \proj_U \uvec{v}\text{,} we can define this as the distance between \uvec{v} and U\text{:}

\begin{equation*} \dist (\uvec{v},U) = \dist (\uvec{v}, \proj_U \uvec{v}) = \norm{\uvec{v} - \proj_U \uvec{v}} = \norm{\proj_{\orthogcmp{U}} \uvec{v}} \text{.} \end{equation*}

Suppose B = \{ \uvec{e}_1, \uvec{e}_2, \dotsc, \uvec{e}_n \} is an orthonormal basis for our inner product space. (However, we do not assume here that the first several vectors in this basis form a basis for the subspace U\text{.}) We can then express the vector \uvec{v} and any vector \uvec{u} in U as coordinate vectors relative to this basis:

\begin{align*} \rmatrixOf{\uvec{v}}{B} \amp = (v_1,v_2,\dotsc,v_n) \text{,} \amp \rmatrixOf{\uvec{u}}{B} \amp = (u_1,u_2,\dotsc,u_n) \text{.} \end{align*}

Applying Statement 2 of Proposition 37.5.6, we have

\begin{equation*} \bbrac{\dist (\uvec{v},\uvec{u})}^2 = {\norm{\uvec{v} - \uvec{u}}}^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + \dotsb + (v_n - u_n)^2 \text{.} \end{equation*}

Since we are talking about non-negative quantities, minimizing \dist (\uvec{v},\uvec{u}) is the same as minimizing the square of that distance, and so we can say that setting \uvec{u} = \proj_U \uvec{v} minimizes the sum-of-squares expression on the right above. We could interpret this sum-of-squares formula as a measure of the β€œerror” when we β€œapproximate” \uvec{v} by a vector \uvec{u} in U\text{,} and so \uvec{u} = \proj_U \uvec{v} could be called the least squares approximation to \uvec{v} within U\text{.}
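As an optional numerical check of this best-approximation property, the following Python/NumPy sketch (a self-contained variation on the earlier proj helper, with an arbitrarily chosen plane U in \R^3) compares \dist (\uvec{v},\uvec{u}) for many vectors \uvec{u} in U against \dist (\uvec{v},U)\text{.}

import numpy as np

def proj(v, orthogonal_basis):
    # Orthogonal projection onto the span of mutually orthogonal vectors.
    return sum((v @ e) / (e @ e) * e for e in orthogonal_basis)

rng = np.random.default_rng(0)

# Orthogonal basis for a plane U in R^3, and a vector v not in U.
e1 = np.array([1.0, 0.0, 1.0])
e2 = np.array([1.0, 0.0, -1.0])
v = np.array([2.0, 3.0, 4.0])

best = proj(v, [e1, e2])
best_dist = np.linalg.norm(v - best)        # dist(v, U)

# Random vectors u = c1 e1 + c2 e2 in U never beat the projection.
for _ in range(1000):
    c1, c2 = rng.normal(size=2)
    u = c1 * e1 + c2 * e2
    assert np.linalg.norm(v - u) >= best_dist - 1e-12

print(best, best_dist)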

Subsection 38.1.4 Least squares solutions to a linear system

Here is a particular kind of linear approximation problem that can be solved with the help of orthogonal projection.

Question 38.1.3.

Suppose A is an m \times n coefficient matrix and \uvec{b}_0 is a column vector in \R^m so that the linear system A \uvec{x} = \uvec{b}_0 is inconsistent.

What vector \uvec{x}_0 in \R^n comes closest to being a solution?

In other words, for what vector \uvec{x}_0 in \R^n will A \uvec{x}_0 be as close as possible to \uvec{b}_0\text{?}

We can pursue this question by β€œpushing it forward” through multiplication by A\text{,} so that it is a question about vectors in \R^m\text{,} where \uvec{b}_0 and each A \uvec{x} live, instead of a question about vectors in \R^n\text{,} where each \uvec{x} lives (and where the answer \uvec{x}_0 lives, if it exists).

We have seen before that a matrix-times-column product can be expanded as a linear combination of the columns of the matrix:

\begin{equation*} A \uvec{x} = x_1 \uvec{a}_1 + x_2 \uvec{a}_2 + \dotsb + x_n \uvec{a}_n \text{,} \end{equation*}

where the \uvec{a}_j are the columns of A\text{.} (For example, see Subsection 22.3.2.) It is this pattern that led us to the definition of column space of a matrix as the span of the columns of the matrix. Furthermore, we have seen that system A \uvec{x} = \uvec{b} is consistent precisely when the vector of constants \uvec{b} lies in the column space of A\text{.} (See Subsection 21.3.1.) Turning the above matrix-times-column pattern around, we also see that the column space of A is made up of all possible products A \uvec{x} for \uvec{x} in \R^n\text{.} (But note that it is possible for different vectors \uvec{x} in \R^n to produce the same vector \uvec{b} in the column space of A\text{,} since we know that a consistent system A \uvec{x} = \uvec{b} could have an infinite number of solutions.)

We have assumed that our system is inconsistent, so that \uvec{b}_0 must not lie in the column space of A\text{,} whereas every result A \uvec{x} lies in the column space of A\text{.} We can now re-frame Question 38.1.3 in a way that will allow us to apply orthogonal projection: what vector(s) \uvec{x} will produce an β€œoutput” vector A \uvec{x} in the column space of A that is closest to \uvec{b}_0\text{?} The answer is: any vector \uvec{x}_0 so that

\begin{gather} A \uvec{x}_0 = \proj_U \uvec{b}_0 \text{,}\label{equation-projection-least-sq-sol}\tag{\(\dagger\)} \end{gather}

where U is the column space of A in \R^m\text{.}

Diagram illustrating an orthogonal projection in \(\R^3\text{.}\)

One way to solve this problem is to use Procedure 21.3.2 to determine a basis for the column space of matrix A, apply the Gram-Schmidt orthogonalization procedure to produce an orthogonal basis for the column space, and then use that basis to compute \proj_U \uvec{b}_0 (where U represents the column space of A). Finally, we could solve the system A \uvec{x} = \proj_U \uvec{b}_0\text{,} which must be consistent because \proj_U \uvec{b}_0 will lie in U\text{,} the column space of A\text{.}

But by applying some theory, we can develop a more direct procedure. Take the pairing \inprod{\blank}{\blank} to mean the standard inner product on \R^m (or, later, on \R^n). Now, the vector

\begin{equation*} \uvec{b}_0' = \proj_{\orthogcmp{U}} \uvec{b}_0 = \uvec{b}_0 - \proj_{U} \uvec{b}_0 \end{equation*}

lies in \orthogcmp{U}\text{,} and so is orthogonal to every vector in U\text{,} the column space of A\text{.} So for every \uvec{x} in \R^n\text{,} we have

\begin{align*} 0 \amp = \inprod{\uvec{b}_0'}{A \uvec{x}}\\ \amp = \utrans{(A \uvec{x})} (\uvec{b}_0') \\ \amp = \utrans{\uvec{x}} (\utrans{A} \uvec{b}_0') \\ \amp = \inprod{\utrans{A} \uvec{b}_0'}{\uvec{x}} \text{.} \end{align*}

As the above calculation holds for every \uvec{x} in \R^n\text{,} the result says that \utrans{A} \uvec{b}_0' lies in \orthogcmp{(\R^n)}\text{.} However, in an inner product space, only the zero vector is orthogonal to every vector (Statement 2 of Proposition 37.5.15). Hence

\begin{equation*} \zerovec = \utrans{A} \uvec{b}_0' = \utrans{A} (\uvec{b}_0 - \proj_{U} \uvec{b}_0) = \utrans{A} (\uvec{b}_0 - A \uvec{x}_0) \end{equation*}

(using (\dagger)). Re-arranging, we have

\begin{equation*} \utrans{A} A \uvec{x}_0 = \utrans{A} \uvec{b}_0 \text{.} \end{equation*}

In other words, the special \uvec{x}_0 that we are looking for is a solution to

\begin{equation*} \utrans{A} A \uvec{x} = \utrans{A} \uvec{b}_0 \text{,} \end{equation*}

called the normal system associated to A \uvec{x} = \uvec{b}_0. Solutions to this system are referred to as least-squares solutions for the original inconsistent system A \uvec{x} = \uvec{b}_0\text{.}
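Here is a hedged Python/NumPy sketch with a made-up inconsistent system: it forms and solves the normal system \utrans{A} A \uvec{x} = \utrans{A} \uvec{b}_0\text{,} compares the answer against NumPy's built-in least-squares routine, and checks that the residual \uvec{b}_0 - A \uvec{x}_0 is orthogonal to the column space of A\text{.}

import numpy as np

# An inconsistent 3 x 2 system A x = b0 (chosen for illustration).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b0 = np.array([1.0, 2.0, 4.0])

# Form and solve the normal system  A^T A x = A^T b0.
x0 = np.linalg.solve(A.T @ A, A.T @ b0)

# Cross-check against NumPy's least-squares solver.
x_ls, *_ = np.linalg.lstsq(A, b0, rcond=None)
assert np.allclose(x0, x_ls)

# A x0 is the vector in the column space of A closest to b0, so the
# residual b0 - A x0 is orthogonal to every column of A.
residual = b0 - A @ x0
print(x0, A.T @ residual)   # the second vector is (numerically) zero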

Because \proj_{U} \uvec{b}_0 lies in the column space of A\text{,} there is always some \uvec{x}_0 that satisfies (\dagger), hence there is always a solution to the normal system. If A is square and invertible, then so is \utrans{A}A\text{,} and in this case there is one unique solution

\begin{gather} \uvec{x}_0 = \inv{(\utrans{A} A)} \utrans{A} \uvec{b}_0 \text{.}\label{equation-projection-least-sq-pseudo-inverse-sol}\tag{\(\dagger\dagger\)} \end{gather}

But even if A is not square, the coefficient matrix of the normal system, \utrans{A} A\text{,} is always square: we have assumed A is m \times n\text{,} so \utrans{A} A is n \times n\text{.} If this matrix is invertible, then again there is one unique solution as above. In this case,

\begin{equation*} \inv{(\utrans{A} A)} \utrans{A} \end{equation*}

is called the pseudo-inverse of A\text{,} in analogy between (\dagger\dagger) and the fact that a square, invertible coefficient matrix affords the solution

\begin{equation*} A \uvec{x} = \uvec{b} \quad\implies\quad \uvec{x} = \inv{A} \uvec{b} \text{.} \end{equation*}

But recall that in this analysis, we have not assumed that A is square, hence the pseudo in pseudo-inverse.
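Continuing the same made-up example, when \utrans{A} A is invertible the pseudo-inverse can be evaluated directly; NumPy's np.linalg.pinv computes the (Moore-Penrose) pseudo-inverse, which agrees with \inv{(\utrans{A} A)} \utrans{A} whenever the columns of A are linearly independent.

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b0 = np.array([1.0, 2.0, 4.0])

# Pseudo-inverse of A, assuming A^T A is invertible.
A_pseudo = np.linalg.inv(A.T @ A) @ A.T

# It agrees with NumPy's built-in pseudo-inverse for this matrix.
assert np.allclose(A_pseudo, np.linalg.pinv(A))

# And it produces the least-squares solution directly, as in (††).
x0 = A_pseudo @ b0
print(x0)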

See Example 38.3.3 in Subsection 38.3.3 for an example of computing a least-squares solution.