Section 38.1 Concepts
Subsection 38.1.1 Orthogonal projection
In a finite-dimensional inner product space V, a subspace U and its orthogonal complement \orthogcmp{U} form a complete set of independent subspaces (Corollary 37.5.19). So every vector \uvec{v} in V can be decomposed uniquely into a sum of two vectors, one in U and one in \orthogcmp{U} (Proposition 28.6.8):
\begin{equation*}
\uvec{v} = \uvec{u} + \uvec{u}' \text{,}
\end{equation*}
with \uvec{u} in U and \uvec{u}' in \orthogcmp{U}. We call \uvec{u} the orthogonal projection of \uvec{v} onto U, and write
\begin{equation*}
\proj_U \uvec{v}
\end{equation*}
to mean this vector. Note that by the symmetry of orthogonal complements (Corollary 37.5.20), we could also say that the complementary vector \uvec{u}' in \orthogcmp{U} is
\begin{equation*}
\proj_{\orthogcmp{U}} \uvec{v} \text{,}
\end{equation*}
the orthogonal projection of \uvec{v} onto \orthogcmp{U}.
As a pair, the vectors \uvec{u} and \uvec{u}' are sometimes called the component of \uvec{v} parallel to U and the component of \uvec{v} orthogonal to U, respectively.
The Expansion theorem, combined with Proposition 37.5.18, tells us how to compute orthogonal projections. An orthogonal basis
\begin{equation*}
B_U = \{ \uvec{e}_1, \dotsc, \uvec{e}_\ell \}
\end{equation*}
for subspace U can be enlarged to an orthogonal basis
\begin{equation*}
B_V = \{ \uvec{e}_1, \dotsc, \uvec{e}_\ell, \uvec{e}_{\ell+1}, \dotsc, \uvec{e}_n \}
\end{equation*}
for the whole inner product space V (Corollary 37.5.10). But the extra vectors in the enlarged basis then form a basis for \orthogcmp{U}:
\begin{equation*}
B_{\orthogcmp{U}} = \{ \uvec{e}_{\ell+1}, \dotsc, \uvec{e}_n \} \text{.}
\end{equation*}
So if we expand \uvec{v} relative to B_V, we will simultaneously obtain expansions for the components parallel and orthogonal to U relative to B_U and B_{\orthogcmp{U}}, respectively:
\begin{equation*}
\uvec{v}
= \underbrace{
    \frac{\inprod{\uvec{v}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1
    + \dotsb
    + \frac{\inprod{\uvec{v}}{\uvec{e}_\ell}}{\norm{\uvec{e}_\ell}^2} \, \uvec{e}_\ell
  }_{\proj_U \uvec{v}}
+ \underbrace{
    \frac{\inprod{\uvec{v}}{\uvec{e}_{\ell+1}}}{\norm{\uvec{e}_{\ell+1}}^2} \, \uvec{e}_{\ell+1}
    + \dotsb
    + \frac{\inprod{\uvec{v}}{\uvec{e}_n}}{\norm{\uvec{e}_n}^2} \, \uvec{e}_n
  }_{\proj_{\orthogcmp{U}} \uvec{v}} \text{.}
\end{equation*}
Note that we don't actually need to know the extra vectors in B_V that form B_{\orthogcmp{U}} in order to compute either \proj_U \uvec{v} or \proj_{\orthogcmp{U}} \uvec{v}. All we need is the orthogonal basis B_U (computed using the Gram-Schmidt orthogonalization process, if necessary), and then
\begin{gather}
\proj_U \uvec{v} = \frac{\inprod{\uvec{v}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1 + \dotsb + \frac{\inprod{\uvec{v}}{\uvec{e}_\ell}}{\norm{\uvec{e}_\ell}^2} \, \uvec{e}_\ell \text{,}\tag{\(\star\)}\\
\proj_{\orthogcmp{U}} \uvec{v} = \uvec{v} - \proj_U \uvec{v} \text{.}\tag{\(\star\star\)}
\end{gather}
See Example 38.3.1 in Subsection 38.3.1 for an example.
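As an illustrative aside, here is a small NumPy sketch (with arbitrarily chosen vectors, using the standard inner product on \R^4) of computing \proj_U \uvec{v} and \proj_{\orthogcmp{U}} \uvec{v} from an orthogonal basis of U; it is a sketch only, not one of the text's examples.
\begin{verbatim}
import numpy as np

# Orthogonal basis for a subspace U of R^4 (illustrative values).
e1 = np.array([1.0, 1.0, 0.0, 0.0])
e2 = np.array([1.0, -1.0, 2.0, 0.0])
assert np.isclose(e1 @ e2, 0.0)   # the basis really is orthogonal

v = np.array([3.0, 1.0, 4.0, 1.0])

# proj_U v = sum over the orthogonal basis of <v,e>/||e||^2 * e
proj_U_v = sum((v @ e) / (e @ e) * e for e in (e1, e2))
proj_Uperp_v = v - proj_U_v

# The complementary component is orthogonal to every basis vector of U.
assert np.isclose(proj_Uperp_v @ e1, 0.0)
assert np.isclose(proj_Uperp_v @ e2, 0.0)
print(proj_U_v, proj_Uperp_v)
\end{verbatim}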
Orthogonal projection onto a vector in \R^n.
In Chapter 14, we defined the orthogonal projection of one vector \uvec{v} onto another vector \uvec{a} in \R^n as
\begin{equation*}
\proj_{\uvec{a}} \uvec{v} = \frac{\uvec{v} \cdot \uvec{a}}{\norm{\uvec{a}}^2} \, \uvec{a} \text{.}
\end{equation*}
This is consistent with our current definition of orthogonal projection, in the case of a one-dimensional subspace U of \R^n, and the standard inner product on \R^n. In that case, every basis
\begin{equation*}
B_U = \{ \uvec{a} \}
\end{equation*}
for U can be considered an orthogonal basis, and the formula
\begin{equation*}
\proj_U \uvec{v} = \frac{\inprod{\uvec{v}}{\uvec{a}}}{\norm{\uvec{a}}^2} \, \uvec{a}
\end{equation*}
agrees with our previous formula for \proj_{\uvec{a}} \uvec{v}.
Subsection 38.1.2 Gram-Schmidt process versus orthogonal projection
The steps in the Gram-Schmidt orthogonalization process create a sequence of orthogonal bases
\begin{equation*}
B_{U_1}, B_{U_2}, \dotsc, B_{U_n}
\end{equation*}
for a nested sequence of subspaces
\begin{equation*}
U_1 \subset U_2 \subset \dotsb \subset U_n = V \text{,}
\end{equation*}
where \dim U_j = j for each index j. To enlarge basis B_{U_j} to basis B_{U_{j+1}}, the process appends the vector
\begin{equation*}
\uvec{e}_{j+1}
= \uvec{v}_{j+1}
- \left[
    \frac{\inprod{\uvec{v}_{j+1}}{\uvec{e}_1}}{\norm{\uvec{e}_1}^2} \, \uvec{e}_1
    + \dotsb
    + \frac{\inprod{\uvec{v}_{j+1}}{\uvec{e}_j}}{\norm{\uvec{e}_j}^2} \, \uvec{e}_j
  \right] \text{,}
\end{equation*}
where
\begin{equation*}
B_{U_j} = \{ \uvec{e}_1, \dotsc, \uvec{e}_j \}
\end{equation*}
is the collection of orthogonal vectors created to that point in the process, and \uvec{v}_{j+1} is the (j+1)th vector in the initial basis used as the starting point in the process.
Comparing the expression in the brackets being subtracted from \uvec{v}_{j+1} with formulas (\(\star\)) and (\(\star\star\)), we see that
\begin{equation*}
\uvec{e}_{j+1}
= \uvec{v}_{j+1} - \proj_{U_j} \uvec{v}_{j+1}
= \proj_{\orthogcmp{U_j}} \uvec{v}_{j+1} \text{.}
\end{equation*}
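To make the connection concrete, here is a brief NumPy sketch (the helper name gram_schmidt and the starting vectors are purely illustrative, and the standard dot product is assumed) that carries out the process by repeatedly subtracting projections onto the span of the vectors produced so far.
\begin{verbatim}
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis, processing the input vectors in order.

    Each new vector e_{j+1} is v_{j+1} minus its projection onto the
    span U_j of the previously produced vectors e_1, ..., e_j.
    """
    orthogonal = []
    for v in vectors:
        proj = sum((v @ e) / (e @ e) * e for e in orthogonal)
        orthogonal.append(v - proj)
    return orthogonal

# Starting basis for a subspace of R^3 (illustrative values).
v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([1.0, 0.0, 2.0])
e1, e2 = gram_schmidt([v1, v2])

assert np.isclose(e1 @ e2, 0.0)   # the output vectors are orthogonal
\end{verbatim}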
Subsection 38.1.3 Best approximation and distance between vector and subspace
As in \R^n, we can measure the distance between vectors by taking the norm of a difference vector:
\begin{equation*}
\dist (\uvec{u},\uvec{v}) = \norm{\uvec{u} - \uvec{v}} \text{.}
\end{equation*}
For a vector \uvec{v} in V and a subspace U of V, out of all vectors \uvec{u} in U, the distance
\begin{equation*}
\dist (\uvec{v},\uvec{u}) = \norm{\uvec{v} - \uvec{u}}
\end{equation*}
will be minimized at
\begin{equation*}
\uvec{u} = \proj_U \uvec{v} \text{.}
\end{equation*}
(See Theorem 38.4.4.)
For this reason, the vector \proj_U \uvec{v} is called the best approximation to \uvec{v} in U.
And we can take this minimum distance as the distance between the vector \uvec{v} and the subspace U:
\begin{equation*}
\dist (\uvec{v},U)
= \dist (\uvec{v},\proj_U \uvec{v})
= \norm{\uvec{v} - \proj_U \uvec{v}}
= \norm{\proj_{\orthogcmp{U}} \uvec{v}} \text{.}
\end{equation*}
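As a numerical illustration of the best-approximation property (a sketch with arbitrarily chosen vectors and the standard dot product, not a proof), the distance from \uvec{v} to \proj_U \uvec{v} is not beaten by the distance from \uvec{v} to other vectors sampled from U:
\begin{verbatim}
import numpy as np

# Orthogonal basis for a plane U in R^3 and a vector v (illustrative values).
e1 = np.array([1.0, 0.0, 1.0])
e2 = np.array([1.0, 0.0, -1.0])
v = np.array([2.0, 3.0, 4.0])

proj_U_v = sum((v @ e) / (e @ e) * e for e in (e1, e2))
dist_v_U = np.linalg.norm(v - proj_U_v)   # = ||proj_{U-perp} v||

# Compare against distances to randomly sampled vectors in U.
rng = np.random.default_rng(0)
for _ in range(1000):
    c1, c2 = rng.normal(size=2)
    u = c1 * e1 + c2 * e2
    assert np.linalg.norm(v - u) >= dist_v_U - 1e-12

print(dist_v_U)
\end{verbatim}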
Suppose B = \{ \uvec{e}_1, \uvec{e}_2, \dotsc, \uvec{e}_n \} is an orthonormal basis for our inner product space. (However, we do not assume here that the first so many vectors in this basis form a basis for the subspace U.) We can then express the vector \uvec{v} and any vector \uvec{u} in U as coordinate vectors relative to this basis:
\begin{align*}
\rmatrixOf{\uvec{v}}{B} \amp = (v_1,v_2,\dotsc,v_n) \text{,} \amp
\rmatrixOf{\uvec{u}}{B} \amp = (u_1,u_2,\dotsc,u_n) \text{.}
\end{align*}
Applying Statement 2 of Proposition 37.5.6, we have
\begin{equation*}
\bbrac{\dist (\uvec{v},\uvec{u})}^2 = {\norm{\uvec{v} - \uvec{u}}}^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + \dotsb + (v_n - u_n)^2 \text{.}
\end{equation*}
Since we are talking about non-negative quantities, minimizing \dist (\uvec{v},\uvec{u}) is the same as minimizing the square of that distance, and so we can say that setting \uvec{u} = \proj_U \uvec{v} minimizes the sum-of-squares expression on the right above. We could interpret this sum-of-squares formula as a measure of the “error” when we “approximate” \uvec{v} by a vector \uvec{u} in U\text{,} and so \uvec{u} = \proj_U \uvec{v} could be called the least squares approximation to \uvec{v} within U\text{.}
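As an illustrative check (arbitrary values, standard dot product on \R^3; the use of a QR factorization to manufacture an orthonormal basis is just a convenience), the squared distance between two vectors equals the sum of squared differences of their coordinates relative to an orthonormal basis:
\begin{verbatim}
import numpy as np

# Manufacture an orthonormal basis for R^3 as the columns of Q.
Q, _ = np.linalg.qr(np.array([[1.0, 1.0, 0.0],
                              [1.0, 0.0, 1.0],
                              [0.0, 1.0, 1.0]]))

v = np.array([2.0, -1.0, 3.0])
u = np.array([0.5, 0.5, 1.0])

# Coordinates relative to the orthonormal basis: <x, e_i> for each column e_i.
v_coords = Q.T @ v
u_coords = Q.T @ u

lhs = np.linalg.norm(v - u) ** 2          # squared distance
rhs = np.sum((v_coords - u_coords) ** 2)  # sum of squared coordinate gaps
assert np.isclose(lhs, rhs)
\end{verbatim}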
Subsection 38.1.4 Least squares solutions to a linear system
Here is a particular kind of linear approximation problem that can be solved with the help of orthogonal projection.
Question 38.1.3.
Suppose A is an m \times n coefficient matrix and \uvec{b}_0 is a column vector in \R^m so that the linear system A \uvec{x} = \uvec{b}_0 is inconsistent.
What vector \uvec{x}_0 in \R^n comes closest to being a solution?
In other words, for what vector \uvec{x}_0 in \R^n will A \uvec{x}_0 be as close as possible to \uvec{b}_0\text{?}
Recall that a product of a matrix and a column vector can always be expanded as a linear combination of the columns of the matrix:
\begin{equation*}
A \uvec{x} = x_1 \uvec{a}_1 + x_2 \uvec{a}_2 + \dotsb + x_n \uvec{a}_n \text{,}
\end{equation*}
where the \uvec{a}_j are the columns of A\text{.} (For example, see Subsection 22.3.2.) It is this pattern that led us to the definition of column space of a matrix as the span of the columns of the matrix. Furthermore, we have seen that system A \uvec{x} = \uvec{b} is consistent precisely when the vector of constants \uvec{b} lies in the column space of A\text{.} (See Subsection 21.3.1.) Turning the above matrix-times-column pattern around, we also see that the column space of A is made up of all possible products A \uvec{x} for \uvec{x} in \R^n\text{.} (But note that it is possible for different vectors \uvec{x} in \R^n to produce the same vector \uvec{b} in the column space of A\text{,} since we know that a consistent system A \uvec{x} = \uvec{b} could have an infinite number of solutions.)
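A small sketch of this pattern (with an arbitrary matrix and vector): the product A\uvec{x} is the linear combination of the columns of A with the entries of \uvec{x} as coefficients, and so always lies in the column space of A.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

# A x versus x_1 a_1 + x_2 a_2, where a_1, a_2 are the columns of A.
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(A @ x, combo)
\end{verbatim}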
We have assumed that our system is inconsistent, so that \uvec{b}_0 must not lie in the column space of A\text{,} whereas every result A \uvec{x} lies in the column space of A\text{.} We can now re-frame Question 38.1.3 in a way that will allow us to apply orthogonal projection: what vector(s) \uvec{x} will produce an βoutputβ vector A \uvec{x} in the column space of A that is closest to \uvec{b}_0\text{?} The answer is: any vector \uvec{x}_0 so that
\begin{gather}
A \uvec{x}_0 = \proj_U \uvec{b}_0 \text{,}\label{equation-projection-least-sq-sol}\tag{\(\dagger\)}
\end{gather}
where U is the column space of A in \R^m\text{.}
To determine such a vector \uvec{x}_0, first notice that the vector
\begin{equation*}
\uvec{b}_0' = \proj_{\orthogcmp{U}} \uvec{b}_0 = \uvec{b}_0 - \proj_{U} \uvec{b}_0
\end{equation*}
lies in \orthogcmp{U}\text{,} and so is orthogonal to every vector in U\text{,} the column space of A\text{.} So for every \uvec{x} in \R^n\text{,} we have
\begin{align*}
0 \amp = \inprod{\uvec{b}_0'}{A \uvec{x}}\\
\amp = \utrans{(A \uvec{x})} (\uvec{b}_0') \\
\amp = \utrans{\uvec{x}} (\utrans{A} \uvec{b}_0') \\
\amp = \inprod{\utrans{A} \uvec{b}_0'}{\uvec{x}} \text{.}
\end{align*}
As the above calculation holds for every \uvec{x} in \R^n\text{,} the result says that \utrans{A} \uvec{b}_0' lies in \orthogcmp{(\R^n)}\text{.} However, in an inner product space, only the zero vector is orthogonal to every vector (Statement 2 of Proposition 37.5.15). Hence
\begin{equation*}
\zerovec = \utrans{A} \uvec{b}_0'
= \utrans{A} (\uvec{b}_0 - \proj_{U} \uvec{b}_0)
= \utrans{A} (\uvec{b}_0 - A \uvec{x}_0)
\end{equation*}
(using (\dagger)). Re-arranging, we have
\begin{equation*}
\utrans{A} A \uvec{x}_0 = \utrans{A} \uvec{b}_0 \text{.}
\end{equation*}
In other words, the special \uvec{x}_0 that we are looking for is a solution to
\begin{equation*}
\utrans{A} A \uvec{x} = \utrans{A} \uvec{b}_0 \text{,}
\end{equation*}
called the normal system associated to A \uvec{x} = \uvec{b}_0. Solutions to this system are referred to as least-squares solutions for the original inconsistent system A \uvec{x} = \uvec{b}_0\text{.}
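As a hedged sketch (with made-up data, using NumPy), a least-squares solution can be computed by solving the normal system directly; the check below confirms that the residual \uvec{b}_0 - A\uvec{x}_0 is orthogonal to the columns of A, so that A\uvec{x}_0 is indeed the projection of \uvec{b}_0 onto the column space.
\begin{verbatim}
import numpy as np

# An inconsistent system: 3 equations, 2 unknowns (illustrative values).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b0 = np.array([1.0, 2.0, 2.0])

# Solve the normal system  A^T A x = A^T b0.
x0 = np.linalg.solve(A.T @ A, A.T @ b0)

# The residual is orthogonal to every column of A, so A x0 = proj_U b0.
residual = b0 - A @ x0
assert np.allclose(A.T @ residual, 0.0)

# NumPy's built-in least-squares routine agrees.
x_lstsq, *_ = np.linalg.lstsq(A, b0, rcond=None)
assert np.allclose(x0, x_lstsq)
\end{verbatim}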
Because \proj_{U} \uvec{b}_0 lies in the column space of A\text{,} there is always some \uvec{x}_0 that satisfies (\dagger), and hence there is always a solution to the normal system. If A is square and invertible, then so is \utrans{A}A\text{,} and in this case there is a unique solution
\begin{gather}
\uvec{x}_0 = \inv{(\utrans{A} A)} \utrans{A} \uvec{b}_0\label{equation-projection-least-sq-pseudo-inverse-sol}\tag{\(\dagger\dagger\)}
\end{gather}
But even if A is not square, the coefficient matrix of the normal system, \utrans{A} A\text{,} is always square: we have assumed A is m \times n\text{,} so \utrans{A} A is n \times n\text{.} If this matrix is invertible, then again there is a unique solution as above. In this case,
\begin{equation*}
\inv{(\utrans{A} A)} \utrans{A}
\end{equation*}
is called the pseudo-inverse of A\text{,} by analogy between (\dagger\dagger) and the fact that a square, invertible coefficient matrix affords the solution
\begin{equation*}
A \uvec{x} = \uvec{b} \quad\implies\quad \uvec{x} = \inv{A} \uvec{b} \text{.}
\end{equation*}
But recall that in this analysis, we have not assumed that A is square, hence the pseudo in pseudo-inverse.
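Finally, a brief sketch (illustrative data again) of the pseudo-inverse: when the columns of A are linearly independent, \utrans{A} A is invertible, so \inv{(\utrans{A} A)} \utrans{A} can be formed explicitly, and in that case it agrees with the pseudo-inverse computed by NumPy's np.linalg.pinv.
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b0 = np.array([0.0, 1.0, 1.0, 3.0])

# Explicit pseudo-inverse (A^T A)^{-1} A^T; valid because A^T A is invertible.
pseudo_inv = np.linalg.inv(A.T @ A) @ A.T
x0 = pseudo_inv @ b0   # least-squares solution x0 = (A^T A)^{-1} A^T b0

assert np.allclose(pseudo_inv, np.linalg.pinv(A))
print(x0)
\end{verbatim}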
See Example 38.3.3 in Subsection 38.3.3 for an example of computing a least-squares solution.