
Chapter 21 Polynomial approximations

Section 21.1 Linear approximations

Subsection 21.1.1 Linearization at a point

In Section 11.6, we interpreted a derivative value as describing the slope of the local linearity of a graph near a specific point. In other words, on a small domain $a < t < b$ containing some point of interest $t = t_0$, the graph of a differentiable function $f(t)$ is approximately linear with slope $m = f'(t_0)$. A linear formula is much easier to use in computations, so if we do not mind introducing a small amount of error we may replace the function $f(t)$ in our computations with a linear function on domain $a < t < b$.
Example 21.1.1. Approximate square roots.
What is the approximate value of $\sqrt{100.03}$? Using the known "nearby" value of $\sqrt{100} = 10$ as a "base point" for our calculations, we will replace the square root function with a linear function $L(t)$ that also has $L(100) = 10$, and has the same slope as the graph of the square root function at the point $t = 100$:
$$\left.\frac{d(\sqrt{t})}{dt}\right|_{t=100} = \left.\frac{1}{2\sqrt{t}}\right|_{t=100} = \frac{1}{20}.$$
So the graph of $L(t)$ must pass through the point $(100, 10)$ and must create a slope of $1/20$ with every other point $(t, y)$ on its graph:
$$\frac{y - 10}{t - 100} = \frac{1}{20} \implies y - 10 = \frac{t - 100}{20} \implies y = 10 + \frac{t - 100}{20}.$$
Our approximating linear function to the square root function near the point t=100 is
$$L(t) = 10 + \frac{t - 100}{20},$$
and we approximate
$$\sqrt{t} \approx L(t) \quad \text{for } t \approx 100.$$
In particular, we wished to approximate
$$\sqrt{100.03} \approx L(100.03) = 10 + \frac{100.03 - 100}{20} = 10 + \frac{0.03}{20} = 10 + 0.0015 = 10.0015.$$
(Compare with the result for $\sqrt{100.03}$ returned by your calculator.)
Remark 21.1.2.
It may seem silly to go to all the trouble of Example 21.1.1 instead of entering $\sqrt{100.03}$ into a calculator, but if you have billions of similar calculations to do, programming a computer to use a linear formula will save a significant amount of time compared to repeatedly using the sqrt function in your computing language's math libraries.
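The remark above is easy to see in code. Here is a minimal sketch (the function name `sqrt_approx` and the comparison value are our own, not from the text), using the linearization $L(t) = 10 + (t - 100)/20$ derived in Example 21.1.1:

```python
import math

def sqrt_approx(t):
    """Linearization of the square root at base point t0 = 100:
    L(t) = 10 + (t - 100)/20, intended for t near 100."""
    return 10 + (t - 100) / 20

# Near the base point, the linear formula agrees with math.sqrt
# to about seven decimal places.
print(sqrt_approx(100.03))   # ≈ 10.0015
print(math.sqrt(100.03))     # ≈ 10.00149989
```

The linear formula costs one subtraction, one division, and one addition per call, which is why it can pay off when repeated billions of times.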
We can obtain a general formula for the equation of an approximating linear function in the same manner as in Example 21.1.1: we would like a linear function $L(t)$ that has "initial value" $L(t_0) = f(t_0)$ and slope $m = f'(t_0)$ to replace $f(t)$ near $t = t_0$.
Definition 21.1.3. Linearization of a function.
If $f(t)$ is defined on domain $a < t < b$ and differentiable at $t = t_0$ within that domain, then the linearization of $f(t)$ based at $t = t_0$ is the linear function
$$L(t) = f(t_0) + f'(t_0)(t - t_0),$$
restricted to the domain $a < t < b$.
Remark 21.1.4.
  • A linearization of a function is nothing more than a tangent line, but we are introducing new terminology to emphasize its use as a linear "version" of $f(t)$ to be used as a replacement for $f(t)$ in computations.
  • The only variable in the formula for $L(t)$ is the independent variable $t$; the three expressions $t_0$, $f(t_0)$, and $f'(t_0)$ should be specific numbers obtained by choosing an appropriate value of $t_0$ and calculating the other two values from the function $f(t)$.
Example 21.1.5. Linearizing the natural logarithm.
A known value of the natural logarithm is $\ln 1 = 0$, so we can use the linearization at $t_0 = 1$ to approximate $\ln t$ for $t \approx 1$. First compute
$$\left.\frac{d(\ln t)}{dt}\right|_{t=1} = \left.\frac{1}{t}\right|_{t=1} = 1.$$
From this and ln1=0 we have
$$L(t) = 0 + 1 \cdot (t - 1) = t - 1.$$
If we have many values of ln(1+Δt) to compute, all of them with Δt0, then really there’s no need to perform expensive logarithm computations at all, as we would have
$$\ln(1 + \Delta t) \approx L(1 + \Delta t) = (1 + \Delta t) - 1 = \Delta t.$$
For example,
$$\ln(1.001) \approx 0.001, \qquad \ln(0.999) \approx -0.001.$$
Example 21.1.6. Linearizing sine at the origin.
The slope of the graph of $\sin(t)$ at $t = 0$ is $1$, since
$$\left.\frac{d(\sin(t))}{dt}\right|_{t=0} = \cos(0) = 1.$$
So the linearization of sine based at t=0 is
$$L(t) = \sin(0) + 1 \cdot (t - 0) = t.$$
This recovers the familiar approximation
$$\sin(t) \approx t \quad \text{for } t \approx 0.$$
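All three linearizations above follow the same recipe from Definition 21.1.3, so the recipe can be captured once in a short helper. A minimal sketch (the helper name `linearize` is ours):

```python
import math

def linearize(f, fprime, t0):
    """Return L(t) = f(t0) + f'(t0)*(t - t0), the linearization of f based at t0."""
    base, slope = f(t0), fprime(t0)
    return lambda t: base + slope * (t - t0)

# ln(t) near t0 = 1 gives L(t) = t - 1; sin(t) near t0 = 0 gives L(t) = t.
L_ln = linearize(math.log, lambda t: 1 / t, 1.0)
L_sin = linearize(math.sin, math.cos, 0.0)
print(L_ln(1.001))   # ≈ 0.001
print(L_sin(0.05))   # ≈ 0.05
```

Note that the base value and slope are computed once, when the linearization is built; evaluating $L(t)$ afterwards involves no calls to the original function.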

Subsection 21.1.2 Error in a linear approximation

Merely stating $\ln(1.001) \approx 0.001$ is all well and good, but it is fairly useless as an approximation if you have no idea how good it is as an approximation. This "approximate" value could be way off, for all we know. We could compare this approximate value to the true value, but the whole point of approximating is to avoid a more complicated computation. If we were able to easily calculate $\ln(1.001)$ exactly, why not just do that instead of approximating it?
However, it is possible to estimate the error involved in a linear approximation: if $M$ is a bound on the magnitude of the second derivative $|f''(t)|$ between the base point and the point of interest, then the error satisfies $|E| \le \frac{M}{2}|\Delta t|^2$.
Example 21.1.8. Adding error bounds to an approximation.
Let’s contextualize the approximation of $\ln(1.001)$ that we made in Example 21.1.5. We have
$$\left|\frac{d^2(\ln t)}{dt^2}\right| = \left|-\frac{1}{t^2}\right| = \frac{1}{t^2},$$
and on the closed interval $1 \le t \le 1.001$ this second derivative magnitude achieves its largest value at $t = 1$, so that
$$\left|\frac{d^2(\ln t)}{dt^2}\right| \le \frac{1}{1^2} = 1.$$
Therefore, the error in our approximation $\ln(1.001) \approx 0.001$ satisfies
$$|E| \le \frac{1}{2}|1.001 - 1|^2 = 5 \times 10^{-7}.$$
In other words, we can confidently say that
$$0.0009995 \le \ln(1.001) \le 0.0010005.$$
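The bound in this example can be checked numerically. A small sketch (variable names are ours):

```python
import math

# Linear approximation ln(1.001) ≈ 0.001, with second-derivative bound M = 1
# on [1, 1.001], giving |E| <= (M/2)|Δt|^2 = 5e-7.
approx = 0.001
bound = 0.5 * 1 * (1.001 - 1) ** 2
actual_error = abs(math.log(1.001) - approx)
print(actual_error, bound)   # the actual error stays just under the bound
```

The actual error here is about $4.997 \times 10^{-7}$, so the bound $5 \times 10^{-7}$ is nearly tight in this case.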
Remark 21.1.9.
A linear approximation will quickly become a bad approximation if we stray too far from the base point.
Comparing the graphs of a function and its linear approximation.
Figure 21.1.10. The approximation $f(t) \approx L(t)$ will have a large error if $t$ is too far from the base point $t_0$.

Subsection 21.1.3 Differentials

Subsubsection Another look at linearization
Here is another perspective on linear approximations. Recall that, from a base point $(t_0, q(t_0))$ on the graph of $q(t)$, a second point $(t, q(t))$ creates a secant line. The actual variation in quantity from base time to the second time is
$$\Delta q = q(t) - q(t_0),$$
and this variation occurs over a time period of duration
$$\Delta t = t - t_0.$$
Over the same duration, we can contrast that with the variation in the linear approximation, which we will label as $dq$ instead of $\Delta q$, as it follows the tangent line instead of a secant line.
A graph of a secant line between two points on a quantity graph.
(a) Variation along a secant line between two points on the quantity graph.
A graph of a tangent line as a linear approximation to a quantity graph.
(b) Variation along a tangent line to a point on the quantity graph.
Figure 21.1.11. Comparing variation along a secant line versus along a tangent line.
Mixing differential and derivative-function notation, and abusing notation to represent slope and variation along the tangent (linear approximation) line, we have
$$\frac{dq}{dt} = q'(t_0) \implies dq = q'(t_0)\,dt.$$
Now, the duration is the same whether we are looking at variation of the original function or variation of the linear approximation, so
$$\Delta t = dt.$$
But a linear approximation is only an approximation, so we have merely $\Delta q \approx dq$. So while we can write
$$q(t) = q(t_0 + \Delta t) = q(t_0) + \Delta q$$
exactly, we can only approximate
$$q(t) = q(t_0 + \Delta t) = q(t_0 + dt) \approx q(t_0) + dq.$$
(But approximating is often what we choose to do if computing exactly is too computationally “expensive”.)
Using $dq = q'(t_0)\,dt$, we may rewrite the linearization formula in Definition 21.1.3 in differential notation.
Example 21.1.13. Estimating a square root.
Let’s repeat Example 21.1.1 using this new notation. We would like to estimate $\sqrt{100.03}$, which involves the square root process $q(t) = \sqrt{t}$. We know $q(100) = 10$, so $t_0 = 100$ is a convenient "nearby" base point. The difference between this base point and the location $t = 100.03$ of the desired output value is
$$dt = \Delta t = 0.03.$$
We also have derivative function $q'(t) = 1/(2\sqrt{t})$, and its value at the base point is $q'(100) = 1/20$. So now we are ready to estimate
$$\sqrt{100.03} = q(100.03) = q(100 + 0.03) \approx q(100) + q'(100) \cdot 0.03 = 10 + \frac{0.03}{20} = 10.0015.$$
Example 21.1.14. Estimating a cosine value.
Let’s use $d(\cos(t))/dt = -\sin(t)$ to estimate the value of $\cos(40^\circ)$. A convenient "nearby" base point could be $45^\circ$, where we know that
$$\cos(45^\circ) = \sin(45^\circ) = \frac{1}{\sqrt{2}}.$$
However, it would be incorrect to use $-5^\circ$ as the value of $dt$, as we should always calculate in radians. Now, $45^\circ$ corresponds to $\pi/4$ radians, and each degree is $\pi/180$ radians, so we should use
$$dt = -5 \cdot \frac{\pi}{180} = -\frac{\pi}{36}.$$
Now we are ready to approximate
$$\cos(40^\circ) = \cos\!\left(\frac{\pi}{4} - \frac{\pi}{36}\right) \approx \cos\!\left(\frac{\pi}{4}\right) + \left(-\sin\!\left(\frac{\pi}{4}\right)\right)\!\left(-\frac{\pi}{36}\right) = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} \cdot \frac{\pi}{36} = \frac{1}{\sqrt{2}}\left(1 + \frac{\pi}{36}\right).$$
If we just pull out our calculator now to compute the above, that invalidates all our work in avoiding pulling out our calculator to compute cos(40°) in the first place. So let’s further approximate
$$\sqrt{2} \approx 1.414, \qquad \pi \approx 3.14159.$$
With these approximations we have
$$\cos(40^\circ) \approx 0.7689.$$
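Carrying out the same differential estimate in full floating-point precision gives about $0.76881$; the hand value $0.7689$ differs in the last digit only because of the rounded values $\sqrt{2} \approx 1.414$ and $\pi \approx 3.14159$. A sketch:

```python
import math

t0 = math.pi / 4            # base point: 45° in radians
dt = -5 * math.pi / 180     # moving down to 40°, so dt = -π/36
# q(t0) + q'(t0)·dt with q = cos, so q' = -sin
estimate = math.cos(t0) - math.sin(t0) * dt
print(estimate)                        # ≈ 0.76881
print(math.cos(math.radians(40)))      # ≈ 0.76604 (true value)
```

The estimate overshoots the true value by about $0.0028$, which is consistent with being $5^\circ$ away from the base point.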
Subsubsection Relative error in a linear approximation
We expect that the error in the approximation $\Delta q \approx dq$ should be small if $dt$ is kept small.
Justification.
The variation in $q$ from base point $t_0$ to "second" point $t$ is
$$\Delta q = q(t) - q(t_0) = q(t_0 + dt) - q(t_0),$$
where
$$dt = \Delta t = t - t_0.$$
So the error in approximating $\Delta q \approx dq$ is
$$dq - \Delta q = q'(t_0)\,dt - [q(t_0 + dt) - q(t_0)].$$
The error relative to $dt$ is then
$$\frac{dq - \Delta q}{dt} = \frac{q'(t_0)\,dt - [q(t_0 + dt) - q(t_0)]}{dt} = q'(t_0) - \frac{q(t_0 + dt) - q(t_0)}{dt}.$$
But the quotient at the end is precisely the difference quotient that approximates the value of $q'(t_0)$. That is, for $dt \approx 0$ we have
$$\frac{q(t_0 + dt) - q(t_0)}{dt} \approx q'(t_0),$$
and this approximation can be made arbitrarily good by taking $dt$ to be sufficiently small. Thus
$$\frac{dq - \Delta q}{dt} \approx q'(t_0) - q'(t_0) = 0,$$
and again this approximation can be made arbitrarily good by taking $dt$ to be sufficiently small.
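This shrinking relative error can be observed numerically. A sketch using $q(t) = \ln(t)$ at base point $t_0 = 1$, where $q'(1) = 1$ (variable names are ours):

```python
import math

rel_errors = []
for dt in (0.1, 0.01, 0.001):
    dq = 1.0 * dt                    # dq = q'(t0)·dt with q'(1) = 1
    delta_q = math.log(1.0 + dt)     # Δq = ln(1 + dt) - ln(1)
    rel_errors.append((dq - delta_q) / dt)
print(rel_errors)   # each entry roughly 10x smaller than the last
```

Shrinking $dt$ by a factor of $10$ shrinks the relative error $(dq - \Delta q)/dt$ by roughly a factor of $10$ as well, in line with the argument above.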
Subsubsection Approximating relative variation
Often what we are interested in is the actual amount of increase/decrease $\Delta q$ in the quantity rather than the final amount $q(t)$, in which case we can simply use
$$\Delta q \approx dq = q'(t_0)\,dt.$$
Moreover, often we are interested in knowing the increase/decrease not as simply a difference but as a relative difference.
Definition 21.1.16. Relative variation.
The relative variation in a quantity is the ratio of the variation $\Delta q$ to the initial quantity $q(t_0)$:
$$\frac{\Delta q}{q(t_0)} = \frac{q(t) - q(t_0)}{q(t_0)}.$$
In the context of linear approximations we will usually take the "initial" time to be our base point $t = t_0$, and then we may approximate the relative variation as
$$\frac{\Delta q}{q(t_0)} \approx \frac{dq}{q(t_0)} = \frac{q'(t_0)}{q(t_0)}\,dt.$$
Example 21.1.17. Relative increase in area of a circle.
You are designing a circular part for a machine. During manufacturing the actual diameter of the completed part can vary by up to ±2%. What range of variation in the material used to make the part is possible?
If the part has a uniform thickness, we can use the area of its circular face as a proxy measurement for the amount of material used. Let’s work with the area as a function of the radius: $A(r) = \pi r^2$. A $\pm 2\%$ variation in the diameter also means a $\pm 2\%$ variation in the radius, because diameter and radius are proportional. So we have
$$dr = \Delta r = (\pm 2\%) \times r = \pm 0.02\,r.$$
The variation in the area can be approximated via differentials:
$$\Delta A \approx dA = \frac{dA}{dr}\,dr = 2\pi r\,dr.$$
Then the relative variation is the ratio
$$\frac{\Delta A}{A} \approx \frac{dA}{A} = \frac{2\pi r\,dr}{\pi r^2} = \frac{2\,dr}{r}.$$
Using $dr = \pm 0.02\,r$ we have
$$\frac{\Delta A}{A} \approx \frac{2(\pm 0.02\,r)}{r} = \pm 0.04.$$
So a ±2% variation in the diameter corresponds to an approximate ±4% variation in the area.
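The $\pm 4\%$ estimate can be compared against the exact relative variation. A sketch using a hypothetical nominal radius (the result is independent of $r$, since $r$ cancels in the ratio):

```python
import math

def area(r):
    return math.pi * r ** 2

r = 10.0          # hypothetical nominal radius
dr = 0.02 * r     # the +2% end of the variation
rel_est = 2 * dr / r                              # differential estimate dA/A
rel_exact = (area(r + dr) - area(r)) / area(r)    # exact relative variation
print(rel_est)     # ≈ 0.04
print(rel_exact)   # ≈ 0.0404
```

The exact relative variation is $(1.02)^2 - 1 = 0.0404$, so the differential estimate $0.04$ is off by only $0.0004$.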

Section 21.2 Quadratic approximations

The linear approximation $\sin(t) \approx t$ is fine for $t \approx 0$, but it quickly becomes inaccurate as the sine graph curves away from the approximating line. A linearization only matches the originating function in two aspects:
  • Same base value, $L(t_0) = f(t_0)$.
  • Same slope, $L'(t_0) = f'(t_0)$.
If we want a better approximating function that stays close to the values of the original function over a wider interval around the base point, we need it to better match the curvature of the original function by having the two functions match in a higher-order aspect:
  • Same concavity, $L''(t_0) = f''(t_0)$.
In general, this will not work when $L(t)$ is linear, since a line has no concavity. Instead, we will need to move up a degree to a quadratic approximating function.
For now, let’s work with base point $t_0 = 0$. We are looking for a quadratic polynomial
$$p(t) = a_0 + a_1 t + a_2 t^2$$
that matches the above three aspects of the function f(t) at the base point.
                      p        f
Base value            $a_0$    $f(0)$
First derivative      $a_1$    $f'(0)$
Second derivative     $2a_2$   $f''(0)$
To make these match in all three rows we should use
$$a_0 = f(0), \qquad a_1 = f'(0), \qquad a_2 = \frac{f''(0)}{2}.$$

Example 21.2.2. Quadratic approximation of cos(t) near t=0.

The linear approximation
$$\cos(t) \approx 1 \quad \text{for } t \approx 0$$
quickly becomes a poor approximation as the graph of cosine curves downward, away from its "initial" value $\cos(0) = 1$.
Comparing the graph of the cosine function to the horizontal line at 1.
Figure 21.2.3. Comparing cosine to 1.
We can use a quadratic approximation to do better. Calculate:
$$\cos(0) = 1, \qquad \left.\frac{d(\cos(t))}{dt}\right|_{t=0} = -\sin(0) = 0, \qquad \left.\frac{d^2(\cos(t))}{dt^2}\right|_{t=0} = -\cos(0) = -1.$$
From this we obtain quadratic approximation
$$p(t) = 1 - \frac{t^2}{2}.$$
Notice that this parabola is concave down, just as the cosine graph is around t=0.
Comparing the graph of the cosine function to specially created parabola.
Figure 21.2.4. Comparing cosine to a quadratic approximation.
Working at a different base point $t = t_0$ is effectively just a horizontal shift of the above pattern.

Example 21.2.6. Quadratic approximation of ln(t) near t=1.

Calculate:
$$\ln(1) = 0, \qquad \left.\frac{d(\ln(t))}{dt}\right|_{t=1} = \frac{1}{1} = 1, \qquad \left.\frac{d^2(\ln(t))}{dt^2}\right|_{t=1} = -\frac{1}{1^2} = -1.$$
From this we obtain quadratic approximation
$$p(t) = (t - 1) - \frac{1}{2}(t - 1)^2.$$
A graph of the natural logarithm and its quadratic approximation at base point 1.
Figure 21.2.7. The quadratic approximation of the natural logarithm at t=1.
We can use this quadratic polynomial to improve on the specific approximations we made in Example 21.1.5:
$$\ln(1.001) \approx 0.001 - \frac{0.001^2}{2} = 0.0009995, \qquad \ln(0.999) \approx -0.001 - \frac{(-0.001)^2}{2} = -0.0010005.$$
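The two quadratic approximations from this section can be checked directly. A minimal sketch (function names are ours):

```python
import math

def p_cos(t):
    """Quadratic approximation of cos(t) based at t0 = 0."""
    return 1 - t ** 2 / 2

def p_ln(t):
    """Quadratic approximation of ln(t) based at t0 = 1."""
    return (t - 1) - (t - 1) ** 2 / 2

print(p_ln(1.001))                 # ≈ 0.0009995
print(p_ln(0.999))                 # ≈ -0.0010005
print(p_cos(0.1), math.cos(0.1))   # agree to about five decimal places
```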
The approximate error in a linear approximation is related to the magnitude of the second derivative near the base point, as a linear approximation omits concavity information. In the same way, the approximate error in a quadratic approximation is related to the magnitude of the third derivative near the base point.

Example 21.2.9. Error in our quadratic approximation of the natural logarithm.

We would like to know something about the quality of our approximation $\ln(1.001) \approx 0.0009995$ from Example 21.2.6. Arguing similarly to Example 21.1.8, near $t = 1$ we have
$$\left|\frac{d^3(\ln t)}{dt^3}\right| = \frac{2}{t^3}$$
and can say
$$\left|\frac{d^3(\ln t)}{dt^3}\right| \le \frac{2}{1^3} = 2$$
on the interval $1 \le t \le 1.001$. Therefore, the error in our approximation $\ln(1.001) \approx 0.0009995$ satisfies
$$|E| \le \frac{2}{6}|1.001 - 1|^3 = \frac{1}{3} \times 10^{-9}.$$
Rounding up so that we can ensure our error estimate is large enough, we can say
$$|E| \le 3.34 \times 10^{-10},$$
which implies that
$$0.000999499666 \le \ln(1.001) \le 0.000999500334.$$
However, in our linear approximation error estimate, we found $0.0009995 \le \ln(1.001)$, so we can sharpen the above error estimate to
$$0.0009995 \le \ln(1.001) \le 0.000999500334.$$

Remark 21.2.10.

We will unify the patterns of this section with the previous section into a general pattern of approximating a function by a polynomial, and the associated approximate error.

Section 21.3 Higher-degree approximations: Maclaurin and Taylor polynomials

Subsection 21.3.1 Higher derivatives

In Section 18.3 we introduced the second derivative of a function. For a quantity $q(t)$, the derivative function $q'(t)$ represents its rate of variation. But if we view that new derivative function as the "quantity" of variation, then we could ask how that quantity varies, and the answer would be the derivative of that derivative function. Being two derivative calculations removed from the original function $q(t)$, we refer to the derivative of $q'(t)$ as the second derivative of $q$, and write
$$q''(t) \quad \text{or} \quad \frac{d^2 q}{dt^2}$$
to represent it.
But we can repeat the process that led us to the concept of second derivative. Viewing the second derivative as the "quantity" of variation of the variation of $q$, we could ask how that quantity varies, and the answer would be the derivative of that second derivative function. Being three derivative calculations removed from the original function $q(t)$, we refer to the derivative of $q''(t)$ as the third derivative of $q$, and write
$$q'''(t) \quad \text{or} \quad \frac{d^3 q}{dt^3}$$
to represent it. And so on, creating the higher derivatives of the function $q(t)$.
first derivative: $q'(t)$ or $\dfrac{dq}{dt}$ = derivative of $q(t)$
second derivative: $q''(t)$ or $\dfrac{d^2q}{dt^2}$ = derivative of $q'(t)$
third derivative: $q'''(t)$ or $\dfrac{d^3q}{dt^3}$ = derivative of $q''(t)$
Eventually the number of tick marks becomes unreasonable, and we switch to a superscript, but in brackets to distinguish it from an exponent.
fourth derivative: $q^{(4)}(t)$ or $\dfrac{d^4q}{dt^4}$ = derivative of $q'''(t)$
fifth derivative: $q^{(5)}(t)$ or $\dfrac{d^5q}{dt^5}$ = derivative of $q^{(4)}(t)$
Example 21.3.1. Higher derivatives of the natural logarithm function.
We have d(ln(t))/dt=1/t, and after that we may use the pattern for the Derivative of a power function to compute higher derivatives:
$$\frac{d}{dt}(\ln(t)) = \frac{1}{t}, \qquad \frac{d^2}{dt^2}(\ln(t)) = \frac{d}{dt}\left(\frac{1}{t}\right) = -\frac{1}{t^2}, \qquad \frac{d^3}{dt^3}(\ln(t)) = \frac{d}{dt}\left(-\frac{1}{t^2}\right) = \frac{2}{t^3},$$
$$\frac{d^4}{dt^4}(\ln(t)) = \frac{d}{dt}\left(\frac{2}{t^3}\right) = -\frac{6}{t^4}, \qquad \frac{d^5}{dt^5}(\ln(t)) = \frac{d}{dt}\left(-\frac{6}{t^4}\right) = \frac{24}{t^5}.$$
There appears to be a pattern here, where the higher derivatives alternate between positive and negative coefficients, and the numerator grows by a factor equal to the previous power in the denominator.
$$\frac{d^n}{dt^n}(\ln(t)) = \frac{(-1)^{n-1}(n-1)!}{t^n}.$$
(Note that 0!=1.)
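The conjectured closed form can be sanity-checked numerically: the $(n+1)$-th derivative should match a centered-difference derivative of the $n$-th. A sketch (the function name `dln` and the test point are ours):

```python
import math

def dln(n, t):
    """Conjectured n-th derivative of ln(t): (-1)^(n-1) * (n-1)! / t^n."""
    return (-1) ** (n - 1) * math.factorial(n - 1) / t ** n

h, t = 1e-6, 2.0
for n in range(1, 5):
    # Centered difference of the n-th derivative approximates the (n+1)-th.
    numeric = (dln(n, t + h) - dln(n, t - h)) / (2 * h)
    print(n, numeric, dln(n + 1, t))
```

Each numeric derivative agrees with the closed form for the next order, supporting the pattern.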
Example 21.3.2. Higher derivatives of the natural exponential function.
Because d(et)/dt=et, we have
$$\frac{d^n(e^t)}{dt^n} = e^t$$
for all n.
Checkpoint 21.3.3. Higher derivatives of trig functions.
Work out the general pattern of the higher derivatives of sin(t) and cos(t).

Subsection 21.3.2 Matching higher derivatives

A linear approximation is created by matching function value and derivative value at the base point. Graphically, the linear approximation is the simplest function that shares the same height and the same slope as the original function at the base point. A quadratic approximation is created by matching both the first and second derivative values at the base point. Graphically, the quadratic approximation is the simplest function that shares the same height, the same slope, and the same rate of variation of the slope (that is, the same concavity) as the original function at the base point. To get more accurate approximations, we should create a higher-degree polynomial that matches even more higher derivative values of the function at the base point. Graphically, this means our approximation function will be the simplest function that shares the same height, the same slope, the same concavity, the same rate of variation of concavity, and so on, at the base point.
We begin with the general form of a degree-n polynomial:
$$p(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n.$$
As before, we will first look for the pattern at base point $t_0 = 0$, as it is simpler to compute derivative values of a polynomial at that point. But what is the pattern in the derivative values of a polynomial?
$$\begin{aligned}
p(t) &= a_0 + a_1 t + a_2 t^2 + \cdots + a_k t^k + \cdots + a_n t^n \\
p'(t) &= a_1 + 2 a_2 t + 3 a_3 t^2 + \cdots + k a_k t^{k-1} + \cdots + n a_n t^{n-1} \\
p''(t) &= 2 a_2 + 6 a_3 t + 12 a_4 t^2 + \cdots + k(k-1) a_k t^{k-2} + \cdots + n(n-1) a_n t^{n-2} \\
p'''(t) &= 6 a_3 + 24 a_4 t + 60 a_5 t^2 + \cdots + k(k-1)(k-2) a_k t^{k-3} + \cdots + n(n-1)(n-2) a_n t^{n-3} \\
&\;\;\vdots \\
p^{(k)}(t) &= k!\,a_k + \cdots + n(n-1)(n-2)\cdots(n-k+1)\,a_n t^{n-k}.
\end{aligned}$$
The factorial pattern occurs because, as we apply the power rule repeatedly to each term, we repeatedly multiply the coefficient of that term by the decreasing exponent values. And then when we substitute $t = 0$ into these derivative functions, every term but the constant term zeros out:
$$p^{(k)}(0) = k!\,a_k.$$
To have $p^{(k)}(0) = f^{(k)}(0)$ for all $k$, we should set
$$a_k = \frac{f^{(k)}(0)}{k!}.$$
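This coefficient recipe translates directly into code: given the derivative values of $f$ at $0$, compute $a_k = f^{(k)}(0)/k!$ and evaluate the resulting polynomial. A sketch (helper names are ours), using $e^t$, whose derivatives at $0$ are all $1$:

```python
import math

def maclaurin_coeffs(derivs_at_0):
    """Coefficients a_k = f^(k)(0)/k! from the list [f(0), f'(0), f''(0), ...]."""
    return [d / math.factorial(k) for k, d in enumerate(derivs_at_0)]

def poly_eval(coeffs, t):
    return sum(a * t ** k for k, a in enumerate(coeffs))

# Every derivative of e^t at 0 equals 1, giving its degree-7 approximating polynomial.
coeffs = maclaurin_coeffs([1.0] * 8)
print(poly_eval(coeffs, 1.0), math.e)   # ≈ 2.71825 vs 2.71828...
```

At $t = 1$ the degree-7 polynomial already agrees with $e$ to four decimal places.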
Remark 21.3.5.
Notice that a Maclaurin polynomial is just a Taylor polynomial with $t_0 = 0$:
$$M_n(t) = T_{n,0}(t).$$
Also notice that our previous linear and quadratic approximation formulas are just Maclaurin/Taylor polynomials of degree 1 and 2, respectively. Finally, to be able to consolidate into summation notation, we have adopted the convention that the "zeroth" derivative of a function is just the function itself: $f^{(0)}(t) = f(t)$.
Example 21.3.6. Maclaurin approximation of cosine.
The first four derivative values (beginning at the "zeroth") of $\cos(t)$ at $t = 0$ are
$$\cos(0) = 1, \qquad \left.\frac{d(\cos(t))}{dt}\right|_{t=0} = -\sin(0) = 0, \qquad \left.\frac{d^2(\cos(t))}{dt^2}\right|_{t=0} = -\cos(0) = -1, \qquad \left.\frac{d^3(\cos(t))}{dt^3}\right|_{t=0} = \sin(0) = 0.$$
After this, the higher derivative values of cos(t) repeat these four values. So only the even-degree terms are present in the Maclaurin polynomial:
$$M_n(t) = \sum_{\substack{k=0 \\ k \text{ even}}}^{n} \frac{(-1)^{k/2}}{k!} t^k.$$
In this case, it is a little cleaner to replace our always-even index variable with the substitution $k = 2m$:
$$M_n(t) = \sum_{m=0}^{\lfloor n/2 \rfloor} \frac{(-1)^m}{(2m)!} t^{2m}.$$
Notice that in both versions, in the case that $n$ is odd, the final term will involve $t^{n-1}$, not $t^n$.
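The even-term pattern for cosine translates directly into code. A sketch (the function name `maclaurin_cos` is ours):

```python
import math

def maclaurin_cos(n, t):
    """Degree-n Maclaurin polynomial of cos(t): even-k terms (-1)^(k/2) t^k / k!."""
    return sum((-1) ** (k // 2) * t ** k / math.factorial(k)
               for k in range(0, n + 1, 2))

print(maclaurin_cos(6, 1.0))   # ≈ 0.540278 (this is 389/720)
print(math.cos(1.0))           # ≈ 0.540302
```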
Checkpoint 21.3.7. Maclaurin approximation of sin(t).
Perform a similar analysis as in Example 21.3.6 to create the Maclaurin polynomial Mn(t) for sin(t). You should see a similar alternating pattern to the coefficients.
Checkpoint 21.3.8. Maclaurin approximation of the natural exponential function.
Use the fact that
$$\frac{d(e^t)}{dt} = e^t$$
to create the Maclaurin polynomial Mn(t) for the exponential function exp(t).
Example 21.3.9. Taylor polynomial for ln(t) based at t=1.
From the derivative calculations
$$\frac{d(\ln(t))}{dt} = \frac{1}{t}, \qquad \frac{d^2(\ln(t))}{dt^2} = -\frac{1}{t^2}, \qquad \frac{d^3(\ln(t))}{dt^3} = \frac{2}{t^3}, \qquad \frac{d^4(\ln(t))}{dt^4} = -\frac{6}{t^4},$$
we see a pattern emerging:
$$\frac{d^k(\ln(t))}{dt^k} = \frac{(-1)^{k-1}(k-1)!}{t^k} \quad \text{for } k \ge 1.$$
(Recall that 0!=1.)
The index k=0 represents the constant term in the Taylor polynomial, which is always the function value at the base point. Substituting the base point t=1 into the derivative pattern above and combining with the constant term, we have
$$T_{n,1}(t) = \ln(1) + \sum_{k=1}^{n} \frac{(-1)^{k-1}(k-1)!}{k!}(t-1)^k = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k}(t-1)^k.$$
(Recall that ln(1)=0.)
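The Taylor polynomial just derived can be evaluated directly. A sketch (the function name `taylor_ln` and the test points are ours):

```python
import math

def taylor_ln(n, t):
    """Degree-n Taylor polynomial of ln(t) based at t0 = 1."""
    return sum((-1) ** (k - 1) / k * (t - 1) ** k for k in range(1, n + 1))

print(taylor_ln(2, 1.001))                # ≈ 0.0009995, the quadratic approximation
print(taylor_ln(10, 1.5), math.log(1.5))  # higher degree tracks ln more closely
```

At degree 2 this reproduces the quadratic approximation from Example 21.2.6; at degree 10 it matches $\ln(1.5)$ to about four decimal places.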
Once again, an approximation is worthless without some idea of its accuracy. And, just as in the linear and quadratic cases, the error is approximately measured by the first derivative value that is not being matched by the polynomial: if $M$ bounds $|f^{(n+1)}(t)|$ between the base point and the point of interest, then $|E| \le \frac{M}{(n+1)!}|\Delta t|^{n+1}$.
Example 21.3.11.
What value of $n$ can we use so that $\cos(1) \approx M_n(1)$ is accurate to within $\pm 0.001$? The derivatives of $\cos(t)$ are all sine or cosine functions or their negatives, so we can say $|\cos^{(n+1)}(t)| \le 1$ for all $n$ and all $t$. Thus
$$|E| \le \frac{1}{(n+1)!}|1 - 0|^{n+1} = \frac{1}{(n+1)!}.$$
We first have $(n+1)! \ge 1000$ for $n = 6$, so using $\cos(1) \approx M_6(1)$ will give us the desired accuracy. From the pattern in Example 21.3.6 we have
$$M_6(t) = 1 - \frac{t^2}{2} + \frac{t^4}{4!} - \frac{t^6}{6!},$$
and so at $t = 1$ we have
$$M_6(1) = 1 - \frac{1}{2} + \frac{1}{4!} - \frac{1}{6!} = \frac{6!}{6!} - \frac{360}{6!} + \frac{30}{6!} - \frac{1}{6!} = \frac{389}{720}.$$
Conclude that
$$\cos(1) \approx 0.540 \pm 0.001.$$
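The conclusion can be verified numerically. A sketch (variable names are ours):

```python
import math

m6_at_1 = 1 - 1/2 + 1/math.factorial(4) - 1/math.factorial(6)   # M6(1) = 389/720
bound = 1 / math.factorial(7)   # |E| <= 1/(n+1)! with n = 6
print(m6_at_1)                  # ≈ 0.540278
print(abs(math.cos(1.0) - m6_at_1) <= bound)   # True
```

The actual error here is about $2.5 \times 10^{-5}$, well inside both the guaranteed bound $1/5040 \approx 2 \times 10^{-4}$ and the requested tolerance $\pm 0.001$.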