14.3 Why Do these Rules Work?

Suppose your integrand is f(x)f(x) and ff has a power series expansion about the point qq whose coefficients do not go wild. We can then write, for xx near qq

f(x)=f(q)+(xq)f(q)+(xq)2f(2)(q)2+(xq)3f(3)(q)3!+(xq)4f(4)(q)4!+f(x) = f(q) + (x-q)f'(q) + (x-q)^2\frac{f^{(2)}(q)}{2} + (x-q)^3\frac{f^{(3)}(q)}{3!} + (x-q)^4\frac{f^{(4)}(q)}{4!} + \ldots

where f(k)(q)f^{(k)}(q) is the kthk^{th} derivative of f(x)f(x) evaluated at x=qx = q.

We want to find the area under f(x)f(x) when xx ranges from qdq-d to q+dq+d, an interval of length 2d2d.

Notice that if we form f(q+d)+f(qd)f(q+d) + f(q-d) the contributions of f(q)f'(q) which is the second term on the right here, will cancel out. In fact all the terms involving odd derivatives will cancel out:

f(q+d)+f(qd)=2f(q)+2d2f(2)(q)2!+terms coming from higher derivatives of ff(q+d) + f(q-d) = 2f(q) + 2d^2\frac{f^{(2)}(q)}{2!} + \text{terms coming from higher derivatives of } f

We can conclude, by integrating both sides of this equation:

qdq+df(x)dx=dddxf(q)+ddy2dyf(2)(q)2!+terms coming from higher derivatives of f\int_{q-d}^{q+d} f(x)dx = \int_{-d}^{d} dxf(q) + \int_{-d}^d y^2dy \frac{f^{(2)}(q)}{2!} + \text{terms coming from higher derivatives of }f

The first term on the right is 2df(q)2df(q) and the next is 2d3f(2)(q)3!2d^3\frac{f^{(2)}(q)}{3!}, and the remaining terms are proportional to higher powers of dd.

The first term of 2df(q)2df(q), by the previous equation differs from d(f(q+d)+f(qd))d(f(q+d) + f(q-d)) by d3f(2)(q)-d^3f^{(2)}(q) and terms of higher power in dd.

The upshot of this argument is that approximating the integral of ff in the interval within dd of qq by d(f(q+d)+f(qd))d(f(q+d)+f(q-d)) produces an error proportional to d3d^3 as well as terms of higher odd powers of dd and no error linear in dd or quadratic in dd. In fact, the error term cubic in dd is 23d3f(2)(q)\frac{2}{3}d^3f^{(2)}(q).

If we decrease dd by a factor of 22, f(q)df(q)d decreases by a factor of 22, but this is compensated for by the fact that there are now twice as many intervals, each of half the size of those present before the decrease. Thus, the contributions of the first terms to the integral as a whole will usually not change much. (The only changes will come from changes in the evaluations of terms like f(d+q)f(d+q) when dd changes.) On the other hand, the contributions of the cubic error terms will decrease by roughly a factor of 88 and, since there will be twice as many intervals, their contribution to the overall error will decrease by roughly a factor of 44.

So?

This means that if we take four times the estimate after the split and subtract the estimate before the split we will (almost) eliminate the first error term, and the resulting next error term will decrease on splitting by a factor of 1616. We get estimates 414-1 or 33 times so we divide by 33 to get back to an estimate of the integral.

So here is our plan:

First we calculate the integral from the starting point to any later point using the left hand rule. This is very easy to do, as we shall see.

How?

We can do this by creating two columns: an xx column and an integral column.

The xx column (say column A) starts, (say in A5) with the lower end of the integral. Then for kk larger than 55 we set Ak to A(k-1) + d.

In the left rule integral column we set Ck to C(k-1) + d*f(Bk).

The left hand rule integral from the content of A5 to that of Ak will be C(k-1).

Actually, when you want to change the function you are integrating, you are better off if all evaluations of the function are in one column, Thus you might want to set Bk to d*f(Ak), and Ck to C(k-1) + Bk.

Next we convert the left hand rule integral in column C to the trapezoid rule integral in column D.

How?

In D we add to Ck a term which gets rid of half of A5*d and half of d*Ak. Explicitly, we set Dk to Ck - (A$5+Ak)*d/2 .

The Trapezoid rule integral from B5 to Bk now appears in Dk.

Next we convert this to Simpson’s Rule in column E.

How?

Lets call the Trapezoid rule result with interval size dd with endpoint start+kd,\text{start} + kd, T(d,k).T(d,k).

What we want to do is to repeat the Trapezoid calculation for interval size 2d2d and then form 4T(d,k)T(2d,k)3.\frac{4T(d,k)-T(2d,k)}{3}. The result will have error that behaves as d4d^4. This estimate is called Simpson's Rule.

Why do we compute Simpson's rule in this way? Because it is easy to apply the left hand rule, easy to get the Trapezoid rule results in a column from it, and easy to double the interval size in another column. Having done so, its easy to form the new rule from the old ones on a spreadsheet according to the expression in the last paragraph above, in a third column.

It is easiest to accomplish this when T(2d,k)T(2d,k) is in the same row as T(d,k)T(d,k). Recall that the contribution to the left hand rule at row kk is df(k)df(k) which is added to the result in the row corresponding to k1k-1. For 2d2d, this contribution is doubled and added to the result for the previous doubled dd row which is 22 above it.

The correction to make the left hand rule into the trapezoid rule for the integral from start ss to s+kds+kd is to subtract half of the first and last contribution from the partial sum that includes the initial value and the sum up s+kds+kd.

It turns out that there is not much difficulty in putting T(2d,2k)T(2d,2k) in the same line as T(d,2k)T(d,2k) which makes it easy to form 4T(d,k)T(d,2k)3\frac{4T(d,k)-T(d,2k)}{3}.

Well, what do you actually do?

Put your interval size dd in some box, say A1.

In column A, put your start point in say A5, and from A6 at each step increase the value by dd. Thus in A6, you can put =A5+A$1, and A6 can be copied down column A as far as you want.

In column B, put the value of the integrand times dd at the corresponding argument: in B5 put =A$1*f(A5), and copy this down column B.

In column C, put the partial sums of column B: which means, in C5 put =C4+B5, and copy down column C.

In column D, put in D5: =-(B5+B$5)/2 and copy this down column D.

The Trapezoid answer from your start (which is in B5) to the value in Bk will be Ck+Dk which you can put in box Ek, by setting E5 to =C5+D5 and copying this down column E.

In F, set F5 =2*B5+F3 and copy down F (This will put the left hand 2d2d rule results in the odd numbered rows of F beyond row 5. The even numbered rows contain useless junk.)

In G5, set =2*D5 +F5 and copy down G, which will give the integral for interval 2d2d trapezoid result in the odd numbered entries of column G.

In H5, set =(4*E5-G5)/3 and copy down. Simpson's rule for the integral from the content of B5 to that of B(2k+1) will then appear in H(2k+1), for kk least 33.

Next we do an explicit example for the function xsin(x)x\sin(x).

Preliminaries:
Set A1 to Integrate f(x), B1 to f(x) = xsin(x)
Set A2 to d, B2 to 0.01
Set A3 to startpoint, B3 to 1

Create Columns: Set A5 to =B3, A6 to =A5+B$2 and copy A6 down column A.

Set the 5th5^{th} row of columns B through H as follows:

In B5, enter =B$2*A5*sin(A5); in C5, =C4+B5; in D5, =-(B5+B$5)/2; in E5, =C5+D5; in F5, =2*B5+F3; in G5, =2*D5+F5; in H5, =(4*E5-G5)/3. Copy all these down their columns.

The entries in H(5+2j) etc give Simpson’s rule, integrating from the value in A5 to that in A(5+2j).

Column A contains the variable, B contains f(x)df(x)d which is the integrand times the width of the interval, C contains its partial sums, D contains the correction to make this the trapezoid rule which is in E, F jumps by 22 in taking sums and doubling the width which corresponds to doubling dd, G corrects the endpoints to create the trapezoid rule for 2d2d for appropriate intermediate endpoints and H creates Simpson’s rule from the dd and 2d2d trapezoid rule answers.

Having done this you can change dd and the starting point by changing the content of A1 and B1. To change the integrand you need only change column B.

You should test your answer with an integral whose value you know to find any bugs in your spreadsheet. You can try doubling dd to see whether that changes your answer much. If not you have computed your integral quite well.

Does this always work?

No. You obviously cannot do this if you want to integrate to infinity. You can also run into trouble if your integrand goes to infinity at some intermediate point. Or if it wobbles insanely.

You may be able to subtract something from it that you know about and has the same singular behavior, and then be able to handle the rest.

Exercise: Try to find a function for which this procedure will fail (one that does not blow up). Things like square roots near 00 might be such.

If you add columns that jump by 44 and do a similar thing with them, you can get the Trapezoid rule for dd replaced by 4d4d. With that you can get two Simpson rule calculations for dd and 2d2d. Taking 1616 times the first less the second, and dividing by 1515 you will get a super Simpson rule, that improves by a factor of 6464 when dd decreases by a factor of 22.

Here are the results for the integral of xsinxx\sin x from x=1x=1 to 22

Number of increments
Number of digits after decimal point

The exact answer, given in column I, is sinxxcosx\sin x - x \cos x (differentiate and see).

To get column I, just enter, in I5, =SIN(A5)-A5*COS(A5)-SIN(A$5)+A$5*COS(A$5) and copy down.

Simpsons rule here is accurate to about 1010 significant figures. The value in red is our Simpson’s Rule answer. The value to its immediate right is the computer evaluation of sinxxcosx\sin x - x \cos x which is the value of this integral.

Notice that you can switch to integrating some other integrand merely by changing column B. This involves a new entry in B5 and copying it down that column.

The starting point (called the lower limit of integration) can be changed by changing B3.

In the computation above, 1010 place accuracy occurs for d=0.001d = 0.001.