14.3 Why Do these Rules Work?

Suppose your integrand is \(f(x)\) and \(f\) has a power series expansion about the point \(q\) whose coefficients do not go wild. We can then write, for \(x\) near \(q\)

\[f(x) = f(q) + (x-q)f'(q) + (x-q)^2\frac{f^{(2)}(q)}{2} + (x-q)^3\frac{f^{(3)}(q)}{3!} + (x-q)^4\frac{f^{(4)}(q)}{4!} + \ldots\]

where \(f^{(k)}(q)\) is the \(k^{th}\) derivative of \(f(x)\) evaluated at \(x = q\).

We want to find the area under \(f(x)\) when \(x\) ranges from \(q-d\) to \(q+d\), an interval of length \(2d\).

Notice that if we form \(f(q+d) + f(q-d)\) the contributions of \(f'(q)\) which is the second term on the right here, will cancel out. In fact all the terms involving odd derivatives will cancel out:

\[f(q+d) + f(q-d) = 2f(q) + 2d^2\frac{f^{(2)}(q)}{2!} + \text{terms coming from higher derivatives of } f\]

We can conclude, by integrating both sides of this equation:

\[\int_{q-d}^{q+d} f(x)dx = \int_{-d}^{d} dxf(q) + \int_{-d}^d y^2dy \frac{f^{(2)}(q)}{2!} + \text{terms coming from higher derivatives of }f\]

The first term on the right is \(2df(q)\) and the next is \(2d^3\frac{f^{(2)}(q)}{3!}\), and the remaining terms are proportional to higher powers of \(d\).

The first term of \(2df(q)\), by the previous equation differs from \(d(f(q+d) + f(q-d))\) by \(-d^3f^{(2)}(q)\) and terms of higher power in \(d\).

The upshot of this argument is that approximating the integral of \(f\) in the interval within \(d\) of \(q\) by \(d(f(q+d)+f(q-d))\) produces an error proportional to \(d^3\) as well as terms of higher odd powers of \(d\) and no error linear in \(d\) or quadratic in \(d\). In fact, the error term cubic in \(d\) is \(\frac{2}{3}d^3f^{(2)}(q)\).

If we decrease \(d\) by a factor of \(2\), \(f(q)d\) decreases by a factor of \(2\), but this is compensated for by the fact that there are now twice as many intervals, each of half the size of those present before the decrease. Thus, the contributions of the first terms to the integral as a whole will usually not change much. (The only changes will come from changes in the evaluations of terms like \(f(d+q)\) when \(d\) changes.) On the other hand, the contributions of the cubic error terms will decrease by roughly a factor of \(8\) and, since there will be twice as many intervals, their contribution to the overall error will decrease by roughly a factor of \(4\).

So?

This means that if we take four times the estimate after the split and subtract the estimate before the split we will (almost) eliminate the first error term, and the resulting next error term will decrease on splitting by a factor of \(16\). We get estimates \(4-1\) or \(3\) times so we divide by \(3\) to get back to an estimate of the integral.

So here is our plan:

First we calculate the integral from the starting point to any later point using the left hand rule. This is very easy to do, as we shall see.

How?

We can do this by creating two columns: an \(x\) column and an integral column.

The \(x\) column (say column A) starts, (say in A5) with the lower end of the integral. Then for \(k\) larger than \(5\) we set Ak to A(k-1) + d.

In the left rule integral column we set Ck to C(k-1) + d*f(Bk).

The left hand rule integral from the content of A5 to that of Ak will be C(k-1).

Actually, when you want to change the function you are integrating, you are better off if all evaluations of the function are in one column, Thus you might want to set Bk to d*f(Ak), and Ck to C(k-1) + Bk.

Next we convert the left hand rule integral in column C to the trapezoid rule integral in column D.

How?

In D we add to Ck a term which gets rid of half of A5*d and half of d*Ak. Explicitly, we set Dk to Ck - (A$5+Ak)*d/2 .

The Trapezoid rule integral from B5 to Bk now appears in Dk.

Next we convert this to Simpson’s Rule in column E.

How?

Lets call the Trapezoid rule result with interval size \(d\) with endpoint \(\text{start} + kd,\) \(T(d,k).\)

What we want to do is to repeat the Trapezoid calculation for interval size \(2d\) and then form \(\frac{4T(d,k)-T(2d,k)}{3}.\) The result will have error that behaves as \(d^4\). This estimate is called Simpson's Rule.

Why do we compute Simpson's rule in this way? Because it is easy to apply the left hand rule, easy to get the Trapezoid rule results in a column from it, and easy to double the interval size in another column. Having done so, its easy to form the new rule from the old ones on a spreadsheet according to the expression in the last paragraph above, in a third column.

It is easiest to accomplish this when \(T(2d,k)\) is in the same row as \(T(d,k)\). Recall that the contribution to the left hand rule at row \(k\) is \(df(k)\) which is added to the result in the row corresponding to \(k-1\). For \(2d\), this contribution is doubled and added to the result for the previous doubled \(d\) row which is \(2\) above it.

The correction to make the left hand rule into the trapezoid rule for the integral from start \(s\) to \(s+kd\) is to subtract half of the first and last contribution from the partial sum that includes the initial value and the sum up \(s+kd\).

It turns out that there is not much difficulty in putting \(T(2d,2k)\) in the same line as \(T(d,2k)\) which makes it easy to form \(\frac{4T(d,k)-T(d,2k)}{3}\).

Well, what do you actually do?

Put your interval size \(d\) in some box, say A1.

In column A, put your start point in say A5, and from A6 at each step increase the value by \(d\). Thus in A6, you can put =A5+A$1, and A6 can be copied down column A as far as you want.

In column B, put the value of the integrand times \(d\) at the corresponding argument: in B5 put =A$1*f(A5), and copy this down column B.

In column C, put the partial sums of column B: which means, in C5 put =C4+B5, and copy down column C.

In column D, put in D5: =-(B5+B$5)/2 and copy this down column D.

The Trapezoid answer from your start (which is in B5) to the value in Bk will be Ck+Dk which you can put in box Ek, by setting E5 to =C5+D5 and copying this down column E.

In F, set F5 =2*B5+F3 and copy down F (This will put the left hand \(2d\) rule results in the odd numbered rows of F beyond row 5. The even numbered rows contain useless junk.)

In G5, set =2*D5 +F5 and copy down G, which will give the integral for interval \(2d\) trapezoid result in the odd numbered entries of column G.

In H5, set =(4*E5-G5)/3 and copy down. Simpson's rule for the integral from the content of B5 to that of B(2k+1) will then appear in H(2k+1), for \(k\) least \(3\).

Next we do an explicit example for the function \(x\sin(x)\).

Preliminaries:
Set A1 to Integrate f(x), B1 to f(x) = xsin(x)
Set A2 to d, B2 to 0.01
Set A3 to startpoint, B3 to 1

Create Columns: Set A5 to =B3, A6 to =A5+B$2 and copy A6 down column A.

Set the \(5^{th}\) row of columns B through H as follows:

In B5, enter =B$2*A5*sin(A5); in C5, =C4+B5; in D5, =-(B5+B$5)/2; in E5, =C5+D5; in F5, =2*B5+F3; in G5, =2*D5+F5; in H5, =(4*E5-G5)/3. Copy all these down their columns.

The entries in H(5+2j) etc give Simpson’s rule, integrating from the value in A5 to that in A(5+2j).

Column A contains the variable, B contains \(f(x)d\) which is the integrand times the width of the interval, C contains its partial sums, D contains the correction to make this the trapezoid rule which is in E, F jumps by \(2\) in taking sums and doubling the width which corresponds to doubling \(d\), G corrects the endpoints to create the trapezoid rule for \(2d\) for appropriate intermediate endpoints and H creates Simpson’s rule from the \(d\) and \(2d\) trapezoid rule answers.

Having done this you can change \(d\) and the starting point by changing the content of A1 and B1. To change the integrand you need only change column B.

You should test your answer with an integral whose value you know to find any bugs in your spreadsheet. You can try doubling \(d\) to see whether that changes your answer much. If not you have computed your integral quite well.

Does this always work?

No. You obviously cannot do this if you want to integrate to infinity. You can also run into trouble if your integrand goes to infinity at some intermediate point. Or if it wobbles insanely.

You may be able to subtract something from it that you know about and has the same singular behavior, and then be able to handle the rest.

Exercise: Try to find a function for which this procedure will fail (one that does not blow up). Things like square roots near \(0\) might be such.

If you add columns that jump by \(4\) and do a similar thing with them, you can get the Trapezoid rule for \(d\) replaced by \(4d\). With that you can get two Simpson rule calculations for \(d\) and \(2d\). Taking \(16\) times the first less the second, and dividing by \(15\) you will get a super Simpson rule, that improves by a factor of \(64\) when \(d\) decreases by a factor of \(2\).

Here are the results for the integral of \(x\sin x\) from \(x=1\) to \(2\)

Number of increments
Number of digits after decimal point

The exact answer, given in column I, is \(\sin x - x \cos x\) (differentiate and see).

To get column I, just enter, in I5, =SIN(A5)-A5*COS(A5)-SIN(A$5)+A$5*COS(A$5) and copy down.

Simpsons rule here is accurate to about \(10\) significant figures. The value in red is our Simpson’s Rule answer. The value to its immediate right is the computer evaluation of \(\sin x - x \cos x\) which is the value of this integral.

Notice that you can switch to integrating some other integrand merely by changing column B. This involves a new entry in B5 and copying it down that column.

The starting point (called the lower limit of integration) can be changed by changing B3.

In the computation above, \(10\) place accuracy occurs for \(d = 0.001\).