Skip to main content

numerical integration - How to integrate functions of linearly interpolated data?


At first, consider integration of pure InterpolatingFunction.


Importing some data (works in v.9, for earlier versions one can use this link to download zipped M file which can be imported by data=Import["data.zip", "data.m"]):


data = First@
Import["http://webbook.nist.gov/cgi/cbook.cgi?JCAMP=C67561&Index=17&Type=IR"];

Transforming the data according to an integrand:


integrand[ww_, k_] := k/ww
dataF = {#1, integrand[#1, #2]} & @@@ data;


Now interpolation and integration:


In[13]:= int = Interpolation[dataF, InterpolationOrder -> 1];
ni = NIntegrate[int[ww], {ww, dataF[[1, 1]], dataF[[-1, 1]]}]


During evaluation of In[13]:= NIntegrate::ncvb: NIntegrate failed to
converge to prescribed accuracy after 9 recursive bisections in ww
near {ww} = {1053.057940138439}. NIntegrate obtained 0.000035786622823277624
and 1.6106181332828143*^-6 for the integral and error estimates. >>


Out[14]= 0.0000357866

I am confused by unexpectedly high absolute error (see next) and strange error Message generated by NIntegrate when integrating function of linearly interpolated data. Note that NIntegrate fails to converge on linear (!) piece of the polyline:


In[15]:= Nearest[dataF[[All, 1]], 1053.057940138439`, 2]


Out[15]= {1053.0705299372542, 1053.010264968627}

Integrate is able to integrate pure InterpolatingFunctions:



i = Integrate[int[ww], {ww, dataF[[1, 1]], dataF[[-1, 1]]}]


0.00003537589327711996

We can check it by direct integration:


exactSum = 
MovingAverage[dataF[[All, 2]], 2].Differences[dataF[[All, 1]]]



0.00003537589420603186

The differences:


{ni, i} - exactSum


{4.10729*10^-7, -9.28912*10^-13}

One can see the large error for NIntegate output. Increasing PrecisionGoal and MaxRecursion does not give much:


ni = NIntegrate[int[ww], {ww, dataF[[1, 1]], dataF[[-1, 1]]}, 

MaxRecursion -> 25, PrecisionGoal -> 16] // Quiet
exactSum - ni


0.000035342690008474926

3.32042*10^-8

At this point my questions are:




  1. Why NIntegrate fails to converge when interpolating the polyline? Why it has biggest trouble not even on a node but on a linear piece of the polyline?

  2. Is it possible to achieve for NIntegrate at least the same precision as it is for Integrate? Or, better, as it is for direct summation?


Now I wish to integrate a function of linearly interpolated data:


int2 = Interpolation[data, InterpolationOrder -> 1];
ni2 = NIntegrate[int2[ww]/ww, {ww, data[[1, 1]], data[[-1, 1]]},
MaxRecursion -> 22, PrecisionGoal -> 19] // Quiet


0.000035342690032342044


Integrate cannot work with functions of interpolated data at all. Direct summation gives:


area[{w1_, k1_}, {w2_, k2_}] = 
Integrate[(a + b ww)/ww, {ww, w1, w2}, PrincipalValue -> True] /.
First@Solve[a + b w1 == k1 && a + b w2 == k2, {a, b}] // Simplify;
exactSum = Total[area @@@ Partition[data, 2, 1]]

exactSum - ni2



0.000035375894186915904

3.32042*10^-8

Again, I get very large error. It is not always possible to get closed-form formula for direct summation. So the final question is: what is the best way to integrate smooth functions of linearly interpolated data with high precision?



Answer



Update:


After identifying and fixing a problem with data, we show that the setting the "MaxSubregions" option of "InterpolationPointsSubdivision" to the number intervals created by the interpolation points produces a highly accurate answer in a reasonably short amount of time.


Verify the data!


Ahhh, this is maddening: there's problem with the data. One of the points (the 7th) is out of order and causes exactSum to be off by -9.28912*10^-13, which coincides the "error" of my first answer. If we fix this by sorting the data and recomputing the "exact" sum, the error is greatly diminished. The first answer (see below) turns out to be highly accurate and fast (for NIntegrate).



Here is the offending datum:


data[[4 ;; 8]]

(* {{575.171, -2.41302*10^-7}, {575.231, -6.9385*10^-8},
{575.291, 8.54164*10^-7}, {575.29, 2.06936*10^-6}, (* <-- here *)
{575.35, 3.80004*10^-6}} *)

Here is the corrected exact sum, exactSumSorted, and the comparison with exactSum:


integrand[ww_, k_] := k/ww
dataF = {#1, integrand[#1, #2]} & @@@ data;

dataFSorted = {#1, integrand[#1, #2]} & @@@ Sort@data;
exactSum = MovingAverage[dataF[[All, 2]], 2].Differences[dataF[[All, 1]]]
exactSumSorted = MovingAverage[dataFSorted[[All, 2]], 2] .
Differences[dataFSorted[[All, 1]]]
(* !
0.00003537589420603182`
0.000035375893277120126`
^ *)

exactSumSorted - exactSum

(*
-9.28912*10^-13
*)

Comparison of first answer with the sums:


ni - exactSum
ni - exactSumSorted
(*
-9.28912*10^-13
-1.76183*10^-19

*)

Now the relative error,


(ni - exactSumSorted)/exactSumSorted
(*
-4.98031*10^-15
*)

is much more in line with what is expected.


The second integral



I omitted to deal with the second integral in my first answer, but it was what got me curious about what was going on. The graph of int2[ww] / ww should be piecewise concave up and the trapezoid rule should overestimate the integral, but the reverse was happening.


We can get the exact sum of the integrals over the subregions of the interpolating function in a way similar to how Alexey Popkov got the exact sum for the integral of the linear interpolation. First observe the following. If value of a function y = f[x] is linearly interpolated between the values y1, y2, as x ranges from x1 to x2, then we can compute the integral of f[x]/x as follows:


Integrate[((1 - t) y1 + t y2)/((1 - t) x1 + t x2) (x2 - x1), {t, 0, 1},
Assumptions -> {x2 > x1 > 0}]
(*
-y1 + y2 + ((-x2 y1 + x1 y2) Log[x2/x1])/(x1 - x2)
*)

[Update 2: Further explanation. First, it was easier for me to think abstractly in terms of x and y; the correspondence is x = ww and y = int2[ww]. Switching to TeX, a linear interpolation between any two values $A$, $B$ is given by $$(1-t)\,A + t\, B, \quad 0 \le t \le 1\,.$$ (Clearly it is linear in $t$ and ranges from $A$ to $B$ as $t$ ranges from $0$ to $1$. Applying this to a subregion $x_1 \le x \le x_2$ where $y$ is interpolated linearly between the values $y_1 = {\tt int2}(x_1)$ and $y_2 = {\tt int2}(x_2)$, we can paramterize the interpolation by $$ y = (1-t)\,y_1 + t\,y_2,\quad x = (1-t)\,x_1 + t\,x_2 = x_1 + t\,(x_2 - x_1), \quad 0 \le t \le 1.$$ Substituting into the integral, this becomes $$\int_{ww_1}^{ww_2} {{\tt int2}(ww) \over ww}\;d(ww) = \int_{x_1}^{x_2} {y \over x}\;dx = \int_{0}^{1} {(1-t)\,y_1 + t\,y_2 \over (1-t)\,x_1 + t\,x_2}\;(x_2-x_1)\,dt\,,$$ where the factor $x_2-x_1 = dx/dt$. This corresponds to the Mathematica integral above. To get the integral over the whole data set, we just total up the values of this integral over all of the subregions, which we do below to get exactSum2.]


Thus the exact value (ignoring machine precision error) can be calculated with



With[{data = Sort@data},
exactSum2 = Total[
-First @ Differences[ (* -x2 y1 + x1 y2 *)
Reverse @ Transpose @ Most[data] Transpose @ Rest[data]] *
Log @ Ratios[data[[All, 1]]] / (* Log[x2/x1] *)
Differences[data[[All, 1]]] (* x1 - x2 -- note minus in front *)
] + data[[-1, 2]] - data[[1, 2]] (* telescoping y2 - y1 *)
]
(*
0.00003537589326824584`

*)

The integral may be computed, using the same "InterpolationPointsSubdivision" options as in the first answer, with


int2 = Interpolation[Sort@data, InterpolationOrder -> 1];

(ni2 = NIntegrate[int2[ww]/ww,
Evaluate[{ww}~Join~First@int2["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 1 + Length[First@int2["Coordinates"]]}
]) // AbsoluteTiming

(*
{29.845430, 0.00003537589326907005`}
^ *)

Compared to Alexey Popkov's similar version, we see the above is a little faster and agrees with Alexey's up to the last digit.


(ni2AP = NIntegrate[int2[ww]/ww, 
Evaluate[{ww}~Join~First@int2["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 10^9,
Method -> {"LocalAdaptive", MaxPoints -> Infinity, Method -> {"TrapezoidalRule"}}}

]) // AbsoluteTiming
(*
{31.126306, 0.000035375893269070053`}
^ ^ *)

The error is about the best that could be expected:


{ni2, ni2AP} - exactSum2
(*
{8.24204*10^-16, 8.2421*10^-16}
*)


For one thing, as a very rough estimate, we can estimate that about four digits or more of precision might be lost just from differences of the first coordinate:


Max @ Abs[(Rest[#] - Most[#])/Most[#]] &@ Sort@data[[All, 1]]
(*
0.000119089
*)

[Update 2: I should have stressed that this is a crude estimate I used to give me an idea of what a reasonable error between NIntegrate and exactSum might be. The errors are probably randomly distributed and some probably cancel others. Also the greatest loss of precision in a single difference would be indicated by Min instead of Max (about 6 digits). So the actual error might vary widely from this estimate. The calculation above is based on this sort of example: Consider the two numbers below and their absolute difference:


0.00001234123
0.00001234567

-------------
0.00000000444 (abs. val. of difference)

We went from 7 digits to 3 digits of precision. The relative difference is


(0.00001234567 - 0.00001234123)/0.00001234567
(*
0.00035964
*)

The number of digits lost is indicated by the logarithm:



Log10[0.00035964]
(*
-3.44413
*)

End of update 2.]


First answer


To get a complete "InterpolationPointsSubdivision", a sufficient number of subregions needs to be allowed with the "MaxSubregions" option. This allows NIntegrate to handle the 50K+ subintervals in the interpolating function int.


data = First @
Import["http://webbook.nist.gov/cgi/cbook.cgi?JCAMP=C67561&Index=17&Type=IR"];

integrand[ww_, k_] := k/ww
dataF = {#1, integrand[#1, #2]} & @@@ data;
int = Interpolation[dataF, InterpolationOrder -> 1];

(ni =
NIntegrate[
int[ww],
Evaluate[{ww} ~Join~ First@int["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 1 + Length[First@int["Coordinates"]]}]

) // AbsoluteTiming

(* {0.129937, 0.0000353759} *)

I'm not sure why one more subregion is needed than coordinates/data points. I suppose it is because N coordinates divide the real numbers into N+1 regions, ignoring the interval of integration.


Postscript: More of the story


Every now and then I tried NIntegrate with some other option, even increasing WorkingPrecision, to see if the integral would get closer to the exact value. It was the stubborn resistance of NIntegrate to get the relative error down to close to 10^-15, or even 10^-12, that made me suspect that NIntegrate was right and exactSum was wrong.


Here is something I found along the way that pleased me. Nothing earth shattering, but when a numerical method returns an error that is essentially zero, it's fun.


I used arbitrary precision numbers with 30 digits of precision. They can be quite a bit slower. Here is the exact value.


(With[{data = SetPrecision[Sort@data, 30]},

exactSum2p = Total[
-First @ Differences[ (* -x2 y1 + x1 y2 *)
Reverse @ Transpose @ Most[data] Transpose @ Rest[data]] *
Log @ Ratios[data[[All, 1]]] / (* Log[x2/x1] *)
Differences[data[[All, 1]]] (* x1 - x2 -- note minus in front *)
] + data[[-1, 2]] - data[[1, 2]] (* telescoping y2 - y1 *)
]) // AbsoluteTiming
(*
{0.722450, 0.00003537589326907000957268}
*)


The machine precision calculation took 0.008799 sec. NIntegrate will be much slower, too.


int2p = Interpolation[SetPrecision[Sort@data, 30], InterpolationOrder -> 1];
(ni2p = NIntegrate[int2p[ww]/ww,
Evaluate[{ww} ~Join~ First@int2p["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 1 + Length[First@int2p["Coordinates"]]},
WorkingPrecision -> 30
]) // AbsoluteTiming
(*

{160.618752, 0.0000353758932690700095726756654469}
*)

Here is a comparison of the results:


Print[ni2p // InputForm]
Print[exactSum2p // InputForm]
(*
0.000035375893269070009572675665446947030350735701361289965`21.891735884879235
0.0000353758932690700095726756654469470303507494495353721558`30.
^ 22nd ^ 39th (1st difference) *)


ni2p - exactSum2p // InputForm
(*
0``26.343028466547207 (* StandardForm: 0.*10^-27 *)
*)

The last output means the first 26+ digits after the decimal place are zero. This corresponds to exactSum2p having almost 22 digits of precision and its first nonzero digit being the 5th after the decimal point.


Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Filling between two spheres in SphericalPlot3D

Manipulate[ SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, Mesh -> None, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], {n, 0, 1}] I cant' seem to be able to make a filling between two spheres. I've already tried the obvious Filling -> {1 -> {2}} but Mathematica doesn't seem to like that option. Is there any easy way around this or ... Answer There is no built-in filling in SphericalPlot3D . One option is to use ParametricPlot3D to draw the surfaces between the two shells: Manipulate[ Show[SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], ParametricPlot3D[{ r {Sin[t] Cos[1.5 Pi], Sin[t] Sin[1.5 Pi], Cos[t]}, r {Sin[t] Cos[0 Pi], Sin[t] Sin[0 Pi], Cos[t]}}, {r, 1, 2 - n}, {t, 0, Pi}, PlotStyle -> Yellow, Mesh -> {2, 15}]], {n, 0, 1}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....