First, consider integration of a pure InterpolatingFunction.
Importing some data (this works in v.9; for earlier versions one can use this link to download a zipped .m file, which can be imported with data = Import["data.zip", "data.m"]):
data = First@
Import["http://webbook.nist.gov/cgi/cbook.cgi?JCAMP=C67561&Index=17&Type=IR"];
Transforming the data according to the integrand:
integrand[ww_, k_] := k/ww
dataF = {#1, integrand[#1, #2]} & @@@ data;
Now interpolation and integration:
In[13]:= int = Interpolation[dataF, InterpolationOrder -> 1];
ni = NIntegrate[int[ww], {ww, dataF[[1, 1]], dataF[[-1, 1]]}]
During evaluation of In[13]:= NIntegrate::ncvb: NIntegrate failed to
converge to prescribed accuracy after 9 recursive bisections in ww
near {ww} = {1053.057940138439}. NIntegrate obtained 0.000035786622823277624
and 1.6106181332828143*^-6 for the integral and error estimates. >>
Out[14]= 0.0000357866
I am confused by the unexpectedly high absolute error (see below) and by the strange message generated by NIntegrate when integrating a function of linearly interpolated data. Note that NIntegrate fails to converge on a linear (!) piece of the polyline:
In[15]:= Nearest[dataF[[All, 1]], 1053.057940138439`, 2]
Out[15]= {1053.0705299372542, 1053.010264968627}
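As a quick sanity check (a sketch, reusing int and dataF from above), one can verify that the interpolant really is linear between these two nodes, so the integrand has no structure there that would explain the convergence failure:
{xL, xR} = Sort @ Nearest[dataF[[All, 1]], 1053.057940138439, 2];
int[(xL + xR)/2] - (int[xL] + int[xR])/2
(* ~0 up to rounding: the midpoint value lies on the chord, so the interpolant is a straight line there *)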
Integrate is able to integrate pure InterpolatingFunctions:
i = Integrate[int[ww], {ww, dataF[[1, 1]], dataF[[-1, 1]]}]
0.00003537589327711996
We can check it by direct summation (the trapezoidal rule over the nodes):
exactSum =
MovingAverage[dataF[[All, 2]], 2].Differences[dataF[[All, 1]]]
0.00003537589420603186
The differences:
{ni, i} - exactSum
{4.10729*10^-7, -9.28912*10^-13}
One can see the large error in the NIntegrate output. Increasing PrecisionGoal and MaxRecursion does not help much:
ni = NIntegrate[int[ww], {ww, dataF[[1, 1]], dataF[[-1, 1]]},
MaxRecursion -> 25, PrecisionGoal -> 16] // Quiet
exactSum - ni
0.000035342690008474926
3.32042*10^-8
At this point my questions are:
- Why does NIntegrate fail to converge when integrating the interpolated polyline? And why does it have the biggest trouble not at a node but on a linear piece of the polyline?
- Is it possible to achieve with NIntegrate at least the same precision as with Integrate? Or, better, the same as with direct summation?
Now I wish to integrate a function of linearly interpolated data:
int2 = Interpolation[data, InterpolationOrder -> 1];
ni2 = NIntegrate[int2[ww]/ww, {ww, data[[1, 1]], data[[-1, 1]]},
MaxRecursion -> 22, PrecisionGoal -> 19] // Quiet
0.000035342690032342044
Integrate cannot work with functions of interpolated data at all. Direct summation gives:
area[{w1_, k1_}, {w2_, k2_}] =
Integrate[(a + b ww)/ww, {ww, w1, w2}, PrincipalValue -> True] /.
First@Solve[a + b w1 == k1 && a + b w2 == k2, {a, b}] // Simplify;
exactSum = Total[area @@@ Partition[data, 2, 1]]
exactSum - ni2
0.000035375894186915904
3.32042*10^-8
Again, I get a very large error. It is not always possible to get a closed-form formula for direct summation. So the final question is: what is the best way to integrate smooth functions of linearly interpolated data with high precision?
Answer
Update:
After identifying and fixing a problem with data, we show that setting the "MaxSubregions" option of "InterpolationPointsSubdivision" to the number of intervals created by the interpolation points produces a highly accurate answer in a reasonably short amount of time.
Verify the data!
Ahhh, this is maddening: there's a problem with the data. One of the points (the 7th) is out of order and causes exactSum to be off by -9.28912*10^-13, which coincides with the "error" of my first answer. If we fix this by sorting the data and recomputing the "exact" sum, the error is greatly diminished. The first answer (see below) turns out to be highly accurate and fast (for NIntegrate).
Here is the offending datum:
data[[4 ;; 8]]
(* {{575.171, -2.41302*10^-7}, {575.231, -6.9385*10^-8},
{575.291, 8.54164*10^-7}, {575.29, 2.06936*10^-6}, (* <-- here *)
{575.35, 3.80004*10^-6}} *)
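A quick way to locate every such out-of-order abscissa directly (a sketch, using only the imported data):
Position[Differences[data[[All, 1]]], _?Negative]
(* flags every backward step in the abscissas; per the text above, the only one is the step into the 7th point *)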
Here is the corrected exact sum, exactSumSorted, and the comparison with exactSum:
integrand[ww_, k_] := k/ww
dataF = {#1, integrand[#1, #2]} & @@@ data;
dataFSorted = {#1, integrand[#1, #2]} & @@@ Sort@data;
exactSum = MovingAverage[dataF[[All, 2]], 2].Differences[dataF[[All, 1]]]
exactSumSorted = MovingAverage[dataFSorted[[All, 2]], 2] .
Differences[dataFSorted[[All, 1]]]
(* !
0.00003537589420603182`
0.000035375893277120126`
^ *)
exactSumSorted - exactSum
(*
-9.28912*10^-13
*)
Comparison of the first answer with the sums:
ni - exactSum
ni - exactSumSorted
(*
-9.28912*10^-13
-1.76183*10^-19
*)
Now the relative error,
(ni - exactSumSorted)/exactSumSorted
(*
-4.98031*10^-15
*)
is much more in line with what is expected.
The second integral
I neglected to deal with the second integral in my first answer, but it was what got me curious about what was going on. The graph of int2[ww]/ww should be piecewise concave up, and the trapezoid rule should overestimate the integral, but the reverse was happening.
We can get the exact sum of the integrals over the subregions of the interpolating function in a way similar to how Alexey Popkov got the exact sum for the integral of the linear interpolation. First observe the following: if the value of a function y = f[x] is linearly interpolated between the values y1 and y2 as x ranges from x1 to x2, then we can compute the integral of f[x]/x as follows:
Integrate[((1 - t) y1 + t y2)/((1 - t) x1 + t x2) (x2 - x1), {t, 0, 1},
Assumptions -> {x2 > x1 > 0}]
(*
-y1 + y2 + ((-x2 y1 + x1 y2) Log[x2/x1])/(x1 - x2)
*)
[Update 2: Further explanation. First, it was easier for me to think abstractly in terms of x and y; the correspondence is x = ww and y = int2[ww]. Switching to TeX, a linear interpolation between any two values $A$, $B$ is given by $$(1-t)\,A + t\, B, \quad 0 \le t \le 1\,.$$ (Clearly it is linear in $t$ and ranges from $A$ to $B$ as $t$ ranges from $0$ to $1$.) Applying this to a subregion $x_1 \le x \le x_2$, where $y$ is interpolated linearly between the values $y_1 = {\tt int2}(x_1)$ and $y_2 = {\tt int2}(x_2)$, we can parametrize the interpolation by $$ y = (1-t)\,y_1 + t\,y_2,\quad x = (1-t)\,x_1 + t\,x_2 = x_1 + t\,(x_2 - x_1), \quad 0 \le t \le 1.$$ Substituting into the integral, this becomes $$\int_{ww_1}^{ww_2} {{\tt int2}(ww) \over ww}\;d(ww) = \int_{x_1}^{x_2} {y \over x}\;dx = \int_{0}^{1} {(1-t)\,y_1 + t\,y_2 \over (1-t)\,x_1 + t\,x_2}\;(x_2-x_1)\,dt\,,$$ where the factor $x_2-x_1 = dx/dt$ comes from the change of variables. This corresponds to the Mathematica integral above. To get the integral over the whole data set, we just total up the values of this integral over all of the subregions, which we do below to get exactSum2.]
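To double-check the closed form, here is a small consistency test (a sketch; segInt is a helper name introduced here, not part of the original answer) comparing it with a direct numerical integration over the first subinterval of the sorted data:
segInt[{x1_, y1_}, {x2_, y2_}] :=
  -y1 + y2 + ((-x2 y1 + x1 y2) Log[x2/x1])/(x1 - x2)  (* closed form from the Integrate result above *)
With[{s = Sort @ data},
 {segInt[s[[1]], s[[2]]],
  NIntegrate[((1 - t) s[[1, 2]] + t s[[2, 2]])/((1 - t) s[[1, 1]] + t s[[2, 1]]) *
    (s[[2, 1]] - s[[1, 1]]), {t, 0, 1}]}]
(* the two values should agree to machine precision *)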
Thus the exact value (ignoring machine precision error) can be calculated with
With[{data = Sort@data},
exactSum2 = Total[
-First @ Differences[ (* -x2 y1 + x1 y2 *)
Reverse @ Transpose @ Most[data] Transpose @ Rest[data]] *
Log @ Ratios[data[[All, 1]]] / (* Log[x2/x1] *)
Differences[data[[All, 1]]] (* x1 - x2 -- note minus in front *)
] + data[[-1, 2]] - data[[1, 2]] (* telescoping y2 - y1 *)
]
(*
0.00003537589326824584`
*)
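Equivalently (though more slowly), the same total can be obtained by mapping the segInt helper from the check above over consecutive pairs of points; again, this is a sketch rather than part of the original answer:
Total[segInt @@@ Partition[Sort @ data, 2, 1]]
(* should reproduce exactSum2 up to roundoff *)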
The integral may be computed, using the same "InterpolationPointsSubdivision" options as in the first answer, with
int2 = Interpolation[Sort@data, InterpolationOrder -> 1];
(ni2 = NIntegrate[int2[ww]/ww,
Evaluate[{ww}~Join~First@int2["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 1 + Length[First@int2["Coordinates"]]}
]) // AbsoluteTiming
(*
{29.845430, 0.00003537589326907005`}
^ *)
Compared to Alexey Popkov's similar version, we see the above is a little faster and agrees with Alexey's up to the last digit.
(ni2AP = NIntegrate[int2[ww]/ww,
Evaluate[{ww}~Join~First@int2["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 10^9,
Method -> {"LocalAdaptive", MaxPoints -> Infinity, Method -> {"TrapezoidalRule"}}}
]) // AbsoluteTiming
(*
{31.126306, 0.000035375893269070053`}
^ ^ *)
The error is about the best that could be expected:
{ni2, ni2AP} - exactSum2
(*
{8.24204*10^-16, 8.2421*10^-16}
*)
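For comparison with the relative error quoted earlier for the first integral, a quick sketch:
({ni2, ni2AP} - exactSum2)/exactSum2
(* roughly 2*10^-11 for both, i.e. a few digits worse than machine epsilon, consistent with the cancellation discussed below *)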
For one thing, as a very rough estimate, about four digits or more of precision might be lost just from taking differences of the first coordinates:
Max @ Abs[(Rest[#] - Most[#])/Most[#]] &@ Sort@data[[All, 1]]
(*
0.000119089
*)
[Update 2: I should have stressed that this is a crude estimate I used to give me an idea of what a reasonable error between NIntegrate and exactSum might be. The errors are probably randomly distributed and some probably cancel others. Also, the greatest loss of precision in a single difference would be indicated by Min instead of Max (about 6 digits; a per-step version covering both is sketched just after this update). So the actual error might vary widely from this estimate. The calculation above is based on this sort of example: consider the two numbers below and their absolute difference:
0.00001234123
0.00001234567
-------------
0.00000000444 (abs. val. of difference)
We went from 7 digits to 3 digits of precision. The relative difference is
(0.00001234567 - 0.00001234123)/0.00001234567
(*
0.00035964
*)
The number of digits lost is indicated by the logarithm:
Log10[0.00035964]
(*
-3.44413
*)
End of update 2.]
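Following the idea in Update 2, the per-step digits-lost estimate can be computed over the whole data set at once (a sketch; Min and Max bracket the range discussed above):
relSteps = Abs[Differences[#]/Most[#]] &[Sort @ data[[All, 1]]];
-Log10 @ {Max[relSteps], Min[relSteps]}
(* roughly {3.9, 6}: the fewest and the most digits lost in a single difference *)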
First answer
To get a complete "InterpolationPointsSubdivision", a sufficient number of subregions needs to be allowed with the "MaxSubregions" option. This allows NIntegrate to handle the 50K+ subintervals in the interpolating function int.
data = First @
Import["http://webbook.nist.gov/cgi/cbook.cgi?JCAMP=C67561&Index=17&Type=IR"];
integrand[ww_, k_] := k/ww
dataF = {#1, integrand[#1, #2]} & @@@ data;
int = Interpolation[dataF, InterpolationOrder -> 1];
(ni =
NIntegrate[
int[ww],
Evaluate[{ww} ~Join~ First@int["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 1 + Length[First@int["Coordinates"]]}]
) // AbsoluteTiming
(* {0.129937, 0.0000353759} *)
I'm not sure why one more subregion is needed than coordinates/data points. I suppose it is because N coordinates divide the real numbers into N+1 regions, ignoring the interval of integration.
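To see the scale involved here (a sketch), the node count that feeds the "MaxSubregions" setting can be read off directly from the interpolating function:
Length[First @ int["Coordinates"]]
(* number of interpolation nodes; 50K+ for this data set, per the paragraph above *)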
Postscript: More of the story
Every now and then I tried NIntegrate with some other option, even increasing WorkingPrecision, to see if the integral would get closer to the exact value. It was the stubborn refusal of NIntegrate to get the relative error down to anywhere near 10^-15, or even 10^-12, that made me suspect that NIntegrate was right and exactSum was wrong.
Here is something I found along the way that pleased me. Nothing earth shattering, but when a numerical method returns an error that is essentially zero, it's fun.
I used arbitrary precision numbers with 30 digits of precision. They can be quite a bit slower. Here is the exact value.
(With[{data = SetPrecision[Sort@data, 30]},
exactSum2p = Total[
-First @ Differences[ (* -x2 y1 + x1 y2 *)
Reverse @ Transpose @ Most[data] Transpose @ Rest[data]] *
Log @ Ratios[data[[All, 1]]] / (* Log[x2/x1] *)
Differences[data[[All, 1]]] (* x1 - x2 -- note minus in front *)
] + data[[-1, 2]] - data[[1, 2]] (* telescoping y2 - y1 *)
]) // AbsoluteTiming
(*
{0.722450, 0.00003537589326907000957268}
*)
The machine precision calculation took 0.008799 sec. NIntegrate will be much slower, too.
int2p = Interpolation[SetPrecision[Sort@data, 30], InterpolationOrder -> 1];
(ni2p = NIntegrate[int2p[ww]/ww,
Evaluate[{ww} ~Join~ First@int2p["Domain"]],
Method -> {"InterpolationPointsSubdivision",
"MaxSubregions" -> 1 + Length[First@int2p["Coordinates"]]},
WorkingPrecision -> 30
]) // AbsoluteTiming
(*
{160.618752, 0.0000353758932690700095726756654469}
*)
Here is a comparison of the results:
Print[ni2p // InputForm]
Print[exactSum2p // InputForm]
(*
0.000035375893269070009572675665446947030350735701361289965`21.891735884879235
0.0000353758932690700095726756654469470303507494495353721558`30.
^ 22nd ^ 39th (1st difference) *)
ni2p - exactSum2p // InputForm
(*
0``26.343028466547207 (* StandardForm: 0.*10^-27 *)
*)
The last output means the first 26+ digits after the decimal point are zero. This corresponds to ni2p having almost 22 digits of precision and its first nonzero digit being the 5th after the decimal point.
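These quantities can also be read off directly with Precision and Accuracy (a sketch):
{Precision[ni2p], Accuracy[ni2p - exactSum2p]}
(* roughly {21.9, 26.3}: the precision of ni2p and the number of zero digits after the decimal point in the difference *)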