Skip to main content

graphics - Mathematica Interpolation or approximation


I have lists of points X and Y.


{1212,1216,1217,1219...}, {1,2,3,4,5,6...}

I combine them to make a one table of points t = Thread[{x, y}]; Using that table I build graphic: enter image description here


So I need to find function/formula out of this table of point. So the best variant as I think is using InterpolatingFunction[range, table] and InterpolatingPolynomial[data, var]. But since we have huge amount of data the polynomial will be really huge (if I'm correct for n points, polynomial will be n-1). So if I understand correct it is not possible to use interpolation for it. Right? Therefore, we need to use regression to find approximated function, using Fit[t, func, v];. To make a better result, as I see, better divide graph in half, where it shifted to the right (near x = 4000). And make two approximation graph instead of one.


Since I'm not using Mathematica in daily use (it is my first time in this program), I need help.



What function is best to use in Fit[t, fun, v]? And how can I build two approximation functions for one graph? Do I need to divide the initial graph by my self?


Are there other methods that I missed?


Here is what I tried to do:


x = Import["C:\...\x.txt","List"];
x = Sort[x];
l = Length[x];
y = Array[0 + # &, l];
t = Thread[{x, y}]
funi = Interpolation[t];
InterpolatingFunction[{1, 200}, 0];

p = InterpolatingPolynomial[t, v] //trying to get a function
{funi[1212]}
ListLinePlot[{t}] //graphic

//----------------------------------- so since I don't see any way that interpolation will work, I decide work more on a way of approximation, but if you have a method that work with interpolation I would glad to see it. So here what I come to:


x = Import["C:\...\x.txt","List"];
x = Sort[x];
l = Length[x];
y = Array[1211 + # &, l];
t = Thread[{x, y}];

l = Fit[t, {1, v, v^2}, v]
l1 = LinearModelFit[t, v1, v1];
Show[ListPlot[t], Plot[l1[v1], {v1, 0, 7000}]]

And I have


enter image description here


it gave me a function that I looking for:


848.726+0.362082*v-0.0000206136*v^2


however i would like to build approximation function, that go through as many points as it can. So how can I do that? As What I ideally trying to make is two approximation functions for each half of the graph that go through as many points as it can.



Answer




While using a computer often means you don't have to worry if there is a large number of polynomials approximating data piecewise, the OP wishes to find a simple polynomial or two that roughly approximates the data. Here is an approach. Please note that data fitting and smoothing is not my forte; but the mathematics used here is fun and too alluring for me not to want to share.


Since interpolation is considered, I'll assume the data represents a function. The goal, then, is to approximate this function. This approach, unlike Anton Antonov's, will rarely interpolate any of the data, but it will approximate it. I'll use Interpolation to represent the function. One can then approximate the function in whatever way, say by a series in orthogonal polynomials such as the Chebyshev polynomials. The advantage to using orthogonal polynomials is that the truncated series solves a certain least-squares approximation problem. The Chebyshev polynomials are convenient because the series is easy to compute.


Some data


img = Import["http://i.stack.imgur.com/AYJja.png"];
rgb = Binarize@*ColorNegate /@ ColorSeparate@img;
blue = Thinning@ImageSubtract[rgb[[1]], rgb[[3]]]

Mathematica graphics


idata = Sort[Mean /@ SplitBy[PixelValuePositions[blue, 1], First]];
ListPlot@idata


Mathematica graphics


Note: We took the mean of the second coordinates for a given first coordinate. Duplication of abscissae occurs in idata because it comes from an image. The actual data used by the OP to produce the image might not have this problem.


Partition the data


It will be hard to approximate data that exhibits broad oscillations (or changes in concavity). The OP seemed delete the horizontal segment in the middle. Let's do that. I suspect that breaking up the data into pieces will take manual intervention.


{{{x1}}, {{x2}}} = Position[idata, #] & /@ 
DeleteCases[SplitBy[idata, Last], x_ /; Length[x] < 8][[1, {1, -1}]]
(* {{{148}}, {{188}}} *)

Construct the series



We'll show the steps in the code below. The first step is to select which data set to process, the whole, the first part, or the last part.


(* Select data set (uncomment): *)
(*data=idata;*) (* whole data set *)
data = idata[[;; x1]]; (* left segment *)
(*data=idata[[x2;;]];*) (* right segment *)

{xdata, ydata} = Transpose@data;
xdata = data[[All, 1]];
xdom = MinMax[xdata];


prec = MachinePrecision; (* precision *)
nn = 256; (* max. degree: 16 should be sufficient; more here to illustrate convergence*)
cnodes = N[Cos[Pi*Range[0, nn]/nn], prec]; (* Chebyshev extreme points *)
yFN = Interpolation@data;
cc = Sqrt[2/nn] FourierDCT[ (* DCT-I solves for the Chebyshev coefficents *)
yFN /@ Rescale[cnodes, {-1, 1}, xdom], 1];
cc[[{1, -1}]] /= 2; (* DCT-I is off by half at the end points *)

Now cc contains the Chebyshev series coefficients. Take any many as you please to form the Chebyshev series of degree n:


c[[ ;; n+1]] . ChebyshevT[Range[0, n], x]


What degree is good? (How many coefficients to use?)


It's a bit of an art to determine how to use the Chebyshev coefficients cc. The lower the degree the smoother the model; the higher the degree, the closer the approximation to the data. You can plot the polynomials for various degrees, calculate the sum of squares of the residuals at the data point and/or the so-called "energy": $$\int_a^b \left[f''(x)\right]^2\;dx$$ It's not real clear what the optimal solution would be. As the degree n increases, the sum of squares should converge to zero and the energy tend to increase. One might seek a balance between closeness of approximation (sum of squares) and smoothing (energy). Just how to weight each component is a matter of judgment (as far as I know). One might also just see how the sum of squares improves with each increase in degree.


Below I show as a measure of quality 1000 times the energy plus the sum of squares of the residuals. (Auxillary functions given below.)


Manipulate[
With[{d2cc = Nest[dCheb[#, xdom] &, cc[[;; n]], 2]}, (* 2nd derivative of Cheb. series *)
With[{cf = chebeval[cc[[;; n]]], dcf = chebeval[d2cc[[;; n - 2]]]},
Plot[cf@Rescale[x, xdom, {-1, 1}], {x, xdom[[1]], xdom[[2]]},
Prolog -> {Red, Point@data},
PlotLabel ->

Row[{"1000 Energy ", #1, " + SS ", #2, " = ", 1000 #1 + #2} &[
i0Cheb[timesCheb[d2cc, d2cc], {xdom[[1]], xdom[[2]]}], (* integrate Cheb. series *)
N@Total[(ydata - cf /@ Rescale[N@xdata, xdom, {-1, 1}])^2] (* sum of squares *)
]],
Frame -> True,
PlotPoints -> Min[Max[30, Ceiling[n/2]], Ceiling[(xdom[[2]] - xdom[[1]])/2]],
PlotRange -> {xdom, MinMax[ydata] + {-2, 2}},
PlotRangePadding -> Scaled[.02]
]
]],

{{n, 10}, 3, nn, 1, Appearance -> "Labeled"}
]

Mathematica graphics


One can get quite a close approximation to the data (SS < 0.17). Note this is different than the interpolating polynomial through these points. Even though the degree is 240, evaluation is numerically stable (chebeval uses Clenshaw's algorithm; see below).


Mathematica graphics


Here is the entire data set with a degree-15 approximation:


Mathematica graphics


Note: It would probably make sense to use the mean sum of squares, SS/Length[data] instead of SS, when comparing data sets. (I didn't bother at first because my initial focus was on an optimal approximant for a single data set.)


Auxiliary functions



The Manipulate code above uses the following to do some algebra, calculus and evaluation with Chebyshev series. These are all based on various recursive properties of the Chebyshev polynomials.


(* Multiplies two Chebyshev series *)
Clear[timesCheb];
timesCheb[aa_?VectorQ, bb_?VectorQ] := Module[{cc},
cc = ConstantArray[0, Length[aa] + Length[bb] - 1];
Do[
cc[[1 + Abs[i + j - 2]]] += aa[[i]] bb[[j]]/2;
cc[[1 + Abs[i - j]]] += aa[[i]] bb[[j]]/2,
{i, Length@aa}, {j, Length@bb}];
cc

];

(* Differentiate Chebyshev series *)
Clear[dCheb];
dCheb::usage =
"dCheb[c, {a,b}] differentiates the Chebyshev series c scaled over the interval {a,b}";
dCheb[c_] := dCheb[c, {-1, 1}];
dCheb[c_, {a_, b_}] := Module[{c1 = 0, c2 = 0, c3},
2/(b - a) MapAt[#/2 &,
Reverse@Table[c3 = c2; c2 = c1; c1 = 2 (n + 1)*c[[n + 2]] + c3,

{n, Length[c] - 2, 0, -1}],
1]
];

(* Complete definite integral of Chebyshev series *)
Clear[i0Cheb];
i0Cheb::usage = "i0Cheb[c, {a,b}] integrates the Chebyshev series over {a,b}";
i0Cheb[c_] := i0Cheb[c, {-1, 1}];
i0Cheb[c_, {a_, b_}] :=
Total[(b - a)/(1 - Range[0, Length@c - 1, 2]^2) c[[1 ;; ;; 2]]];


(* Returns function to evaluate a Chebyshev series (compiled version)
* Clenshaw's algorithm, c = {c0, c1,..., cn} Chebyshev coefficients
* The Dot product given above is just as good (c . ChebyshevT[Range[0,n], x])
* when x is numeric. *)
Clear[chebeval];
chebeval::usage =
"f = chebeval[c], y = f[x], evaluates Chebyshev series, x rescaled to -1 <= x <= 1";
chebeval[c0 : {__Real?MachineNumberQ}] :=
With[{c1 = Reverse[c0], deg = Length@c0 - 1}, Compile[{x},

Block[{c = c1, s = 0., s1 = 0., s2 = 0.},
Do[
s = c[[i]] + 2 x*s1 - s2;
s2 = s1;
s1 = s,
{i, deg}];
Last@c + x*s1 - s2], RuntimeAttributes -> {Listable}]];

Comments

Popular posts from this blog

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

plotting - How to draw lines between specified dots on ListPlot?

I would like to create a plot where I have unconnected dots and some connected. So far, I have figured out how to draw the dots. My code is the following: ListPlot[{{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4,13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full] I have thought using ListLinePlot command, but I don't know how to specify to the command to draw only selected lines between the dots. Do have any suggestions/hints on how to do that? Thank you. Answer One possibility would be to use Epilog with Line : ListPlot[ {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4, 13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full, Epilog -> { Line[ ...