Skip to main content

probability or statistics - Assign Error bars for y-intercept


I have some data (x,y) with error bars in y direction:


  {{{1/10, 4.92997}, ErrorBar[0.00875039]}, {{1/20, 4.90374}, 
ErrorBar[0.00912412]}, {{1/25, 4.89318},

ErrorBar[0.00707122]}, {{1/30, 4.89534},
ErrorBar[0.00870608]}, {{1/40, 4.87807},
ErrorBar[0.00829155]}, {{1/50, 4.84442},
ErrorBar[0.0226886]}, {{1/100, 4.83867}, ErrorBar[0.0973819]}}

Now I am trying to find a linear fit to the data, and I want the y-intercept of this linear fit (when x=0). How do I get the uncertainty (error bar) for the y-intercept due to those error bars in the data?



Answer



Correction: I've corrected the description of the second model to match what Mathematica actually does as opposed to what I wanted to believe it did.


Use the Weights option with the inverse of the square of the errors:


data = {{{1/10, 4.92997}, ErrorBar[0.00875039]}, {{1/20, 4.90374}, ErrorBar[0.00912412]},

{{1/25, 4.89318}, ErrorBar[0.00707122]}, {{1/30, 4.89534}, ErrorBar[0.00870608]},
{{1/40, 4.87807}, ErrorBar[0.00829155]}, {{1/50, 4.84442}, ErrorBar[0.0226886]},
{{1/100, 4.83867}, ErrorBar[0.0973819]}};
error = data[[All, 2]] /. ErrorBar[x_] -> x;
t = Table[{data[[i, 1, 1]], Around[data[[i, 1, 2]], error[[i]]]}, {i, Length[error]}];
lmf = LinearModelFit[data[[All, 1]], x, x, Weights -> 1/error^2];
lmf["ParameterTable"]
Show[ListPlot[t], Plot[{lmf["MeanPredictionBands"], lmf[x]}, {x, 0, 0.1}]]

Parameter estimates



Data with error bars and prediction with mean confidence band


Appendix: Why not use VarianceEstimatorFunction ?


Consider 3 linear models with slightly different error structures:


$$y_i=a+b x_i+σϵ_i$$ $$y_i=a+b x_i+w_i \sigma \epsilon_i$$ $$y_i=a+b x_i+w_i \epsilon_i$$


where $y_1,y_2,\ldots,y_n$ are the observations, $x_1,x_2,\ldots,x_n$ and $w_1,w_2,\ldots w_n$ are known constants, $a$, $b$, and $σ$ are parameters to be estimated, and $ϵ_i \sim N(0,1)$.


The first model has errors ($σϵ_i$) with the same distribution for all observations. The second model has the standard deviation of the random error proportional to the weights. The third model has the random error standard deviation being exactly the associated weight (i.e., the same structure as the second model but with $\sigma=1$).


While I would argue that there are few instances where the third model is appropriate, that model can be appropriate when justified. (Also, weights are most of the time estimated from some previous data collection process rather than really being known but I’ll suspend disbelief about that for this discussion.) It would be desirable for Mathematica to offer the option of two (or more) sources of random error (measurement error and lack-of-fit error) but that is not currently directly available.


To estimate the coefficients in the 3 models, Mathematica would use 3 different formulations of LinearModelFit:


lmf1=LinearModelFit[data,x,x]
lmf2=LinearModelFit[data,x,x,Weights->1/error^2]

lmf3=LinearModelFit[data,x,x,Weights->1/error^2,VarianceEstimatorFunction->(1&)]

Here are the parameter estimates for the 3 models:


First model parameter table


Second model parameter table


Third model parameter table


The moral of the story is that what options to use with LinearModelFit and NonlinearModelFit depends on what error structure is reasonable. So the use of the option VarianceEstimatorFunction implies a specific type of error structure. Does the OP know that there is only measurement error and that the weights are known precisely? I would find that hard to believe so I wouldn’t use VarianceEstimatorFunction -> (1)& in this case.


While knowing what error structure is appropriate prior to collecting the data is preferred, is there a way to use the data to suggest which error structure is better? (Not "best" but “better” in a relative sense). The answer is Yes. The model with the smallest AIC (or AICc) value should usually be chosen (unless maybe the difference in AIC values is less than 1 or 2 and then take the one that is either less complicated or matches the measurement process).


For this data the second model fits best by a small amount:


lmf1["AICc"]

(* -25.423 *)
lmf2["AICc"]
(* -30.1466 *)
lmf3["AICc"]
(* -29.4193 *)

The AICc values are close between the second and third models so it is not impossible that the third model is inappropriate in this case. However, I would still argue that in practice one should always consider the second model.


The estimated variance for the second model is less than 1 which suggests that the estimated weights might be a bit too large (which is counter to what I think usually happens):


lmf2["EstimatedVariance"] (* 0.758505 ) lmf3["EstimatedVariance"] ( 1 *)


In short, fitting a linear model includes both the "fixed" (expected value) portion and the random structure and just because one "knows" the precision of the measurement that doesn't mean that there aren't other sources of error (especially that the weights are known exactly). More flexibility with error structures would be a great addition to Mathematica.



Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...