
bugs - How does LocationTest choose its "AutomaticTest"?


Is there a way of determining how LocationTest chooses its "AutomaticTest"?


Sometimes it's clear from the VerifyTestAssumptions option:


sample = BlockRandom[SeedRandom[7];RandomVariate[SkewNormalDistribution@2, 10]];

{LocationTest[sample, Automatic, "AutomaticTest", VerifyTestAssumptions ->{"Normality"}],
LocationTest[sample, Automatic, "AutomaticTest", VerifyTestAssumptions -> "Normality" -> True]}



{SignedRank, T}



Other times, less so:


{LocationTest[sample, Automatic, "AutomaticTest", SignificanceLevel -> 0.05],
LocationTest[sample, Automatic, "AutomaticTest", SignificanceLevel -> 0.0005]}


{SignedRank, T}




(i.e., how can the level of the "burden of proof", which is decided a priori, affect the resulting sampling distribution?)


Update 1:


Following on from the comments: perhaps the setting for the significance level is somehow being inherited by the checks for normality, i.e.


{DistributionFitTest[sample, NormalDistribution[],"ShortTestConclusion", SignificanceLevel -> 0.05], 
DistributionFitTest[sample, NormalDistribution[],"ShortTestConclusion", SignificanceLevel -> 0.0005],
DistributionFitTest[sample, NormalDistribution[], "PValue"]}


{Reject,Do not reject,0.00836514}




The critical point is not near the previous p-value (0.00836514), however; it appears just after 0.03:


Plot[If[LocationTest[sample, Automatic, "AutomaticTest", SignificanceLevel -> x] === "T", 1, 0], {x, 0.01, 0.05}]



If so, perhaps some sort of Bonferroni correction is taking place?


Update 2:


Per Andy Ross's answer, the relevant test is not


 DistributionFitTest[sample, NormalDistribution[]]


(*0.00836514*)

but rather


 DistributionFitTest[sample]

(*0.0307822*)

indicating that the significance level is indeed being passed down to the normality check. This does not, however, provide a systematic way of determining how the choice is made in general. In this case, the logic used by LocationTest can be deduced because it is a standard example, but applying NHST can be something of an art whose ultimate interpretation depends heavily on circumstance. Hence, having a black box for this logic seems limiting, perhaps in a far more consequential way than for other, more deterministic and clear-cut algorithms.


Also, what is the rationale behind this inheritance? In the NHST machinery, the significance level has a specific meaning in relation to Type 1/Type 2 errors, power, experimental context, etc., and the theory is predicated on a test's conditions being satisfied a priori, without concession to their own uncertainty. (For example, when a TTest invocation returns a p-value together with an error message for a failed test of normality, does that count as a false negative/positive in defining a Type 1/Type 2 error?)


sample2 = {6.1, -8.4, 1, 2, 0.75, 2.9, 3.5, 5.1, 1.8, 3.6, 7., 3, 9.4, 7.5, -6};
TTest@sample2

(* TTest::nortst: At least one of the p-values in {0.0903246}, resulting from a test for
normality, is below 0.05`. The tests in {T} require that the data is normally distributed. >> *)

(* 0.0498525 *)

The warning above doesn't seem to fit, since 0.0903246 is not below 0.05. Yet when the significance level is set explicitly to the default 5%, the warning message disappears:


TTest[sample2, SignificanceLevel -> 0.05]


(* 0.0498525 *)

At any rate, it is quite conceivable that the overarching test needs a different significance level from the one required for a preliminary test checking the data's normality, symmetry, heterogeneity, etc. (N.B. how are all of these combined from the initial setting?)
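One way to keep the two levels separate is to run the normality check by hand and then call a specific test. The following is only a sketch under stated assumptions: twoLevelLocationTest, mu0 and alphaDiag are hypothetical names introduced here, the choice between TTest and SignedRankTest is a simplification of whatever LocationTest does internally, and symmetry is not handled separately.

```mathematica
(* Hypothetical helper, not part of LocationTest: the normality check uses
   its own level alphaDiag, independent of the level later applied to the
   location test's p-value *)
twoLevelLocationTest[data_, mu0_ : 0, alphaDiag_ : 0.05] :=
 Module[{pNormal = DistributionFitTest[data]},
  If[pNormal > alphaDiag,
   (* normality not rejected: t-test, with its own normality check bypassed *)
   {"NormalityPValue" -> pNormal, "Test" -> "T",
    "PValue" -> TTest[data, mu0, VerifyTestAssumptions -> ("Normality" -> True)]},
   (* normality rejected: fall back to the signed-rank test *)
   {"NormalityPValue" -> pNormal, "Test" -> "SignedRank",
    "PValue" -> SignedRankTest[data, mu0]}]]

sample2 = {6.1, -8.4, 1, 2, 0.75, 2.9, 3.5, 5.1, 1.8, 3.6, 7., 3, 9.4, 7.5, -6};
twoLevelLocationTest[sample2, 0, 0.05]
```

The returned p-value can then be compared against whatever level the main test calls for, without that level leaking into the diagnostic.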


Update 3: To be a little more systematic, define a function ShowSignificanceLevelThresholds that shows the significance levels at which LocationTest changes its choice of "AutomaticTest" (the "HighlightAutomaticTest" option highlights this choice; both are defined at the end of the post).


LocationTest[sample2, Automatic, {"TestDataTable", All},"HighlightAutomaticTest" -> True]
// ShowSignificanceLevelThresholds

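For readers who only want the transition points, without the table display or the modified built-ins, a stripped-down scan might look as follows. This is a sketch rather than the function defined at the end of the post: runStarts and chooser are hypothetical names, and it returns the first level of each run, whereas SignificanceLevelThresholds keeps the last.

```mathematica
(* Hypothetical helper (not from the original post): evaluate chooser on a
   grid of significance levels and return the first {level, choice} pair
   of each run over which the choice stays the same *)
runStarts[chooser_, step_ : 0.001] :=
 First /@ SplitBy[
   Table[{alpha, chooser[alpha]}, {alpha, step, 1 - step, step}],
   Last]

sample2 = {6.1, -8.4, 1, 2, 0.75, 2.9, 3.5, 5.1, 1.8, 3.6, 7., 3, 9.4, 7.5, -6};

(* Levels at which the automatic choice for sample2 changes *)
runStarts[
 LocationTest[sample2, Automatic, "AutomaticTest", SignificanceLevel -> #] &]
```

Passing the chooser as a function keeps the scan independent of LocationTest, so the same helper can be pointed at other tests that accept SignificanceLevel.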


As deduced, the first transition appears to be due to the Kolmogorov-Smirnov test of normality and its p-value of 0.09.



DistributionFitTest[sample2, Automatic, {"TestDataTable", All}, 
"HighlightAutomaticTest" -> True]



The second transition, at a significance level of 0.604, appears to involve the violation of a symmetry assumption (observable by manually specifying the "Symmetry" and "Normality" settings in the VerifyTestAssumptions option), so a related question becomes: which test is used by default to test for symmetry?


While a significance level as high as 0.604 is unlikely to matter in practice in this particular case, that may not hold more generally. Again, the effects of passing a significance level down to diagnostic tests are not straightforward, which suggests caution in applying LocationTest and its ilk, together with scope for further improvements (discussed in more detail here).


(*Needs to be evaluated twice*)
ALocationTest[sample_, x_, {tableType_, All}, y___ : OptionsPattern] /;
  TrueQ@("HighlightAutomaticTest" /. {y}) :=
 With[{\[ScriptCapitalH]2 =
    LocationTest[sample, x, "HypothesisTestData",
     FilterRules[{y}, Except["HighlightAutomaticTest"]]]},
  Module[{AT = \[ScriptCapitalH]2["AutomaticTest"],
    AllTests = \[ScriptCapitalH]2["AllTests"],
    TableDataType = StringDrop[tableType, -5], (*Dropping "Table"*)
    pos, Headers, grid, TableData},
   TableData = \[ScriptCapitalH]2[{TableDataType, All}];
   pos = 1 + Position[AllTests, AT][[1, 1]];
   Headers =
    Switch[TableDataType,
     "TestData", {" ", "Statistic", "P-Value"},
     "TestStatistic", {" ", "Statistic"},
     "PValue", {" ", "P-Value"}];
   Grid[{Headers,
     Sequence @@
      MapThread[Flatten[Join[{#1}, {#2}]] &, {AllTests, TableData}]},
    Alignment -> Left,
    Background -> {Automatic, pos -> GrayLevel@0.9},
    Dividers -> {2 -> True, 2 -> True},
    FrameStyle -> Directive[GrayLevel@.7]] // Text]];


Unprotect@LocationTest;
PrependTo[DownValues@LocationTest,
DownValues[ALocationTest] /. ALocationTest -> LocationTest];
Protect@LocationTest;

AVarianceTest[sample_, x_, {tableType_, All}, y___ : OptionsPattern] /;
  TrueQ@("HighlightAutomaticTest" /. {y}) :=
 With[{\[ScriptCapitalH]2 =
    VarianceTest[sample, x, "HypothesisTestData",
     FilterRules[{y}, Except["HighlightAutomaticTest"]]]},
  Module[{AT = \[ScriptCapitalH]2["AutomaticTest"],
    AllTests = \[ScriptCapitalH]2["AllTests"],
    TableDataType = StringDrop[tableType, -5], (*Dropping "Table"*)
    pos, Headers, grid, TableData},
   TableData = \[ScriptCapitalH]2[{TableDataType, All}];
   pos = 1 + Position[AllTests, AT][[1, 1]];
   Headers =
    Switch[TableDataType,
     "TestData", {" ", "Statistic", "P-Value"},
     "TestStatistic", {" ", "Statistic"},
     "PValue", {" ", "P-Value"}];
   Grid[{Headers,
     Sequence @@
      MapThread[Flatten[Join[{#1}, {#2}]] &, {AllTests, TableData}]},
    Alignment -> Left,
    Background -> {Automatic, pos -> GrayLevel@0.9},
    Dividers -> {2 -> True, 2 -> True},
    FrameStyle -> Directive[GrayLevel@.7]] // Text]];


Unprotect@VarianceTest;
PrependTo[DownValues@VarianceTest,
DownValues[AVarianceTest] /. AVarianceTest -> VarianceTest];
Protect@VarianceTest;

ADistributionFitTest[sample_, x_, {tableType_, All}, y___ : OptionsPattern] /;
  TrueQ@("HighlightAutomaticTest" /. {y}) :=
 With[{\[ScriptCapitalH]2 =
    DistributionFitTest[sample, x, "HypothesisTestData",
     FilterRules[{y}, Except["HighlightAutomaticTest"]]]},
  Module[{AT = \[ScriptCapitalH]2["AutomaticTest"],
    AllTests = \[ScriptCapitalH]2["AllTests"],
    TableDataType = StringDrop[tableType, -5], (*Dropping "Table"*)
    pos, Headers, grid, TableData},
   TableData = \[ScriptCapitalH]2[{TableDataType, All}];
   pos = 1 + Position[AllTests, AT][[1, 1]];
   Headers =
    Switch[TableDataType,
     "TestData", {" ", "Statistic", "P-Value"},
     "TestStatistic", {" ", "Statistic"},
     "PValue", {" ", "P-Value"}];
   Grid[{Headers,
     Sequence @@
      MapThread[Flatten[Join[{#1}, {#2}]] &, {AllTests, TableData}]},
    Alignment -> Left,
    Background -> {Automatic, pos -> GrayLevel@0.9},
    Dividers -> {2 -> True, 2 -> True},
    FrameStyle -> Directive[GrayLevel@.7]] // Text]];

Unprotect@DistributionFitTest;
PrependTo[DownValues@DistributionFitTest,
 DownValues[ADistributionFitTest] /.
  ADistributionFitTest -> DistributionFitTest];
Protect@DistributionFitTest;


SetAttributes[SignificanceLevelThresholds, HoldFirst];
SetAttributes[ShowSignificanceLevelThresholds, HoldFirst];

SignificanceLevelThresholds[test_[A__]] :=
 With[{firstTest = test[A, SignificanceLevel -> 0.0001]},
  Last /@
   Split[
    Table[{i, test[A, SignificanceLevel -> i]}, {i, 0.001, .999, 0.001}],
    Last@#1 === Last@#2 &]];

ShowSignificanceLevelThresholds[test_[A__]] :=
 Module[{NForm},
  NForm[n_] := If[MemberQ[{0, 1}, n], n, NumberForm[n, {4, 3}]];
  With[{SLT = SignificanceLevelThresholds[test[A]]},
   With[{intervals =
      StringForm["`1` < \[FilledSquare] < `2`",
         #[[1]] // NForm, #[[2]] // NForm] & /@
       Partition[Prepend[SLT[[All, 1]] /. .999 -> 1, 0], 2, 1],
     TestOutputs = SLT[[All, 2]]},
    Grid[{{HoldForm@
         test[A, SignificanceLevel -> \[SelectionPlaceholder]] //
        Style[#, Bold] &,
       Sequence @@ ConstantArray[SpanFromLeft, Length@SLT - 1]},
      intervals, TestOutputs},
     Spacings -> {2, {1 -> 2, 2 -> 2, 3 -> 1, 4 -> 1}},
     Background -> {None, 2 -> LightBlue},
     ItemStyle -> {None, 2 -> Directive[15]},
     Dividers -> {{True, True, 3, 4}, {True, True, False, True}},
     ItemSize -> 16, Frame -> True]]]]

Answer



With a single sample, the test for normality is the deciding factor in which test is chosen.


The key here is that DistributionFitTest is not testing against NormalDistribution[0,1] by default. It is testing against the family of normal distributions.


DistributionFitTest[sample]

(*0.0307822*)

DistributionFitTest[sample, NormalDistribution[]]


(*0.00836514*)

The T test is chosen until the significance level is greater than the p-value for the test for normality.
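This rule can be checked directly. A minimal sketch, reusing the question's sample and only calls that already appear above; the names pNormal and alpha are introduced here for illustration:

```mathematica
(* Same sample as in the question *)
sample = BlockRandom[SeedRandom[7];
   RandomVariate[SkewNormalDistribution@2, 10]];

(* p-value of the default normality check (normal family, parameters not fixed) *)
pNormal = DistributionFitTest[sample];

(* For each level: the level, whether it exceeds pNormal, and the automatic choice *)
Table[
 {alpha, alpha > pNormal,
  LocationTest[sample, Automatic, "AutomaticTest", SignificanceLevel -> alpha]},
 {alpha, {0.0005, 0.01, 0.03, 0.04, 0.05}}]
```

If the rule holds, the second column should be False exactly where the third column is T. With the p-value of 0.0307822 reported above, that would put the switch between 0.03 and 0.04, consistent with the plot in the question.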

