Skip to main content

graphics - How to draw confidence ellipse from a covariance matrix?


I am studying a two-dimensional dataset, whose mean vector and covariance matrix are the following:


mean = {0.968479, 0.020717}
cov = {{0.0000797131, 0.000069929},{0.0000699293, 0.00174702}}


I want to generate a contour plot of the 95% confidence ellipse.



Answer



The executive summary


You can use the built-in Ellipsoid function directly with your calculated mean and covariance. For 95% confidence, use:


Ellipsoid[mean, cov Quantile[ChiSquareDistribution[2], 0.95]]

That expression returns an Ellipsoid object that you can visualize as an Epilog to a ListPlot, or as an argument to Graphics (further formatting below).


Ellipsoids for other common critical values can be obtained in the same way. Note the different multipliers to cov:


90% :   Ellipsoid[mean, cov Quantile[ChiSquareDistribution[2], 0.9]]

95% : Ellipsoid[mean, cov Quantile[ChiSquareDistribution[2], 0.95]]
99% : Ellipsoid[mean, cov Quantile[ChiSquareDistribution[2], 0.99]]

A more detailed answer


First let me start by mentioning that a covariance matrix must be symmetric. You have a missing digit in cov[[1,2]] that makes your orignal one non-symmetric; I assume that it's a typo and will use the symmetric version below:


 mean = {0.968479, 0.020717};
cov = {{0.0000797131, 0.000069929}, {0.0000699293, 0.00174702}}

The easiest way to generate an ellipsoid with the right location and alignment given your distribution is to feed the mean and covariance directly to the Ellipsoid function, simply as Ellipsoid[mean, cov]. The resulting Ellipsoid is a graphical primitive, so it can be plotted on top of your data using e.g. Epilog or Graphics.


To get a practical example, let us generate and plot some random points from your distribution, assuming that it is normal.



 SeedRandom[1];
sampledata = RandomVariate[MultinormalDistribution[mean, cov], 2500];

ListPlot[
sampledata,
PlotRange -> All, PlotRangePadding -> Scaled[0.05],
AspectRatio -> 1, Axes -> None, Frame -> True,
Epilog -> {Opacity[0], EdgeForm[{Thick, Red}], Ellipsoid[mean, cov]}
]


Data and single-wide ellipsoid


As you can see, however, an Ellipsoid that is "one covariance wide", as the one we plotted, contains only a small fraction of the sampled points (only roughly 40% of the points, see below). Instead you requested an ellipsoid containing 95% of the points from your distribution. We need a wider Ellipsoid for that: but how wide?


We can figure that out by using the probability distribution function (PDF) for your multivariate distribution. We can integrate the PDF over a parametric ellipsoidal region to calculate what fraction of the samples falls within that region. Let's consider a two-dimensional MultinormalDistribution, with zero average and covariance expressed by {{sigmax^2, rho sigmax sigmay}, {rho sigmax sigmay, sigmay^2}}, where sigmax and sigmay are the standard deviations associated with each of the two independent variables, and rho is the correlation coefficient between the two variables. The standard deviations are positive numbers, and 0 <= rho <= 1.


Here we calculate an expression for the fraction of points found within a two-dimensional ellipse centered around zero (the mean of this distribution) and "n covariances wide" (notice the n factor in the Ellipsoid's descriptor. The integration is carried out over the region defined by that "n-wide" ellipse.


gencovar = {{sigmax^2, rho sigmax sigmay}, {rho sigmax sigmay, sigmay^2}};

Assuming[
{n > 0, sigmax > 0, sigmay > 0, 0 <= rho < 1},
Simplify[
Integrate[

PDF[MultinormalDistribution[{0, 0}, gencovar], {x, y}],
{x, y} \[Element] Ellipsoid[{0, 0}, n gencovar]
]
]
]

(* 1-E^(-n/2) *)

Now let's tabulate the value of that expression for a few n.


 TableForm[#, TableAlignments -> {Right, Top}] &@

Table[{ToString@n <> "x cov", ToString@Round[100 %, 1] <> "%"}, {n, 1, 9, 1}]

(*
1x cov 39%
2x cov 63%
3x cov 78%
4x cov 86%
5x cov 92%
6x cov 95%
7x cov 97%

8x cov 98%
9x cov 99% *)

This means that our original "single-wide" Ellipsoid contained only 39% of the samples; to get 95% inclusion we need a 6x wide Ellipsoid.


Let's plot that for your original distribution (notice the all-important 6x factor in the Ellipsoid definition):


 ListPlot[
sampledata,
PlotRange -> All, PlotRangePadding -> Scaled[0.05],
AspectRatio -> 1, Axes -> None, Frame -> True,
Epilog -> {Opacity[0], EdgeForm[{Thick, Red}], Ellipsoid[mean, 6 cov]}

]

Data with 95% confidence ellipsoid


Finally, we can also confirm this by explicitly counting the samples that fall within this Ellipsoid. The expression Select[sampledata, RegionMember[Ellipsoid[mean, n cov]]] allows us to select those samples in sampledata that lie within the geometric region defined by the "6x wide" Ellipsoid[mean, 6 cov].


 N@Length@Select[sampledata, RegionMember[Ellipsoid[mean, 6 cov]]] / Length@sampledata

(* 0.9544 *)

As expected from our previous calculations, approximately 95% of the points reside within the 6x Ellipsoid we defined.


For clarity, Length@Select[sampledata, RegionMember[Ellipsoid[mean, n cov]]] is the number of points lying within the Ellipsoid region. Length@sampledata is the total number of points in our sample. N is there to obtain an approximate numerical answer, rather than a symbolic one.



Why not use EllipsoidQuantile?


EllipsoidQuantile is a function available in the MultivariateStatistics package that was mentioned by @Michael E2 in a comment as a possible solution to this problem. EllipsoidQuantile[dataset, q] returns an Ellipsoid centered on the mean of dataset and scaled to contain a fraction q of the dataset.


At a glance, this would seem exactly what the OP asked, but this function behaves in a subtly different way. In my understanding, it treats dataset as the entire population, rather than as a sample from a larger population. If the sample available is large and it represents the population well, then the results of the two methods will be essentially indistinguishable. However, the two methods will give noticeably different results for smaller samples and high levels of confidence q.


I also have two more practical reasons not to use the EllipsoidQuantile function. First, it is difficult to apply styles to its output, although I have never been able to pinpoint why exactly. Additionally, some definitions in MultivariateStatistics seem to shadow definitions of functions that have since transitioned to built-in status, e.g. PrincipalComponents and MultinormalDistribution, so I'd rather not load the package unless it's absolutely necessary.


Here is some Manipulate code that allows one to compare the two results on the sampledata I generated above, and an example of when the two approaches differ considerably.


Needs["MultivariateStatistics`"]

Manipulate[
Show[{
ListPlot[

sampledata[[1 ;; ;; every]],
Axes -> None, Frame -> True,
Epilog ->
Inset[Style["alpha = " <> ToString@alpha, FontSize -> 14], Scaled[{0.85, 0.9}]]
],
Graphics[{
(* using EllipsoidQuantile *)
EllipsoidQuantile[sampledata[[1 ;; ;; every]], alpha],
(* Using the Ellipsoid method outlined above *)
{Opacity[0], EdgeForm[{Thick, Red}],

Ellipsoid[
Mean@sampledata[[1 ;; ;; every]],
Evaluate[n /. First@Quiet@Solve[1 - E^(-n/2) == alpha, n]] Covariance@sampledata[[1 ;; ;; every]]
]
}
}]
}, PlotRange -> {{0.93, 1.01}, {-0.17, 0.24}}
],

(* Manipulate variables *)

{{alpha, 0.95, "\[Alpha] value"}, 0.9, 0.99, 0.01, Appearance -> "Open"},
{{every, 100, "points"}, {1 -> "All", 10 -> "250", 20 -> "125", 50 -> "50", 100 -> "25", 250 -> "10"}, ControlType -> SetterBar}
]

A sample output highlights the difference: in red is the Ellipsoid result, and in black the EllipsoidQuantile output:


Manipulate to compare Ellipsoid and EllipsoidQuantile


Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...