Imagine I have a set of data like the following:
data = {{0.0453803, 0.0427863, 0.0489815, 0.045243, 0.0488289, 0.0432898,
0.04448, 0.0387732, 0.0388952}, {0.0507668, 0.0427863, 0.0502632,
0.0503395, 0.0634623, 0.0675822, 0.0529335, 0.047425,
0.0387121}, {0.042237, 0.0501259, 0.0595712, 0.0869001, 0.139559,
0.141512, 0.0868391, 0.0579232, 0.0408331}, {0.0478981, 0.0491646,
0.0652628, 0.130404, 0.218448, 0.220645, 0.143603, 0.0605173,
0.0424964}, {0.0462043, 0.0530861, 0.076051, 0.140017, 0.206943,
0.202502, 0.118791, 0.0614023, 0.0459907}, {0.0511788, 0.0582132,
0.105531, 0.166354, 0.181003, 0.13698, 0.0748302, 0.0557107,
0.0492103}, {0.0493629, 0.0539712, 0.0971695, 0.160769, 0.164477,
0.104768, 0.0591745, 0.0475319, 0.0452583}, {0.0510719, 0.0599374,
0.0730602, 0.0975814, 0.101289, 0.0691997, 0.0498054, 0.044892,
0.043122}, {0.0460517, 0.0567025, 0.0574044, 0.0587778, 0.0537118,
0.0487221, 0.0474098, 0.0413977, 0.04477}}
How might I best fit a Gaussian curve to this set of datapoints, and extract properties such at the fit' semi-axes? I noticed that ComponentMeasurements
has some functionality for best fit ellipsoids, but that doesn't seem to be workable here.
My objective here is to determine how "Gaussian" a set of points in an image are. My strategy is to sequentially fit a 2D Gaussian to each point, and then to measure it's eccentricity and spread (looking, for example, at the length and ratio of the semiaxes of the ellipsoid corresponding to the fit). The example here seems like it should yield a 2D Gaussian fit with significant spread and a ratio for the semiaxes significantly diverging from one.
Answer
Sjoerd's answer applies the power of Mathematica's very general model fitting tools. Here's a more low-tech solution.
If you want to fit a Gaussian distribution to a dataset, you can just find its mean and covariance matrix, and the Gaussian you want is the one with the same parameters. For Gaussians this is actually the optimal fit in the sense of being the maximum likelihood estimator -- for other distributions this may not work equally well, so there you will want NonlinearModelFit
.
One wrinkle is that your data doesn't fall off to zero but to something like Min[data]
$\approx 0.0387$, but we can easily get rid of that by just subtracting it off. Next, I normalize the array to sum to $1$, so that I can treat it like a discrete probability distribution. (All this really does is allow me to avoid dividing by Total[data, Infinity]
every time in the following.)
min = Min[data];
sum = Total[data - min, Infinity];
p = (data - min)/sum;
Now we find the mean and covariance. Mathematica's functions don't seem to work with weighted data, so we'll just roll our own.
{m, n} = Dimensions[p];
mean = Sum[{i, j} p[[i, j]], {i, 1, m}, {j, 1, n}];
cov = Sum[Outer[Times, {i, j} - mean, {i, j} - mean] p[[i, j]], {i, 1, m}, {j, 1, m}];
We can easily get the probability distribution for the Gaussian with this mean and covariance. Of course, if we want to match the original data, we have to rescale and shift it back.
f[i_, j_] := PDF[MultinormalDistribution[mean, cov], {i, j}] // Evaluate;
g[i_, j_] := f[i, j] sum + min;
The match is not too bad, although the data has two humps where the Gaussian has one. You can also compare the fit through a contour plot. (Use the tooltips to compare contour levels.)
Show[ListPlot3D[data, PlotStyle -> None],
Plot3D[g[i, j], {j, 1, 9}, {i, 1, 9}, MeshStyle -> None,
PlotStyle -> Opacity[0.8]], PlotRange -> All]
Show[ListContourPlot[data, Contours -> Range[0.05, 0.25, 0.025],
ContourShading -> None, ContourStyle -> ColorData[1, 1], InterpolationOrder -> 3],
ContourPlot[g[i, j], {j, 1, 9}, {i, 1, 9}, Contours -> Range[0.05, 0.25, 0.025],
ContourShading -> None, ContourStyle -> ColorData[1, 2]]]
The variances along the principal axes are the eigenvalues of the covariance matrix, and the standard deviations (which I guess you're calling the semi-axes) are their square roots.
Sqrt@Eigenvalues[cov]
(* {1.86325, 1.50567} *)
Comments
Post a Comment