Skip to main content

Given an exact formula, how can Mathematica find a probability distribution whose PDF matches it?


So, given some data, Mathematica 10.2 can now attempt to figure out what probability distribution might have produced it. Cool! But suppose that, instead of having data, we have something that is in some ways better -- a formula. Let's call it $f$. We suspect -- perhaps because $f$ is non-negative over some domain and because the integral of $f$ over that domain is 1 -- that $f$ is actually the PDF of some distribution (Normal, Lognormal, Gamma, Weibull, etc.) or some relatively simple transform of that distribution.


Is there any way that Mathematica can help figure out the distribution (or simple transform) whose PDF is the same as $f$?


Example: Consider the following formula:



1/(2*E^((-m + Log[5])^2/8)*Sqrt[2*Pi])

$$\frac{e^{-\frac{1}{8} (\log (5)-m)^2}}{2 \sqrt{2 \pi }}$$


As it happens -- and as I discovered with some research and guesswork -- this formula is the PDF of NormalDistribution[Log[5], 2] evaluated at $m$. But is there a better way than staring or guessing to discover this fact? That is, help me write FindExactDistribution[f_, params_].


Notes




  • The motivation for the problem comes from thinking about Conjugate Prior distributions but I suspect it might have a more general application.





  • One could start with mapping PDF evaluated at $m$ over a variety of continuous distributions. And if I did this I would at some point get to what I will call $g$, which is the PDF or the NormalDistribution with parameters $a$ and $b$ evaluated at $m$.


    1/(b*E^((-a + m)^2/(2*b^2))*Sqrt[2*Pi])




$$\frac{e^{-\frac{(m-a)^2}{2 b^2}}}{\sqrt{2 \pi } b}$$


But unless I knew that if I replaced $a$ by Log[5] and $b$ by $2$ that I would get $f$, this fact would not mean a lot to me. I suppose I could look at the TreeForm of $f$ and $g$ and I would notice certain similarities, and that might be a hint, but I am not sure how to make much progress beyond that observation. Ultimately, the problem looks to be about finding substitutions in parts of a tree ($g$) which, after evaluation, yield a tree that matches a target $f$. I have the suspicion that this is a difficult problem with an NKS flavor but one for which Mathematica and its ability to transform expressions might be well suited.




I appreciate the responses here. But let me provide an example that is perhaps not so easy. Suppose the target function f is as follows: $\frac{7}{10 (a-2)^2}$ for the domain ($-\infty,\frac{13}{10}$]. If we create a probability distribution out of this and then generate 10,000 random samples from the distribution and then run FindDistribution


 dis = ProbabilityDistribution[7/(10 (-2 + a)^2), {a, -\[Infinity], 13/10}];
rv = RandomVariate[dis,10^4];

fd=FindDistribution[rv,5]

The result is a mixture distribution of normal distributions, a beta distribution, a weibull distribution, a normal distribution and a mixture distribution of a normal distribution and a gamma distribution.


The mixture distributions are clearly of the wrong form, the normal distribution is clearly not right, Although I am not positive, I don't believe the Weibull Distribution or the Beta Distribution is correct either. In fact, I don't know what the correct answer is, though I think it might be a fairly simple transform of a single parameter distribution. The point, however, is that the FindDistribution process, does not seem to work in this case. And that's why I am hoping for something better.




Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Filling between two spheres in SphericalPlot3D

Manipulate[ SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, Mesh -> None, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], {n, 0, 1}] I cant' seem to be able to make a filling between two spheres. I've already tried the obvious Filling -> {1 -> {2}} but Mathematica doesn't seem to like that option. Is there any easy way around this or ... Answer There is no built-in filling in SphericalPlot3D . One option is to use ParametricPlot3D to draw the surfaces between the two shells: Manipulate[ Show[SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], ParametricPlot3D[{ r {Sin[t] Cos[1.5 Pi], Sin[t] Sin[1.5 Pi], Cos[t]}, r {Sin[t] Cos[0 Pi], Sin[t] Sin[0 Pi], Cos[t]}}, {r, 1, 2 - n}, {t, 0, Pi}, PlotStyle -> Yellow, Mesh -> {2, 15}]], {n, 0, 1}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....