Skip to main content

Given an exact formula, how can Mathematica find a probability distribution whose PDF matches it?


So, given some data, Mathematica 10.2 can now attempt to figure out what probability distribution might have produced it. Cool! But suppose that, instead of having data, we have something that is in some ways better -- a formula. Let's call it f. We suspect -- perhaps because f is non-negative over some domain and because the integral of f over that domain is 1 -- that f is actually the PDF of some distribution (Normal, Lognormal, Gamma, Weibull, etc.) or some relatively simple transform of that distribution.


Is there any way that Mathematica can help figure out the distribution (or simple transform) whose PDF is the same as f?


Example: Consider the following formula:



1/(2*E^((-m + Log[5])^2/8)*Sqrt[2*Pi])

e18(log(5)m)222π


As it happens -- and as I discovered with some research and guesswork -- this formula is the PDF of NormalDistribution[Log[5], 2] evaluated at m. But is there a better way than staring or guessing to discover this fact? That is, help me write FindExactDistribution[f_, params_].


Notes




  • The motivation for the problem comes from thinking about Conjugate Prior distributions but I suspect it might have a more general application.





  • One could start with mapping PDF evaluated at m over a variety of continuous distributions. And if I did this I would at some point get to what I will call g, which is the PDF or the NormalDistribution with parameters a and b evaluated at m.


    1/(b*E^((-a + m)^2/(2*b^2))*Sqrt[2*Pi])




e(ma)22b22πb


But unless I knew that if I replaced a by Log[5] and b by 2 that I would get f, this fact would not mean a lot to me. I suppose I could look at the TreeForm of f and g and I would notice certain similarities, and that might be a hint, but I am not sure how to make much progress beyond that observation. Ultimately, the problem looks to be about finding substitutions in parts of a tree (g) which, after evaluation, yield a tree that matches a target f. I have the suspicion that this is a difficult problem with an NKS flavor but one for which Mathematica and its ability to transform expressions might be well suited.




I appreciate the responses here. But let me provide an example that is perhaps not so easy. Suppose the target function f is as follows: 710(a2)2 for the domain (,1310]. If we create a probability distribution out of this and then generate 10,000 random samples from the distribution and then run FindDistribution


 dis = ProbabilityDistribution[7/(10 (-2 + a)^2), {a, -\[Infinity], 13/10}];
rv = RandomVariate[dis,10^4];

fd=FindDistribution[rv,5]

The result is a mixture distribution of normal distributions, a beta distribution, a weibull distribution, a normal distribution and a mixture distribution of a normal distribution and a gamma distribution.


The mixture distributions are clearly of the wrong form, the normal distribution is clearly not right, Although I am not positive, I don't believe the Weibull Distribution or the Beta Distribution is correct either. In fact, I don't know what the correct answer is, though I think it might be a fairly simple transform of a single parameter distribution. The point, however, is that the FindDistribution process, does not seem to work in this case. And that's why I am hoping for something better.




Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

mathematical optimization - Minimizing using indices, error: Part::pkspec1: The expression cannot be used as a part specification

I want to use Minimize where the variables to minimize are indices pointing into an array. Here a MWE that hopefully shows what my problem is. vars = u@# & /@ Range[3]; cons = Flatten@ { Table[(u[j] != #) & /@ vars[[j + 1 ;; -1]], {j, 1, 3 - 1}], 1 vec1 = {1, 2, 3}; vec2 = {1, 2, 3}; Minimize[{Total@((vec1[[#]] - vec2[[u[#]]])^2 & /@ Range[1, 3]), cons}, vars, Integers] The error I get: Part::pkspec1: The expression u[1] cannot be used as a part specification. >> Answer Ok, it seems that one can get around Mathematica trying to evaluate vec2[[u[1]]] too early by using the function Indexed[vec2,u[1]] . The working MWE would then look like the following: vars = u@# & /@ Range[3]; cons = Flatten@{ Table[(u[j] != #) & /@ vars[[j + 1 ;; -1]], {j, 1, 3 - 1}], 1 vec1 = {1, 2, 3}; vec2 = {1, 2, 3}; NMinimize[ {Total@((vec1[[#]] - Indexed[vec2, u[#]])^2 & /@ R...

What is and isn't a valid variable specification for Manipulate?

I have an expression whose terms have arguments (representing subscripts), like this: myExpr = A[0] + V[1,T] I would like to put it inside a Manipulate to see its value as I move around the parameters. (The goal is eventually to plot it wrt one of the variables inside.) However, Mathematica complains when I set V[1,T] as a manipulated variable: Manipulate[Evaluate[myExpr], {A[0], 0, 1}, {V[1, T], 0, 1}] (*Manipulate::vsform: Manipulate argument {V[1,T],0,1} does not have the correct form for a variable specification. >> *) As a workaround, if I get rid of the symbol T inside the argument, it works fine: Manipulate[ Evaluate[myExpr /. T -> 15], {A[0], 0, 1}, {V[1, 15], 0, 1}] Why this behavior? Can anyone point me to the documentation that says what counts as a valid variable? And is there a way to get Manpiulate to accept an expression with a symbolic argument as a variable? Investigations I've done so far: I tried using variableQ from this answer , but it says V[1...