I have data from which information has been irretrievably lost through binning and I am trying to find a plausible distribution from which it derives. There is an academic paper suggesting that the underlying data is Zipf or Pareto distributed, but I believe that the paper is wrong. Just because one can fit a line to the log-log transform of survival data does not mean that the data is really Zipf or Pareto distributed!
The data (which you can see here, table 2b) is in an $n\times 2$ table in which the first column gives a discrete range defining a "bin" and the second column gives an integer count. It thus states that there are y1 firms that have between min1 and max1 employees, that there are y2 firms that have between min2 and max2 employees, etc. I therefore know that there are y1 + y2 + ... + yn firms that have at least min1 employees, that there are y2 + y3 + ... + yn firms that have at least min2 employees, etc.
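In symbols (the notation is only for illustration and is not taken from the paper), write $y_i$ for the count in bin $i$ and $m_i$ for its lower bound. The number of firms with at least $m_k$ employees is then

$$N_k = \sum_{i=k}^{n} y_i, \qquad k = 1, \dots, n,$$

and $N_k / N_1$ is the empirical survival fraction at $m_k$ among firms with at least $m_1$ employees. These are presumably the points behind a log-log survival plot.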
I would, of course, love to be able to use EstimatedDistribution with various guesses as to functional form and then use DistributionFitTest to see whether each guess is plausible. But I cannot figure out how to get Mathematica to find the best parameters for a distribution when all I have is binned data. I've floundered about with SurvivalModelFit and WeightedData, all to little avail. All help appreciated.
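One hedged way to frame the problem without EstimatedDistribution is to maximize the binned (multinomial) likelihood directly and then run a Pearson chi-square test on the bins. The following is a minimal sketch under stated assumptions, not a definitive method. The bins are made-up placeholders rather than the values in table 2b. LogNormalDistribution is just one candidate form. Each integer range is treated as the interval [min, max + 1) of a continuous distribution, and the likelihood is conditioned on the observed range. The names binProb, logLik, fitted and so on are hypothetical helpers.

```mathematica
(* Hypothetical bins {min, max, count}: placeholders, NOT the values from table 2b *)
bins = {{1, 4, 5000}, {5, 9, 1800}, {10, 19, 900}, {20, 49, 450}, {50, 99, 120}};
{lower, upper} = {bins[[1, 1]], bins[[-1, 2]] + 1};

(* Probability of the bin [lo, hi + 1), conditional on the observed range [lower, upper) *)
binProb[dist_, {lo_, hi_, _}] :=
  (CDF[dist, hi + 1] - CDF[dist, lo])/(CDF[dist, upper] - CDF[dist, lower]);

(* Multinomial log-likelihood of the counts, up to an additive constant *)
logLik[dist_] := Total[#[[3]] Log[binProb[dist, #]] & /@ bins];

(* Assumed candidate: lognormal; swap in any other distribution and its parameters *)
{maxLL, best} =
  FindMaximum[{logLik[LogNormalDistribution[m, s]], s > 0}, {{m, 1}, {s, 1}}];
fitted = LogNormalDistribution[m, s] /. best;

(* Pearson chi-square test on the bins, standing in for DistributionFitTest *)
observed = bins[[All, 3]];
expected = Total[observed] (binProb[fitted, #] & /@ bins);
chi2 = Total[(observed - expected)^2/expected];

(* Degrees of freedom: bins - 1 - number of fitted parameters (2 here) *)
pValue = SurvivalFunction[ChiSquareDistribution[Length[bins] - 1 - 2], chi2]
```

If the top bin is open-ended, setting upper to Infinity should serve the same purpose. Running the same steps with a Pareto candidate would allow a like-for-like comparison of the two functional forms on the binned counts.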