Skip to main content

probability or statistics - Problems encoding a Bayesian Network with just five nodes using ProbabilityDistribution


Question summary


I had recently asked this question where problems encoding a Bayesian Network were linked to the use of MultinomialDistribution. While that problem can be avoided using EmpiricalDistribution, there remains an issue with using ProbabilityDistribution for larger networks as it seems: While Probability can be used for inference with 4 nodes, it will not evaluate for the "full" example network of 5 nodes -- which still is far removed from real application demands. Why is this so? What can be done about it?


Bayesian Network Example


Again I would like to use the (simple) example that is given on page 53 in Probabilistic Graphical Models (2009), by Daphne Koller and Neir Friedman:


BayesianNetwork



The network has five nodes (random variables):



  • Difficulty of a class taken by a student (0 = easy, 1 = hard)

  • Intelligence of the student (0 = low, 1 = high)

  • Grade achieved by the student (1 = A, 2 = B, 3 = C)

  • SAT score of the student (0 = low, 1 = high)

  • Letter of recommendation by the teacher (0 = False, 1 = True)


We would like to use this network to do probabilistic inference (causal or evidential) like: "What is the probability of the student achieving an A, given that he is intelligent?"


Encoding the Bayesian Network in Mathematica



Essentially the Bayesian Network is a sparse way to define the joint probability distribution function for the random variables using the chain rule of probability theory:


$ \begin{align} P(I,D,G,S,L) = P(I) \times P(D) \times P(G|I,D) \times P(S|I) \times P(L|G) \end{align} $


I am encoding this in Mathematica as follows:


(* nodes without parents *)
distI = BernoulliDistribution[ 0.3 ]; (* prior probability of high intelligence *)
distD = BernoulliDistribution[ 0.4 ]; (* prior probability of hard class *)

(* nodes with parents = conditional probability distributions *)
(* conditional distribution of the grade *)
cpdG = Function[ { i, d },

With[
{
p = Piecewise[
{
{ { 0.3, 0.4, 0.3 }, i == 0 && d == 0 },
{ { 0.05, 0.25, 0.7 }, i == 0 && d == 1 },
{ { 0.9, 0.08, 0.02 }, i == 1 && d == 0 },
{ { 0.5, 0.3, 0.2 }, i == 1 && d == 1 }
}
]

},
EmpiricalDistribution[ p -> Range[3] ]
]
];

(* conditional distribution for the SAT score *)
cpdS = Function[ i,
With[
{
θ = Piecewise[

{
{ 0.05, i == 0 },
{ 0.8, i == 1 }
}
] (* probability of a high SAT score *)
},
BernoulliDistribution[θ]
]
];


(* conditional probability function for the Letter *)
cpdL = Function[ g,
With[
{
θ = Piecewise[
{
{ 0.9, g == 1 },
{ 0.6, g == 2 },
{ 0.01, g == 3 }
}

]
},
BernoulliDistribution[θ]
]
];

(* BayesNetwork = Joint Probability Distribution Function *)
(* B4 = P(I,D,G,L) *)
distB4 = ProbabilityDistribution[
PDF[ distI, i] PDF[ distD, d] PDF[ cpdG[i,d], g] PDF[ cpdL[g], l],

{i, 0, 1, 1},
{d, 0, 1, 1},
{g, 1, 3, 1},
{l, 0, 1, 1}
];

(* B5 = P(I,D,G,S,L) *)
distB5 = ProbabilityDistribution[
PDF[ distI, i] PDF[ distD, d] PDF[ cpdG[i,d], g] PDF[ cpdS[i], s] PDF[ cpdL[g], l],
{i, 0, 1, 1},

{d, 0, 1, 1},
{g, 1, 3, 1},
{s, 0, 1, 1},
{l, 0, 1, 1}
];

Doing Inference


Now we would like to ask the question as stated above:


Probability[ g == 1 \[Conditioned] i == 1, {i,d,g,l} \[Distributed] distB4 ]



0.74



Probability[ g == 1 \[Conditioned] i == 1, {i,d,g,s,l} \[Distributed] distB5 ]


Probability[ ] is returned unevaluted.



Why is this the case? What can be done about it - after all 5 nodes should not be too far a stretch?



Answer





As indicated in the answer given by WRI here, the interplay of Piecewise and ProbabilityDistribution is tricky and -- so my temporary verdict -- is best avoided.


Indeed, using indicator functions, e.g. Boole, as a replacement for Piecewise solves the issue:


(* nodes without parents remain unchanged *)
(* CPDs are redefined using Boole instead of Piecewise *)

(* conditional distribution of the grade *)
cpdG = Function[ {i,d},
With[
{

p = Plus[
{0.3 , 0.4 , 0.3 } Boole[ i == 0 && d == 0 ],
{0.05, 0.25, 0.7 } Boole[ i == 0 && d == 1 ],
{0.9 , 0.08, 0.02} Boole[ i == 1 && d == 0 ],
{0.5 , 0.3 , 0.2 } Boole[ i == 1 && d == 1 ]
]
},
EmpiricalDistribution[ p -> Range[3] ]
]
];


(*conditional distribution for the SAT score*)
cpdS = Function[ i,
With[
{
θ = Plus[
0.05 Boole[ i == 0 ],
0.8 Boole[ i == 1 ]
] (*probability of a high SAT score*)
},

BernoulliDistribution[θ]
]
];

(*conditional probability function for the Letter*)
cpdL = Function[ g,
With[
{
θ = Plus[
0.9 Boole[ g == 1 ],

0.6 Boole[ g == 2 ],
0.01 Boole[ g == 3 ]
]
},
BernoulliDistribution[θ]
]
];

(* B5 = P(I,D,G,S,L) complete BN *)
distB5 = ProbabilityDistribution[

PDF[ distI, i] PDF[ distD, d] PDF[ cpdG[i,d], g] PDF[ cpdS[i], s] PDF[ cpdL[g], l],
{i, 0, 1, 1}, {d, 0, 1, 1}, {g, 1, 3, 1}, {s, 0, 1, 1}, {l, 0, 1, 1}
];

Now doing inference for the complete joint probability distribution as specified by the Bayesian Network works out fine:


Probability[ g == 1\[Conditioned] i == 1, {i,d,g,s,l} \[Distributed] distB5 ]


0.74




Comments