probability or statistics - Problems encoding a Bayesian Network with just five nodes using ProbabilityDistribution

Question summary

I had recently asked this question where problems encoding a Bayesian Network were linked to the use of MultinomialDistribution. While that problem can be avoided using EmpiricalDistribution, there remains an issue with using ProbabilityDistribution for larger networks as it seems: While Probability can be used for inference with 4 nodes, it will not evaluate for the "full" example network of 5 nodes -- which still is far removed from real application demands. Why is this so? What can be done about it?

Bayesian Network Example

Again I would like to use the (simple) example that is given on page 53 in Probabilistic Graphical Models (2009), by Daphne Koller and Neir Friedman:

The network has five nodes (random variables):

Difficulty of a class taken by a student (0 = easy, 1 = hard)

Intelligence of the student (0 = low, 1 = high)

Grade achieved by the student (1 = A, 2 = B, 3 = C)

SAT score of the student (0 = low, 1 = high)

Letter of recommendation by the teacher (0 = False, 1 = True)

We would like to use this network to do probabilistic inference (causal or evidential) like: "What is the probability of the student achieving an A, given that he is intelligent?"

Encoding the Bayesian Network in Mathematica

Essentially the Bayesian Network is a sparse way to define the joint probability distribution function for the random variables using the chain rule of probability theory:

$ \begin{align} P(I,D,G,S,L) = P(I) \times P(D) \times P(G|I,D) \times P(S|I) \times P(L|G) \end{align} $

I am encoding this in Mathematica as follows:

(* nodes without parents *)
distI = BernoulliDistribution[ 0.3 ]; (* prior probability of high intelligence *)
distD = BernoulliDistribution[ 0.4 ]; (* prior probability of hard class *)

(* nodes with parents = conditional probability distributions *)
(* conditional distribution of the grade *)
cpdG = Function[ { i, d },

    With[
      {
        p = Piecewise[
                {
                  { {  0.3,  0.4,  0.3  }, i == 0 && d == 0 },
                  { {  0.05, 0.25, 0.7  }, i == 0 && d == 1 },
                  { {  0.9,  0.08, 0.02 }, i == 1 && d == 0 },
                  { {  0.5,  0.3,  0.2  }, i == 1 && d == 1 }
                }
            ]

      },
      EmpiricalDistribution[ p -> Range[3] ]
    ]
];

(* conditional distribution for the SAT score *)
cpdS = Function[ i,
   With[
    {
     θ = Piecewise[

           {
             { 0.05, i == 0 },
             { 0.8,  i == 1 }
           }
         ] (* probability of a high SAT score *)
     },
     BernoulliDistribution[θ]
   ]
];


(* conditional probability function for the Letter *)
cpdL = Function[ g,
   With[
     {
       θ = Piecewise[
             {
               { 0.9,  g == 1 },
               { 0.6,  g == 2 },
               { 0.01, g == 3 } 
             }

           ]
     },
     BernoulliDistribution[θ]
   ]
];

(* BayesNetwork = Joint Probability Distribution Function *)
(* B4 = P(I,D,G,L) *) 
distB4 = ProbabilityDistribution[
   PDF[ distI, i] PDF[ distD, d] PDF[ cpdG[i,d], g] PDF[ cpdL[g], l],

   {i, 0, 1, 1},
   {d, 0, 1, 1},
   {g, 1, 3, 1},
   {l, 0, 1, 1}
];

(* B5 = P(I,D,G,S,L) *)
distB5 = ProbabilityDistribution[
   PDF[ distI, i] PDF[ distD, d] PDF[ cpdG[i,d], g] PDF[ cpdS[i], s] PDF[ cpdL[g], l],
   {i, 0, 1, 1},

   {d, 0, 1, 1},
   {g, 1, 3, 1},
   {s, 0, 1, 1},
   {l, 0, 1, 1}
];

Doing Inference

Now we would like to ask the question as stated above:

Probability[ g == 1 \[Conditioned] i == 1, {i,d,g,l} \[Distributed] distB4 ]

0.74

Probability[ g == 1 \[Conditioned] i == 1, {i,d,g,s,l} \[Distributed] distB5 ]

Probability[ ] is returned unevaluted.

Why is this the case? What can be done about it - after all 5 nodes should not be too far a stretch?

Answer

As indicated in the answer given by WRI here, the interplay of Piecewise and ProbabilityDistribution is tricky and -- so my temporary verdict -- is best avoided.

Indeed, using indicator functions, e.g. Boole, as a replacement for Piecewise solves the issue:

(* nodes without parents remain unchanged *)
(* CPDs are redefined using Boole instead of Piecewise *)

(* conditional distribution of the grade *)
cpdG = Function[ {i,d},
   With[
      {

         p = Plus[
                {0.3 , 0.4 , 0.3 } Boole[ i == 0 && d == 0 ],
                {0.05, 0.25, 0.7 } Boole[ i == 0 && d == 1 ],
                {0.9 , 0.08, 0.02} Boole[ i == 1 && d == 0 ],
                {0.5 , 0.3 , 0.2 } Boole[ i == 1 && d == 1 ]
             ]
      },
      EmpiricalDistribution[ p -> Range[3] ]
   ]
];


(*conditional distribution for the SAT score*)
cpdS = Function[ i,
   With[
      {
         θ = Plus[
                0.05 Boole[ i == 0 ],
                0.8  Boole[ i == 1 ]
             ] (*probability of a high SAT score*)
      },

      BernoulliDistribution[θ]
   ]
];

(*conditional probability function for the Letter*)
cpdL = Function[ g,
   With[
      {
         θ = Plus[
                0.9  Boole[ g == 1 ],

                0.6  Boole[ g == 2 ],
                0.01 Boole[ g == 3 ]
             ]
      },
      BernoulliDistribution[θ]
   ]
];

(* B5 = P(I,D,G,S,L) complete BN *)
distB5 = ProbabilityDistribution[

   PDF[ distI, i] PDF[ distD, d] PDF[ cpdG[i,d], g] PDF[ cpdS[i], s] PDF[ cpdL[g], l],
   {i, 0, 1, 1}, {d, 0, 1, 1}, {g, 1, 3, 1}, {s, 0, 1, 1}, {l, 0, 1, 1}
];

Now doing inference for the complete joint probability distribution as specified by the Bayesian Network works out fine:

Probability[ g == 1\[Conditioned] i == 1, {i,d,g,s,l} \[Distributed] distB5 ]

0.74

Blog

Search This Blog