Belief propagation

Belief propagation, also known as the sum-product algorithm, is an inference framework. In the context of Bayesian networks, it is often used to compute the marginal probability distributions of individual variables. In this post, I walk through the computations involved in belief propagation.

In belief propagation, messages are passed from nodes to factors and from factors to nodes. The message passed from node $x_{i}$ to factor $f_{j}$ is the product of the messages $x_{i}$ receives from its other neighbouring factors: [ \mu_{x_{i} \rightarrow f_{j}} = \prod_{\substack{k \in ne(x_{i}) \\ k \neq j}} \mu_{f_{k} \rightarrow x_{i}}, ] where $ne(x_{i})$ is the set of indices of the factors neighbouring $x_{i}$. And the message passed from factor $f_{j}$ to node $x_{i}$ is the product of the messages $f_{j}$ receives from its other neighbouring nodes, multiplied by the factor itself and integrated over all of those other nodes: [ \mu_{f_{j} \rightarrow x_{i}} = \int \left(\prod_{\substack{k \in ne(f_{j}) \\ k \neq i}} \mu_{x_{k} \rightarrow f_{j}} \right) \, f_{j}(x_{ne(f_{j})}) \prod_{\substack{l \in ne(f_{j}) \\ l \neq i}} d x_{l}, ] where $ne(f_{j})$ is the set of indices of the nodes neighbouring $f_{j}$.
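
For discrete variables, both update rules reduce to a few lines of array code. The sketch below is a minimal illustration under that assumption (messages are 1-D numpy arrays over a variable's states, a factor is an ndarray with one axis per neighbouring variable, and the function names `node_to_factor` and `factor_to_node` are my own); the later snippets in this post reuse these two functions.

```python
import numpy as np

def node_to_factor(msgs_from_other_factors, n_states):
    """mu_{x_i -> f_j}: the elementwise product of the messages x_i
    received from all of its neighbouring factors except f_j."""
    msg = np.ones(n_states)
    for m in msgs_from_other_factors:
        msg = msg * m
    return msg

def factor_to_node(factor, msgs_from_other_nodes, target_axis):
    """mu_{f_j -> x_i}: multiply the factor table by the messages from
    the other neighbouring nodes (keyed by their axis in the table),
    then sum out every axis except the one that belongs to x_i --
    the discrete analogue of the integral above."""
    result = factor
    for axis, m in msgs_from_other_nodes.items():
        shape = [1] * factor.ndim
        shape[axis] = -1              # broadcast the message along its own axis
        result = result * m.reshape(shape)
    other_axes = tuple(a for a in range(factor.ndim) if a != target_axis)
    return result.sum(axis=other_axes)
```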

These equations may be opaque, so I walk through an example below.

Example

Figure: the example Bayesian network.

Consider the network above with five nodes. The joint probability is given by [ p(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}) = p(x_{1}) \, p(x_{2}) \, p(x_{3} \vert x_{1}, x_{2}) \, p(x_{4} \vert x_{3}) \, p(x_{5} \vert x_{3}). ]
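
To make the walkthrough concrete, the later snippets use a hypothetical binary version of this network. The conditional probability tables below are made up for illustration; any valid CPTs would behave the same way.

```python
import numpy as np

p1 = np.array([0.6, 0.4])                    # p(x1)
p2 = np.array([0.7, 0.3])                    # p(x2)
p3_g12 = np.zeros((2, 2, 2))                 # p(x3 | x1, x2), axes (x1, x2, x3)
p3_g12[..., 1] = [[0.1, 0.5], [0.4, 0.9]]    # P(x3 = 1 | x1, x2)
p3_g12[..., 0] = 1.0 - p3_g12[..., 1]
p4_g3 = np.array([[0.8, 0.2],                # p(x4 | x3), axes (x3, x4)
                  [0.3, 0.7]])
p5_g3 = np.array([[0.9, 0.1],                # p(x5 | x3), axes (x3, x5)
                  [0.2, 0.8]])

# The joint is the product of the five factors and should sum to 1.
joint = np.einsum('a,b,abc,cd,ce->abcde', p1, p2, p3_g12, p4_g3, p5_g3)
assert np.isclose(joint.sum(), 1.0)
```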

This directed network is first re-expressed as a factor graph.

Figure: the example factor graph.

The factor graph above represents [ f_{A}(x_{1}) \, f_{B}(x_{2}) \, f_{C}(x_{1}, x_{2}, x_{3}) \, f_{D}(x_{3}, x_{4}) \, f_{E}(x_{3}, x_{5}), ] where [ \begin{align} f_{A}(x_{1}) &= p(x_{1}),\\
f_{B}(x_{2}) &= p(x_{2}),\\
f_{C}(x_{1}, x_{2}, x_{3}) &= p(x_{3} \vert x_{1}, x_{2}),\\
f_{D}(x_{3}, x_{4}) &= p(x_{4} \vert x_{3}), \quad \text{and}\\
f_{E}(x_{3}, x_{5}) &= p(x_{5} \vert x_{3}). \end{align} ]

Then, the marginal probability is given by, for example, [ \begin{align} p(x_{3}) &= \int f_{A}(x_{1}) \, f_{B}(x_{2}) \, f_{C}(x_{1}, x_{2}, x_{3}) \, f_{D}(x_{3}, x_{4}) \, f_{E}(x_{3}, x_{5}) \, d x_{1} \, d x_{2} \, d x_{4} \, d x_{5} \\
&= \int f_{A}(x_{1}) \, f_{B}(x_{2}) \, f_{C}(x_{1}, x_{2}, x_{3}) \, d x_{1} \, d x_{2} \, \int f_{D}(x_{3}, x_{4}) \, d x_{4} \, \int f_{E}(x_{3}, x_{5}) \, d x_{5}. \quad \quad (1) \end{align} ] Belief propagation computes the three integrals above step by step by passing messages from a factor to a node and from a node to a factor.
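
With the toy CPTs above, we can check that the factored form in Equation 1 agrees with brute-force marginalization of the joint (continuing the previous snippet):

```python
# Brute-force: sum the full joint over x1, x2, x4, x5 (axes 0, 1, 3, 4).
p3_direct = joint.sum(axis=(0, 1, 3, 4))

# The factored form of Equation 1: one term per bracketed integral.
p3_factored = (np.einsum('a,b,abc->c', p1, p2, p3_g12)
               * p4_g3.sum(axis=1)      # integral of p(x4 | x3) over x4 = 1
               * p5_g3.sum(axis=1))     # integral of p(x5 | x3) over x5 = 1
assert np.allclose(p3_direct, p3_factored)
```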

Step 1

Figure: the messages passed in Step 1.

We start with the factors connected to only one node (those factors $f$ that accept only one variable as input). The message passed from the factor $f_{A}$ to the node $x_{1}$ is given by [ \mu_{f_{A} \rightarrow x_{1}} = f_{A}(x_{1}) = p(x_{1}). ] Similarly, the message from $f_{B}$ to $x_{2}$ is [ \mu_{f_{B} \rightarrow x_{2}} = f_{B}(x_{2}) = p(x_{2}). ]

Now that $x_{1}$ and $x_{2}$ have received their messages, they pass them on: [ \begin{align} \mu_{x_{1} \rightarrow f_{C}} &= \mu_{f_{A} \rightarrow x_{1}} = f_{A}(x_{1}) = p(x_{1}), \quad \text{and}\\
\mu_{x_{2} \rightarrow f_{C}} &= \mu_{f_{B} \rightarrow x_{2}} = f_{B}(x_{2}) = p(x_{2}). \end{align} ]

The messages are further passed: [ \begin{align} \mu_{f_{C} \rightarrow x_{3}} &= \int \mu_{x_{1} \rightarrow f_{C}} \, \mu_{x_{2} \rightarrow f_{C}} \, f_{C}(x_{1}, x_{2}, x_{3}) \, d x_{1} \, d x_{2}\\
&= \int f_{A}(x_{1}) \, f_{B}(x_{2}) \, f_{C}(x_{1}, x_{2}, x_{3}) \, d x_{1} \, d x_{2}\\
&= \int p(x_{1}) \, p(x_{2}) \, p(x_{3} \vert x_{1}, x_{2}) \, d x_{1} \, d x_{2}\\
&= p(x_{3}). \end{align} ] This is required to compute the marginal probability for $x_{3}$ (the first integral in Equation 1 above).
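
In code, Step 1 amounts to the following, reusing `factor_to_node` and the toy CPTs from the earlier snippets:

```python
mu_fA_x1 = p1                 # mu_{f_A -> x_1} = f_A(x1) = p(x1)
mu_fB_x2 = p2                 # mu_{f_B -> x_2} = f_B(x2) = p(x2)
mu_x1_fC = mu_fA_x1           # a node with a single other neighbour just forwards
mu_x2_fC = mu_fB_x2

# mu_{f_C -> x_3}: weight f_C by the incoming messages, sum out x1 and x2.
mu_fC_x3 = factor_to_node(p3_g12, {0: mu_x1_fC, 1: mu_x2_fC}, target_axis=2)
assert np.allclose(mu_fC_x3, p3_direct)   # this message is exactly p(x3)
```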

The message from $x_{3}$ to $f_{D}$ is the product of messages which $x_{3}$ received from the other factors: [ \mu_{x_{3} \rightarrow f_{D}} = \mu_{f_{C} \rightarrow x_{3}} \, \mu_{f_{E} \rightarrow x_{3}}, ] but we do not have $\mu_{f_{E} \rightarrow x_{3}}$ yet. To compute this, we move on to Step 2.

Step 2

Figure: the messages passed in Step 2.

The messages from the leaf nodes $x_{4}$ and $x_{5}$ are initialized to 1: [ \begin{align} \mu_{x_{4} \rightarrow f_{D}} &= 1, \quad \text{and}\\
\mu_{x_{5} \rightarrow f_{E}} &= 1. \end{align} ]

We also pass the messages from $f_{D}$ and $f_{E}$ to $x_{3}$: [ \begin{align} \mu_{f_{D} \rightarrow x_{3}} &= \int \mu_{x_{4} \rightarrow f_{D}} f_{D}(x_{3}, x_{4}) d x_{4} = \int f_{D}(x_{3}, x_{4}) d x_{4} = \int p(x_{4} \vert x_{3}) d x_{4} = 1, \quad \text{and}\\
\mu_{f_{E} \rightarrow x_{3}} &= \int \mu_{x_{5} \rightarrow f_{E}} f_{E}(x_{3}, x_{5}) d x_{5} = \int f_{E}(x_{3}, x_{5}) d x_{5} = \int p(x_{5} \vert x_{3}) d x_{5} = 1. \end{align} ] These are also required to compute the marginal probability for $x_{3}$ (the second and third integrals in Equation 1 above). That is, the marginal probability is the product of messages the node receives: [ \begin{align} p(x_{3}) &= \mu_{f_{C} \rightarrow x_{3}} \, \mu_{f_{D} \rightarrow x_{3}} \, \mu_{f_{E} \rightarrow x_{3}}. \end{align} ]
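
Continuing the toy example, Step 2 and the assembly of $p(x_{3})$ look like this:

```python
mu_x4_fD = np.ones(2)         # leaf-node messages are initialized to 1
mu_x5_fE = np.ones(2)

mu_fD_x3 = factor_to_node(p4_g3, {1: mu_x4_fD}, target_axis=0)
mu_fE_x3 = factor_to_node(p5_g3, {1: mu_x5_fE}, target_axis=0)
assert np.allclose(mu_fD_x3, 1.0) and np.allclose(mu_fE_x3, 1.0)

# The marginal of x3 is the product of all the messages it receives.
p3_bp = mu_fC_x3 * mu_fD_x3 * mu_fE_x3
assert np.allclose(p3_bp, p3_direct)
```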

Step 3

Figure: the messages passed in Step 3.

The message passed from $x_{3}$ to $f_{C}$ is the product of the messages $x_{3}$ received from the other factors: [ \mu_{x_{3} \rightarrow f_{C}} = \mu_{f_{D} \rightarrow x_{3}} \, \mu_{f_{E} \rightarrow x_{3}} = 1. ] Similarly, [ \mu_{x_{3} \rightarrow f_{D}} = \mu_{f_{C} \rightarrow x_{3}} \, \mu_{f_{E} \rightarrow x_{3}} = p(x_{3}), ] and [ \mu_{x_{3} \rightarrow f_{E}} = \mu_{f_{C} \rightarrow x_{3}} \, \mu_{f_{D} \rightarrow x_{3}} = p(x_{3}). ]
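
In the toy example, each of these products is one call to the `node_to_factor` sketch from earlier:

```python
mu_x3_fC = node_to_factor([mu_fD_x3, mu_fE_x3], n_states=2)   # = 1
mu_x3_fD = node_to_factor([mu_fC_x3, mu_fE_x3], n_states=2)   # = p(x3)
mu_x3_fE = node_to_factor([mu_fC_x3, mu_fD_x3], n_states=2)   # = p(x3)
```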

The messages reach $x_{1}$ and $x_{2}$: [ \begin{align} \mu_{f_{C} \rightarrow x_{1}} &= \int \mu_{x_{3} \rightarrow f_{C}} \, \mu_{x_{2} \rightarrow f_{C}} \, f_{C}(x_{1}, x_{2}, x_{3}) \, d x_{2} \, d x_{3} = \int p(x_{2}) \, p(x_{3} \vert x_{1}, x_{2}) \, d x_{2} \, d x_{3} = 1, \quad \text{and}\\
\mu_{f_{C} \rightarrow x_{2}} &= \int \mu_{x_{3} \rightarrow f_{C}} \, \mu_{x_{1} \rightarrow f_{C}} \, f_{C}(x_{1}, x_{2}, x_{3}) \, d x_{1} \, d x_{3} = \int p(x_{1}) \, p(x_{3} \vert x_{1}, x_{2}) \, d x_{1} \, d x_{3} = 1. \end{align} ]

Then, the marginal probability for $x_{1}$ is the product of the messages it receives: [ p(x_{1}) = \mu_{f_{A} \rightarrow x_{1}} \, \mu_{f_{C} \rightarrow x_{1}}. ] Similarly, $x_{2}$ receives messages from factors $f_{B}$ and $f_{C}$, and thus, [ p(x_{2}) = \mu_{f_{B} \rightarrow x_{2}} \, \mu_{f_{C} \rightarrow x_{2}}. ]
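
Both marginals check out in the toy example (continuing the earlier snippets):

```python
mu_fC_x1 = factor_to_node(p3_g12, {1: mu_x2_fC, 2: mu_x3_fC}, target_axis=0)
mu_fC_x2 = factor_to_node(p3_g12, {0: mu_x1_fC, 2: mu_x3_fC}, target_axis=1)
assert np.allclose(mu_fA_x1 * mu_fC_x1, p1)   # p(x1)
assert np.allclose(mu_fB_x2 * mu_fC_x2, p2)   # p(x2)
```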

The messages also reach $x_{4}$: [ \mu_{f_{D} \rightarrow x_{4}} = \int \mu_{x_{3} \rightarrow f_{D}} \, f_{D}(x_{3}, x_{4}) \, d x_{3} = \int p(x_{3}) \, p(x_{4} \vert x_{3}) \, d x_{3} = p(x_{4}). ] As $x_{4}$ receives a message only from $f_{D}$, its marginal probability is given by [ p(x_{4}) = \mu_{f_{D} \rightarrow x_{4}}. ] Similarly, [ \mu_{f_{E} \rightarrow x_{5}} = \int \mu_{x_{3} \rightarrow f_{E}} \, f_{E}(x_{3}, x_{5}) \, d x_{3} = \int p(x_{3}) \, p(x_{5} \vert x_{3}) \, d x_{3} = p(x_{5}), ] and [ p(x_{5}) = \mu_{f_{E} \rightarrow x_{5}}. ]
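
And likewise for the two leaves:

```python
mu_fD_x4 = factor_to_node(p4_g3, {0: mu_x3_fD}, target_axis=1)
mu_fE_x5 = factor_to_node(p5_g3, {0: mu_x3_fE}, target_axis=1)
assert np.allclose(mu_fD_x4, joint.sum(axis=(0, 1, 2, 4)))   # p(x4)
assert np.allclose(mu_fE_x5, joint.sum(axis=(0, 1, 2, 3)))   # p(x5)
```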

Summary

In the example above, the computation in Step 3 depends on the outputs of Steps 1 and 2, but the computation in Step 2 does not depend on the output of Step 1, so Steps 1 and 2 can be computed in parallel.

Viewed another way, belief propagation is a procedure for building a computational graph for the marginal probabilities and evaluating it. The goal is to compute all the messages: the algorithm first identifies which messages can be computed immediately and then computes the remaining messages as the messages they depend on become available.
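
This dependency-driven view is straightforward to turn into code: sweep over all (node, factor) edges and emit any message whose inputs are already available, until nothing new can be computed. The scheduler below is a toy sketch of my own for tree-structured factor graphs, reusing the array conventions from the earlier snippets:

```python
def sum_product(variables, factors):
    """Compute every message on a tree-structured factor graph.
    `variables` maps name -> number of states; `factors` maps
    name -> (table, [variable names in axis order])."""
    nbrs = {v: [f for f, (_, vs) in factors.items() if v in vs] for v in variables}
    msgs, progress = {}, True
    while progress:                      # stop when a full sweep adds nothing
        progress = False
        for f, (table, vs) in factors.items():
            for v in vs:
                # node -> factor, once all other incoming factor messages exist
                others = [g for g in nbrs[v] if g != f]
                if ('v', v, f) not in msgs and all(('f', g, v) in msgs for g in others):
                    m = np.ones(variables[v])
                    for g in others:
                        m = m * msgs[('f', g, v)]
                    msgs[('v', v, f)], progress = m, True
                # factor -> node, once all other incoming node messages exist
                rest = [u for u in vs if u != v]
                if ('f', f, v) not in msgs and all(('v', u, f) in msgs for u in rest):
                    t = table
                    for axis, u in enumerate(vs):
                        if u != v:
                            shape = [1] * table.ndim
                            shape[axis] = -1
                            t = t * msgs[('v', u, f)].reshape(shape)
                    axes = tuple(a for a, u in enumerate(vs) if u != v)
                    msgs[('f', f, v)], progress = t.sum(axis=axes), True
    return msgs

msgs = sum_product(
    {'x1': 2, 'x2': 2, 'x3': 2, 'x4': 2, 'x5': 2},
    {'fA': (p1, ['x1']), 'fB': (p2, ['x2']),
     'fC': (p3_g12, ['x1', 'x2', 'x3']),
     'fD': (p4_g3, ['x3', 'x4']), 'fE': (p5_g3, ['x3', 'x5'])})
p3 = msgs[('f', 'fC', 'x3')] * msgs[('f', 'fD', 'x3')] * msgs[('f', 'fE', 'x3')]
assert np.allclose(p3, p3_direct)
```

On a tree, every message becomes computable after finitely many sweeps, and the messages that become available in the same sweep are exactly the ones that could be computed in parallel, as in Steps 1 and 2 above.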

References

  • Kschischang, F. R., Frey, B. J., and Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47, 498-519.

  • Bishop, C. (2006). Chapter 8 Graphical Models. In Pattern Recognition and Machine Learning. New York: Springer.