Joint probability as a tensor
In this post, I consider the joint probabilities of discrete random variables as tensors.
Let us consider three discrete random variables: $A \in \{1, 2, \dots, k_{A} \}$, $B \in \{1, 2, \dots, k_{B} \}$, and $C \in \{1, 2, \dots, k_{C}\}$. Assuming $A$ and $B$ are independent, their joint probability is given by $p(A, B, C) = p(A) \, p(B) \, p(C \vert A, B)$.
I parameterise the probabilities such that $p(A)$ is a column vector of multinomial probabilities: $p(A)_{i} > 0 \, \forall \, i$ and $\sum_{i} p(A)_{i} = 1$. Here, $p(A)_{i}$ is the probability that $A = i$. The same goes for $p(B)$ and $p(C)$. So, $p(A) \in \mathbb{R}^{k_{A} \times 1}$, $p(B) \in \mathbb{R}^{k_{B} \times 1}$, and $p(C) \in \mathbb{R}^{k_{C} \times 1}$.
As $A$ and $B$ are independent, the joint probability of $A$ and $B$ is given by [ p(A, B) = p(A) \, p(B)^{T} = p(A) \otimes p(B), ] where $\otimes$ denotes the tensor (outer) product. This joint probability, $p(A, B)$, is a $k_{A}$ by $k_{B}$ matrix. The entry in the $i$th row and $j$th column is the probability that $A = i$ and $B = j$.
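As a quick sanity check with illustrative numbers (the values below are made up for demonstration), the row and column sums of the outer product recover the original marginals:

```python
import numpy

# two small marginal distributions (illustrative values)
pA = numpy.array([0.3, 0.7])
pB = numpy.array([0.2, 0.5, 0.3])

# the outer product gives the joint of two independent variables
pAB = numpy.outer(pA, pB)  # shape (2, 3)

# summing over columns recovers p(A); summing over rows recovers p(B)
assert numpy.allclose(pAB.sum(axis=1), pA)
assert numpy.allclose(pAB.sum(axis=0), pB)
# and the whole table sums to one
assert numpy.isclose(pAB.sum(), 1.0)
```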
For convenience, we expand the dimensions of $p(A, B)$, such that $p^{*}(A, B) \in \mathbb{R}^{k_{A} \times k_{B} \times 1}$. Then, with tensor broadcasting, the joint distribution of the three variables is given by the elementwise product [ p(A, B, C) = p^{*}(A, B) \, p(C \vert A, B), ] where $p(C \vert A, B) \in \mathbb{R}^{k_{A} \times k_{B} \times k_{C}}$ is the conditional probability table and $p^{*}(A, B)$ is broadcast along the last axis. The element $[a, b, c]$ of $p(C \vert A, B)$ is $p(C=c \vert A=a, B=b)$.
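Broadcasting stretches the trailing size-1 axis of $p^{*}(A, B)$ to match the $C$ axis of the conditional table. A minimal shape check, using uniform tables purely for illustration:

```python
import numpy

kA, kB, kC = 2, 3, 4
pABstar = numpy.full((kA, kB, 1), 1.0 / (kA * kB))  # uniform p*(A, B)
pCgivenAB = numpy.full((kA, kB, kC), 1.0 / kC)      # uniform conditional table

# (kA, kB, 1) broadcasts against (kA, kB, kC) to give (kA, kB, kC)
pABC = pABstar * pCgivenAB
assert pABC.shape == (kA, kB, kC)

# each slice p(C | A=a, B=b) sums to one, so the joint also sums to one
assert numpy.allclose(pCgivenAB.sum(axis=2), 1.0)
assert numpy.isclose(pABC.sum(), 1.0)
```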
Then we can derive the marginal probability of $C$ by summing out $A$ and $B$: [ p(C) = \sum_{a} \sum_{b} p(A=a, B=b, C). ] Similarly, we can compute other joint probabilities. The joint probability of $A$ and $C$, for example, is given by [ p(A, C) = \sum_{b} p(A, B=b, C). ]
To finish off, let’s look at the implementation in Python.
```python
import numpy

# randomly generate p(A)
kA = 2
pA = numpy.random.dirichlet([1] * kA).reshape((kA, 1))
# also p(B)
kB = 3
pB = numpy.random.dirichlet([1] * kB).reshape((kB, 1))
# and p(C | A, B)
kC = 4
pCgivenAB = numpy.array([
    numpy.random.dirichlet([1] * kC) for _ in range(kA * kB)
]).reshape((kA, kB, kC))
# the joint distribution of two independent variables is their outer product
pAB = numpy.outer(pA, pB)
# reshape pAB to enable broadcasting
pABreshaped = pAB.reshape((kA, kB, 1))
# finally, calculate the full joint distribution
pABC = pABreshaped * pCgivenAB
# axes 0, 1, and 2 of pABC correspond to A, B, and C respectively,
# so summing over the other axes gives the marginal probabilities
numpy.testing.assert_almost_equal(pABC.sum(axis=(1, 2)), pA.squeeze())
numpy.testing.assert_almost_equal(pABC.sum(axis=(0, 2)), pB.squeeze())
pC = pABC.sum(axis=(0, 1))
# similarly, we can calculate the joint probability of two variables
numpy.testing.assert_almost_equal(pABC.sum(axis=2), pAB)
pBC = pABC.sum(axis=0)
pAC = pABC.sum(axis=1)
```
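As a final self-contained check (this sketch regenerates random tables with the same shapes as above, so the variable names are reused), we can verify that the full joint sums to one, and that dividing the joint by $p(A, B)$ recovers the conditional table, inverting the chain-rule factorisation:

```python
import numpy

# regenerate the setup with the same shapes as in the listing above
kA, kB, kC = 2, 3, 4
pA = numpy.random.dirichlet([1] * kA).reshape((kA, 1))
pB = numpy.random.dirichlet([1] * kB).reshape((kB, 1))
# dirichlet's size argument draws a (kA, kB) batch of length-kC rows
pCgivenAB = numpy.random.dirichlet([1] * kC, size=(kA, kB))  # shape (kA, kB, kC)

pABC = numpy.outer(pA, pB).reshape((kA, kB, 1)) * pCgivenAB

# the full joint must sum to one
assert numpy.isclose(pABC.sum(), 1.0)

# summing out C gives p(A, B); dividing by it recovers p(C | A, B)
recovered = pABC / pABC.sum(axis=2, keepdims=True)
assert numpy.allclose(recovered, pCgivenAB)
```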