Dirichlet process mixture model as a cognitive model of category learning

In cognitive science and psychology, the Dirichlet process mixture model (DPMM) is considered a rational model of category learning (Anderson, 1991; Sanborn, Griffiths & Navarro, 2010). That is, the DPMM is used to approximate how humans learn to categorise objects.

In this post, I review the DPMM, assuming that all the features are discrete. The implementation in C++ can be found here.

Suppose a learner has observed $n - 1$ objects $\{x_{1}, x_{2}, \dots, x_{n - 1}\}$ with corresponding category labels $\{y_{1}, y_{2}, \dots, y_{n - 1}\}$. In the DPMM, each of these objects is assigned to a cluster. The cluster label of the $i$th object is denoted $z_{i}$.
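
For concreteness, the observed data might be represented as below. This is a minimal sketch with illustrative names (`Object`, `features`, `label`, `cluster`), not the linked C++ implementation, and it assumes discrete features coded as integers.

```cpp
#include <vector>

// A minimal sketch of the learner's data. Names are illustrative only.
struct Object {
    std::vector<int> features;  // x_i: value on each discrete feature
    int label;                  // y_i: observed category label
    int cluster;                // z_i: cluster assignment inferred by the model
};

// The learner's memory after observing n - 1 objects.
using Memory = std::vector<Object>;
```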

Drawing an inference

The probability that the $n$th object belongs to category $w$, given its features $x_{n}$, is then expressed as follows:

[ \begin{align} p(y_{n} = w \; \vert \; x_{n}) &= \sum_{k \in \mathbb{Z}} p(z_{n} = k \; \vert \; x_{n}) \; p(y_{n} = w \; \vert \; z_{n} = k) \\
&= \sum_{k \in \mathbb{Z}} \; \frac{p(z_{n} = k) \; p(x_{n} \;\vert\; z_{n} = k)}{p(x_{n})} \; p(y_{n} = w \; \vert \; z_{n} = k) \\
&= \sum_{k \in \mathbb{Z}} \; \frac{p(z_{n} = k) \; p(x_{n} \;\vert\; z_{n} = k)}{\sum_{s \in \mathbb{Z}} \; p(z_{n} = s) \; p(x_{n} \;\vert\; z_{n} = s)} \; p(y_{n} = w \; \vert \; z_{n} = k). \label{eqn:inference} \tag{1} \end{align} ]

Here, $\mathbb{Z}$ is the set of all possible clusters to which the $n$th object can be assigned, including a new, empty cluster. The three terms in Equation $\ref{eqn:inference}$ are described below in turn.

First, the probability that the $n$th object fits into cluster $k$ is given by: [ p(z_{n} = k) = \begin{cases} \dfrac{c\,m_{k}}{(1 - c) + c\,(n - 1)} & \text{if $m_{k} > 0$}\\ \dfrac{(1 - c)}{(1 - c) + c\,(n - 1)} & \text{if $m_{k} = 0$} \end{cases} \label{eqn:prior}\tag{2} ] where $c$ is a parameter called the coupling probability and $m_{k}$ is the number of objects already assigned to cluster $k$. The second case corresponds to assigning the object to a new, empty cluster.
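
For illustration, Equation $\ref{eqn:prior}$ might be computed as follows. This is a minimal sketch with names of my own choosing, not the linked implementation.

```cpp
#include <cstddef>
#include <vector>

// Prior probability that the nth object is assigned to cluster k (Equation 2).
// c is the coupling probability; clusterSizes[k] is m_k, the number of objects
// already in cluster k. Any index at or beyond clusterSizes.size() stands for
// a new, empty cluster.
double clusterPrior(std::size_t k, double c,
                    const std::vector<int>& clusterSizes, int n) {
    const double denom = (1.0 - c) + c * (n - 1);
    if (k < clusterSizes.size() && clusterSizes[k] > 0) {
        return c * clusterSizes[k] / denom;  // existing cluster: c * m_k / denom
    }
    return (1.0 - c) / denom;                // empty cluster: (1 - c) / denom
}
```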

In typical experiments in cognitive science and psychology, the features are independent of one another. Therefore, [ p(x_{n} \; \vert \; z_{n} = k) = \prod_{d \in D} p(x_{n,d} \; \vert \; z_{n} = k), \label{eqn:feature1}\tag{3} ] where $D$ is the set of features that describe $x$. Each factor is computed with [ p(x_{n,d} = v \; \vert \; z_{n} = k) = \frac{B_{v,d} + \beta_{c}}{m_{k} + J_{d} \, \beta_{c}}, \label{eqn:feature2}\tag{4} ] where $B_{v,d}$ is the number of objects in cluster $k$ with value $v$ on feature $d$, $J_{d}$ is the number of values an object can take on feature $d$, and $\beta_{c}$ is a prior parameter that smooths the observed counts.
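
Equations $\ref{eqn:feature1}$ and $\ref{eqn:feature2}$ can be sketched in the same way, assuming the counts for cluster $k$ are bookkept elsewhere; again, the names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Likelihood of the nth object's features given cluster k (Equations 3 and 4).
// x[d] is the object's value on feature d; counts[d][v] is B_{v,d}, the number
// of objects in cluster k with value v on feature d; m_k is the size of
// cluster k; J[d] is the number of values feature d can take; betaC is the
// prior parameter in Equation 4.
double featureLikelihood(const std::vector<int>& x,
                         const std::vector<std::vector<int>>& counts,
                         int m_k, const std::vector<int>& J, double betaC) {
    double p = 1.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        p *= (counts[d][x[d]] + betaC) / (m_k + J[d] * betaC);
    }
    return p;
}
```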

Similarly, the probability that the $n$th object has category label $w$, given a cluster, is given by: [ p(y_{n} = w \; \vert \; z_{n} = k) = \frac{B_{w} + \beta_{l}}{B_{.} + J \, \beta_{l}}, \label{eqn:label}\tag{5} ] where $B_{w}$ is the number of observed objects with category label $w$ in cluster $k$, $B_{.}$ is the total number of objects in cluster $k$, $J$ is the number of possible category labels, and $\beta_{l}$ is a prior parameter analogous to $\beta_{c}$.
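
Equation $\ref{eqn:label}$ has the same form as Equation $\ref{eqn:feature2}$, and once the three terms are available for every candidate cluster, Equation $\ref{eqn:inference}$ is a normalised sum over clusters. A rough sketch under the same assumptions as above:

```cpp
#include <cstddef>
#include <vector>

// Probability of category label w given cluster k (Equation 5).
// labelCounts[w] is B_w; total is B_., the number of objects in cluster k;
// J is the number of possible labels; betaL is the prior parameter.
double labelProbability(int w, const std::vector<int>& labelCounts,
                        int total, int J, double betaL) {
    return (labelCounts[w] + betaL) / (total + J * betaL);
}

// Equation 1: probability that the nth object belongs to category w.
// prior[k], likelihood[k] and labelGivenCluster[k] hold, for each candidate
// cluster k (including the empty one), the values of Equations 2, 3 and 5.
double categoryProbability(const std::vector<double>& prior,
                           const std::vector<double>& likelihood,
                           const std::vector<double>& labelGivenCluster) {
    double numerator = 0.0, evidence = 0.0;
    for (std::size_t k = 0; k < prior.size(); ++k) {
        const double joint = prior[k] * likelihood[k];
        evidence  += joint;
        numerator += joint * labelGivenCluster[k];
    }
    return numerator / evidence;
}
```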

Learning

Learning in the DPMM is equivalent to assigning an object to a cluster. The cluster assignment of the $n$th object, for example, follows the posterior probability of each cluster. Specifically, the object is assigned to cluster $k$ with probability [ p(z_{n} = k \;\vert\; x_{n},\, y_{n}) \propto p(z_{n} = k) \; p(x_{n} \;\vert\; z_{n} = k) \; p(y_{n} \;\vert\; z_{n} = k), \label{eqn:learning}\tag{6} ] which is computed with Equations $\ref{eqn:prior}$, $\ref{eqn:feature1}$ and $\ref{eqn:label}$.

Maintaining this posterior exactly over all possible sequences of cluster assignments is intractable, so it is approximated, for example with the local MAP rule (Anderson, 1991) or a particle filter (Sanborn et al., 2010).
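
As an illustration, the local MAP rule commits each object to its single most probable cluster under Equation $\ref{eqn:learning}$. A minimal sketch, assuming the unnormalised posterior values have already been computed with functions like those above:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Local MAP approximation (Anderson, 1991): evaluate the right-hand side of
// Equation 6 for each candidate cluster (including a new, empty one) and
// commit the nth object to the most probable cluster. The values need not be
// normalised because only the argmax matters.
std::size_t localMapAssignment(const std::vector<double>& unnormalisedPosterior) {
    return static_cast<std::size_t>(
        std::distance(unnormalisedPosterior.begin(),
                      std::max_element(unnormalisedPosterior.begin(),
                                       unnormalisedPosterior.end())));
}
```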

References

Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409-429.

Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (2010). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117, 1144-1167.