Why does a belief network need to be represented using a directed acyclic graph (DAG)?

I would have thought that it's because DAGs preserve the dependency relationships between the variables, but I am currently unsure.

Thanks

Topic graphical-model probability machine-learning

Category Data Science


A Belief (Bayesian) network is, by definition, a DAG. A better question would be:

Why does a distribution need to be represented by a Belief network, i.e., a DAG?

Dependency relationships can be modeled with both directed-acyclic and undirected-cyclic representations, so a distribution may be represented either way.

Why acyclic and directed?

Every distribution that can be sequentially factorized into conditional probabilities as follows can be represented by a DAG:

$P(X_1,...,X_n)=P(X_n|X_1,...,X_{n-1})\color{blue}{P(X_1,...,X_{n-1})}$,
$\color{blue}{P(X_1,...,X_{n-1})}=P(X_{n-1}|X_1,...,X_{n-2})P(X_1,...,X_{n-2})$,
...,
$P(X_1, X_2, X_3)=P(X_3|X_1, X_2)\color{brown}{P(X_1, X_2)}$,
$\color{brown}{P(X_1, X_2)}=P(X_2|X_1)P(X_1)$.
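To make this chain-rule factorization concrete, here is a minimal sketch in Python with NumPy (the choice of three binary variables and a random joint table are my own illustrative assumptions): it extracts the conditionals from an arbitrary joint and reconstructs it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()                          # arbitrary joint P(X1, X2, X3)

p_x1 = joint.sum(axis=(1, 2))                 # P(X1)
p_x1x2 = joint.sum(axis=2)                    # P(X1, X2)
p_x2_given_x1 = p_x1x2 / p_x1[:, None]        # P(X2 | X1)
p_x3_given_x1x2 = joint / p_x1x2[:, :, None]  # P(X3 | X1, X2)

# Chain rule: P(X1, X2, X3) = P(X1) P(X2 | X1) P(X3 | X1, X2)
reconstructed = (p_x1[:, None, None]
                 * p_x2_given_x1[:, :, None]
                 * p_x3_given_x1x2)
assert np.allclose(joint, reconstructed)      # holds for any joint
```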

The Belief network $G$ is built as follows: $X_n$ has no outgoing link, $X_{n-1}$ links to $X_n$, $X_{n-2}$ links to $X_{n-1}$ and $X_{n}$, and so on, until finally $X_1$ links to all of $X_2$ through $X_n$.

Generally, there is no unique order for factorization, so multiple networks can represent the same distribution. For example, $P(A, B)$ can be factorized as $P(A|B)P(B)$ or $P(B|A)P(A)$, which are represented by two different Bayesian networks ($B \rightarrow A$ and $A \rightarrow B$, respectively).
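A quick numeric check of the two orderings (the binary variables and the joint table are made up for illustration):

```python
import numpy as np

joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])               # P(A, B), rows = A, cols = B
p_a, p_b = joint.sum(axis=1), joint.sum(axis=0)
p_a_given_b = joint / p_b[None, :]           # CPT for network B -> A
p_b_given_a = joint / p_a[:, None]           # CPT for network A -> B

# Both factorization orders reproduce the same joint.
assert np.allclose(p_a_given_b * p_b[None, :], joint)   # P(A|B) P(B)
assert np.allclose(p_b_given_a * p_a[:, None], joint)   # P(B|A) P(A)
```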

Every Bayesian network over $X_1,\ldots,X_n$ is a sub-graph of such a fully connected $G$ (for some ordering): directed links are removed from $G$ to introduce independence assumptions between variables.
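For instance, removing the link $X_1 \rightarrow X_3$ from the three-variable $G$ asserts $P(X_3|X_1,X_2)=P(X_3|X_2)$. A small sketch (the conditional tables are my own made-up numbers) verifying that the pruned network $X_1 \rightarrow X_2 \rightarrow X_3$ indeed satisfies this independence:

```python
import numpy as np

p_x1 = np.array([0.6, 0.4])                  # P(X1)
p_x2_given_x1 = np.array([[0.7, 0.3],        # P(X2 | X1), rows = X1
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],        # P(X3 | X2), rows = X2
                          [0.5, 0.5]])

# Joint defined by the pruned network X1 -> X2 -> X3.
joint = (p_x1[:, None, None]
         * p_x2_given_x1[:, :, None]
         * p_x3_given_x2[None, :, :])

# The implied independence: P(X3 | X1, X2) does not depend on X1.
p_x1x2 = joint.sum(axis=2)
p_x3_given_x1x2 = joint / p_x1x2[:, :, None]
assert np.allclose(p_x3_given_x1x2[0], p_x3_given_x1x2[1])
```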

How about cyclic and undirected?

However, there are also Markov networks. Some distributions can be represented by Markov networks but not by Bayesian networks [Koller & Friedman, *Probabilistic Graphical Models: Principles and Techniques*, Example 3.6, p. 82]. These networks use undirected edges to represent dependencies between random variables. An edge between $A$ and $B$ represents [part of] a factor (a function).

For example, $\phi_1(A, B)=\frac{AB}{2}=\frac{BA}{2}$ is a factor from the factorization $$P_{\theta}(A, B, C, D) = \frac{1}{Z(\theta)}\phi_1(A, B)\phi_2(B, C, D).$$ Note that we cannot unambiguously assign a direction to the relation $(A, B)$, since both directions are equally justified for $\phi_1(A, B)$.

As an example of cyclic relations, the factor $\phi(B, C, D) = B(C+D)$ would be represented by a triangle among $B$, $C$, and $D$, since the factor connects all three pairs.
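Putting the two toy factors together, here is a sketch that computes the normalized joint and the partition function $Z$ (assuming all four variables are binary with values in $\{0,1\}$, and reusing $B(C+D)$ as $\phi_2$; both choices are just for illustration):

```python
import itertools

def phi1(a, b):
    # phi1(A, B) = A*B / 2; symmetric, so no direction can be assigned.
    return a * b / 2

def phi2(b, c, d):
    # The cyclic-relation factor from above, reused as phi2(B, C, D).
    return b * (c + d)

# Enumerate all assignments to (A, B, C, D) and normalize by Z(theta).
states = list(itertools.product([0, 1], repeat=4))
unnormalized = {s: phi1(s[0], s[1]) * phi2(s[1], s[2], s[3]) for s in states}
Z = sum(unnormalized.values())        # partition function; here Z = 2.0
joint = {s: v / Z for s, v in unnormalized.items()}

print(joint[(1, 1, 1, 1)])            # 1.0 / 2.0 = 0.5
```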

Generally speaking, cycles complicate learning and inference in Markov networks, so fewer cycles are preferable.


Yes, we use DAGs to represent dependency relationships.

We need a directed graph because the conditional probability P(A|B) is not the same as P(B|A).
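A quick numeric illustration (the binary joint table is made up):

```python
import numpy as np

joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])        # P(A, B), rows = A, cols = B
p_a_given_b = joint / joint.sum(axis=0, keepdims=True)  # P(A | B)
p_b_given_a = joint / joint.sum(axis=1, keepdims=True)  # P(B | A)

print(p_a_given_b[1, 1])  # P(A=1 | B=1) = 0.4 / 0.6 ~ 0.667
print(p_b_given_a[1, 1])  # P(B=1 | A=1) = 0.4 / 0.7 ~ 0.571
```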

We assume the graph is acyclic to obtain certain properties and to simplify calculations. If you create a cyclic graph, you run into contradictions: following the factorization around a cycle makes a variable an ancestor of itself, so the product of conditional probabilities no longer defines a valid joint distribution.

If you relax both restrictions (directed and acyclic), you get a more general model called a Markov network.
