Bayesian network in Python: both construction and sampling

Question

Bayesian network in Python: both construction and sampling

Rutger Mauritz

2021年11月2日 03:06

For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian Network. After some exploration on the internet, I found that Pomegranate is a good package for Bayesian Networks, however - as far as I'm concerned - it seems unpossible to sample from such a pre-defined Bayesian Network. As an example, model.sample() raises a NotImplementedError (despite this solution says so).

Does anyone know if there exists a library which provides a good interface for the construction and sampling of/from a Bayesian network?

Topic bayesian-networks sampling dataset python machine-learning

Category Data Science

Sandipan Dey · Accepted Answer · 2021年1月30日 07:30

Just to elucidate the above answers with a concrete example, so that it will be helpful for someone, let's start with the following simple dataset (with 4 variables and 5 data points):

import pandas as pd
df = pd.DataFrame({'A':[0,0,0,1,0], 'B':[0,0,1,0,0], 'C':[1,1,0,0,1], 'D':[0,1,0,1,1]})
df.head()

#   A   B   C   D
#0  0   0   1   0
#1  0   0   1   1
#2  0   1   0   0
#3  1   0   0   1
#4  0   0   1   1

Now, let's learn the Bayesian Network structure from the above data using the 'exact' algorithm with pomegranate (uses DP/A* to learn the optimal BN structure), using the following code snippet:

import numpy as np
from pomegranate import *
model = BayesianNetwork.from_samples(df.to_numpy(), state_names=df.columns.values, algorithm='exact')
# model.plot()

The BN structure that is learn is shown in the next figure along with the corresponding CPTs:

As can be seen from the above figure, it explains the data exactly. We can compute the log-likelihood of the data with the model as follows:

np.sum(model.log_probability(df.to_numpy()))
# -7.253364813857112

Once the BN structure is learnt, we can sample from the BN as follows:

model.sample()  
# array([[0, 1, 0, 0]], dtype=int64)

As a side note, if we use algorithm='chow-liu' instead (which finds a tree-like structure with fast approximation), we shall obtain the following BN:

The log-likelihood of the data this time is

np.sum(model.log_probability(df.to_numpy()))
# -8.386987635761297

which indicates the algorithm exact finds better estimate.

noe · Accepted Answer · 2020年5月24日 16:25

There is an open issue in pomegranate for this in github. In the issue, they mention an ongoing pull request that implements rejections sampling and Gibbs sampling; the last comment in the PR discussion is from 7 days ago (2020, May 17th), so it is not abandoned but actively developed. You could use the version of pomegranate from that PR to sample from your Bayesian Network.

Sherin Varghese · Accepted Answer · 2020年5月24日 16:01

1

Sherin Varghese answered at 2020年5月24日 16:01

Please use the function from_samples() to build a Bayesian n/w from the data.

Bayesian network in Python: both construction and sampling

About