How to perform link prediction in text-based relationship data?

I need to establish whether there is a link between two columns that come from two different datasets sharing one matching column:

Dataset1: bipartite: (M, DS)  
M    DS  
m23  ds3 
m23  ds67  
m54  ds325  
...  ...    

Dataset2: tripartite: (M, G, DG)  
M    G    DG  
m23  g6   dg32
m23  g8   dg1 
m54  g32  dg65
...  ...   ...  

These two datasets have one column in common (i.e., M), and the relationships among the elements are shown below:

M  ----affects---- G  
M  ----causes----- DS  
DG ----affects---- M  

Primary goal: to calculate the probability that a link/edge exists between indirectly related columns (e.g., DG and DS) via the common column (M).

So, for a given list of DS entries, how can I find the probability that a link/edge exists between each selected DS and every DG?

DS ---- ---- DG

If the DS entries (ds3, ds67) were selected, the output should look like this:

element1 - element2 - probability/statistical value signifying the existence of a direct relationship or link.

ds3  - dg32 - 100% (common M value)
ds3  - dg1  - 100% (common M value)
ds3  - dg65 - 43.66%
---
ds67 - dg32 - 100% (common M value)
ds67 - dg1  - 100% (common M value)
ds67 - dg65 - 55.12%
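
To make the "common M value" rows concrete, here is a minimal Python sketch of the join on M that produces them. It uses only the sample rows shown above; the names `dataset1`, `dataset2`, and `pairs_sharing_m` are hypothetical and chosen for illustration. It only finds pairs that share an M; it does not reproduce the intermediate percentages (43.66%, 55.12%), which are placeholders for whatever score a real method would assign.

```python
from collections import defaultdict

# Sample rows copied from the two datasets above (illustrative only).
dataset1 = [("m23", "ds3"), ("m23", "ds67"), ("m54", "ds325")]  # (M, DS)
dataset2 = [
    ("m23", "g6", "dg32"),
    ("m23", "g8", "dg1"),
    ("m54", "g32", "dg65"),
]  # (M, G, DG)


def pairs_sharing_m(ds_rows, dg_rows):
    """Return {(ds, dg): set of M values that connect the two}."""
    # Index the DG values by the M they are attached to.
    dgs_by_m = defaultdict(set)
    for m, _g, dg in dg_rows:
        dgs_by_m[m].add(dg)

    # For each (M, DS) row, pair the DS with every DG attached to the same M.
    shared = defaultdict(set)
    for m, ds in ds_rows:
        for dg in dgs_by_m.get(m, ()):
            shared[(ds, dg)].add(m)
    return dict(shared)


direct = pairs_sharing_m(dataset1, dataset2)
for (ds, dg), ms in sorted(direct.items()):
    print(ds, "-", dg, "- shares", sorted(ms))
```

Any (DS, DG) pair missing from `direct`, such as (ds3, dg65), has no M in common and is exactly the kind of pair for which a probability would have to be estimated.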

I am trying to code this in Java, but Python-based solutions would work too.

I am sorry, I am not very familiar with graph theory, so a somewhat descriptive solution would be really appreciated. Thanks.



What you are describing amounts to building a probabilistic graphical model (PGM).

A commonly used Python library for building PGMs is pgmpy.
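
As a simpler baseline to try before fitting a full PGM, each (DS, DG) pair can be scored by how much their sets of linked M values overlap. The sketch below uses plain Python rather than pgmpy and swaps in a Jaccard similarity as the score, which is an assumption for illustration and not a calibrated probability. The names `M_DS`, `M_G_DG`, `neighbour_sets`, `jaccard`, and `score_links` are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question; a real run would load the full datasets.
M_DS = [("m23", "ds3"), ("m23", "ds67"), ("m54", "ds325")]
M_G_DG = [("m23", "g6", "dg32"), ("m23", "g8", "dg1"), ("m54", "g32", "dg65")]


def neighbour_sets(pairs):
    """Map each node to the set of M values it is linked to."""
    out = defaultdict(set)
    for m, node in pairs:
        out[node].add(m)
    return out


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_links(selected_ds):
    """Score every (DS, DG) pair for the selected DS entries."""
    ds_ms = neighbour_sets(M_DS)
    dg_ms = neighbour_sets((m, dg) for m, _g, dg in M_G_DG)
    return {
        (ds, dg): jaccard(ds_ms.get(ds, set()), ms)
        for ds in selected_ds
        for dg, ms in dg_ms.items()
    }


scores = score_links(["ds3", "ds67"])
for (ds, dg), s in scores.items():
    print(f"{ds} - {dg} - {s:.2%}")
```

On the sample rows, pairs that share m23 score 1.0 and pairs with no shared M score 0.0. Getting intermediate values for pairs such as ds3 and dg65 needs extra evidence that relates different M values to each other (for example shared G entries in a larger dataset), which is where a PGM or a dedicated link-prediction method would come in.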
