How to perform link prediction in text-based relationship data
I need to establish whether there is a link between two columns from two different datasets that share one common column, where:
Dataset1: bipartite: (M, DS)
M DS
m23 ds3
m23 ds67
m54 ds325
... ...
Dataset2: tripartite: (M, G, DG)
M G DG
m23 g6 dg32
m23 g8 dg1
m54 g32 dg65
... ... ...
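Because the data is plain text, a natural first step is to read each table into (M, DS) and (M, G, DG) tuples. The following is a minimal Python sketch. It assumes whitespace-separated columns with a header row and no literal "..." placeholder rows, and `parse_table` is a hypothetical helper name:

```python
# Minimal parser for whitespace-separated text tables like the two above.
# Assumption: the "..." placeholder rows are not present in the real data.

def parse_table(text):
    """Return (header, rows), where rows is a list of tuples, one per data line."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = lines[0]
    # Keep only rows that have exactly one value per header column
    rows = [tuple(r) for r in lines[1:] if len(r) == len(header)]
    return header, rows

dataset1 = """
M DS
m23 ds3
m23 ds67
m54 ds325
"""

dataset2 = """
M G DG
m23 g6 dg32
m23 g8 dg1
m54 g32 dg65
"""

header1, rows1 = parse_table(dataset1)
header2, rows2 = parse_table(dataset2)
print(header1, rows1)
print(header2, rows2)
```

For real files, `pandas.read_csv(path, sep=r"\s+")` would load the same tables into DataFrames.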
These two datasets have one column in common (M), and the relationships among the elements are:
M ----affects---- G
M ----causes----- DS
DG ----affects---- M
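In graph terms, these relationships define a graph. Every distinct value (m23, g6, ds3, dg32, ...) becomes a node, and every row contributes edges. The following minimal sketch uses plain dictionaries. It assumes the direction of "affects"/"causes" can be ignored for link prediction, so the graph is undirected. `build_graph` and `dgs_sharing_m` are hypothetical names:

```python
from collections import defaultdict

def build_graph(rows1, rows2):
    """Undirected adjacency sets; nodes are (type, value) tuples so values
    from different columns can never collide."""
    graph = defaultdict(set)

    def add_edge(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for m, ds in rows1:              # M --causes-- DS
        add_edge(("M", m), ("DS", ds))
    for m, g, dg in rows2:           # M --affects-- G and DG --affects-- M
        add_edge(("M", m), ("G", g))
        add_edge(("DG", dg), ("M", m))
    return graph

def dgs_sharing_m(graph, ds):
    """DG values reachable by the two-hop path DS - M - DG."""
    return {node[1]
            for m in graph.get(("DS", ds), set())
            for node in graph[m]
            if node[0] == "DG"}

rows1 = [("m23", "ds3"), ("m23", "ds67"), ("m54", "ds325")]
rows2 = [("m23", "g6", "dg32"), ("m23", "g8", "dg1"), ("m54", "g32", "dg65")]

graph = build_graph(rows1, rows2)
print(sorted(graph[("M", "m23")]))
print(dgs_sharing_m(graph, "ds3"))
```

The DG values returned by `dgs_sharing_m` are the pairs that share an M value. NetworkX offers the same structure (`networkx.Graph`) along with ready-made link-prediction scores such as `jaccard_coefficient` and `adamic_adar_index`.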
Primary goal: to calculate the probability of a possible link/edge between indirectly related columns (e.g., DG and DS) via the common column (M).
So, for a given list of DS entries, how can I find the probability that a link/edge exists between each selected DS and every DG?
DS ----?---- DG
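One simple way to turn an indirect link into a number is the following hypothetical heuristic (not a standard named method). Let $M(ds)$ be the set of M values linked to a DS entry, $M(dg)$ the set of M values linked to a DG entry, and $G(m)$ the set of G values affected by $m$; these symbols are introduced here for illustration:

$$
s(ds, dg) =
\begin{cases}
1, & \text{if } M(ds) \cap M(dg) \neq \emptyset,\\[4pt]
\displaystyle\max_{m_1 \in M(ds),\; m_2 \in M(dg)} \frac{|G(m_1) \cap G(m_2)|}{|G(m_1) \cup G(m_2)|}, & \text{otherwise.}
\end{cases}
$$

The first case is the shared-M situation. The second case is a Jaccard overlap that rewards pairs whose M values affect many of the same G values. The result lies in $[0, 1]$ and is a similarity score, not a calibrated probability.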
If the DS entries ds3 and ds67 were selected, the output should look like this:
element1 - element2 - probability/statistical value signifying the existence of a direct relationship or link
ds3 - dg32 - 100% (common M value)
ds3 - dg1 - 100% (common M value)
ds3 - dg65 - 43.66%
---
ds67 - dg32 - 100% (common M value)
ds67 - dg1 - 100% (common M value)
ds67 - dg65 - 55.12%
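The sketch below computes this for a list of selected DS entries and prints the results in the format above. It uses one possible heuristic, chosen here as an assumption rather than a standard method: a score of 1.0 when DS and DG share an M value, otherwise the best Jaccard overlap between the G sets of the M values on each side. It will not reproduce the example percentages in the question. The last row of `rows2` is invented sample data, added only so that a non-direct pair gets a nonzero score. All function names are hypothetical:

```python
from collections import defaultdict

def build_index(rows1, rows2):
    """Map each DS and DG to its M values, and each M to the G values it affects."""
    ds_to_m, dg_to_m, m_to_g = defaultdict(set), defaultdict(set), defaultdict(set)
    for m, ds in rows1:              # Dataset1 rows: (M, DS)
        ds_to_m[ds].add(m)
    for m, g, dg in rows2:           # Dataset2 rows: (M, G, DG)
        dg_to_m[dg].add(m)
        m_to_g[m].add(g)
    return ds_to_m, dg_to_m, m_to_g

def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_score(ds, dg, ds_to_m, dg_to_m, m_to_g):
    ms_ds, ms_dg = ds_to_m.get(ds, set()), dg_to_m.get(dg, set())
    if ms_ds & ms_dg:                # direct path DS - M - DG
        return 1.0
    # Longer path DS - M1 - G - M2 - DG: score by how many G values M1 and M2 share
    return max((jaccard(m_to_g.get(m1, set()), m_to_g.get(m2, set()))
                for m1 in ms_ds for m2 in ms_dg), default=0.0)

def report(selected_ds, rows1, rows2):
    """Print 'ds - dg - score' lines for every DG and return them as tuples."""
    ds_to_m, dg_to_m, m_to_g = build_index(rows1, rows2)
    results = []
    for ds in selected_ds:
        for dg in sorted(dg_to_m):
            score = link_score(ds, dg, ds_to_m, dg_to_m, m_to_g)
            direct = bool(ds_to_m.get(ds, set()) & dg_to_m[dg])
            note = " (common M value)" if direct else ""
            print(f"{ds} - {dg} - {score:.2%}{note}")
            results.append((ds, dg, score))
        print("---")
    return results

rows1 = [("m23", "ds3"), ("m23", "ds67"), ("m54", "ds325")]
rows2 = [("m23", "g6", "dg32"), ("m23", "g8", "dg1"), ("m54", "g32", "dg65"),
         ("m54", "g8", "dg65")]  # last row is hypothetical sample data

results = report(["ds3", "ds67"], rows1, rows2)
# The non-direct pairs print as "ds3 - dg65 - 33.33%" and "ds67 - dg65 - 33.33%"
```

With this data, both ds3 and ds67 link only to m23, so they receive identical scores against every DG. Distinguishing them, as the 43.66% vs. 55.12% example implies, would need more information per DS, for example additional M links.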
I am trying to code this in Java, but Python-based solutions would work too.
I am not very familiar with graph theory, so a somewhat descriptive solution would be much appreciated. Thanks.
Topic graphical-model probability python statistics
Category Data Science