Reduce size of a network graph for bipartite projection

I have a graph that I created from a pandas data frame. The length of the graph is ~450k edges. When I try to run the weighted_projected_graph function, it runs for a long time (I have not seen it finish), presumably because of the size of this data set. What is a good method for reducing the size of this data set before creating the bipartite graph?

I have tried narrowing it down by using the most connected components:

trim1 = len([c for c in net.connected_component_subgraphs(g) if len(c)  10])
C = net.Graph()
gg = sorted(net.connected_component_subgraphs(g), key=len, reverse=True)[trim1]

but I don't think this is giving me the results I want and, further, I'm not confident this is an analytically sound strategy. Does anybody have any other recommendations for reducing the size of this set?

EDIT: full code, without reducing. What I would be trying to do with the above would swap out g below for gg from above in the call to bi.projected_graph

reviews = pd.read_csv(r\BX-Book-Ratings.csv, sep=;, encoding = ISO-8859-1)
users = reviews['User-ID'].values.tolist()
books = reviews['ISBN'].values.tolist()

g=net.from_pandas_edgelist(reviews,'User-ID','ISBN',['Book-Rating'])

print(len(g))
 445839

p = bi.projected_graph(g, users)
```

Topic networkx social-network-analysis

Category Data Science


I think the problem is not with the data size, but with the presence of large degree nodes (the degree of a node is its number of neighbours).

Indeed all neighbours of a node in the bipartite graph are linked together in the projection; they form a clique (complete sub-graph) in the projection. This is terrible, as the number of links in a clique of $d$ nodes is $\frac{d\cdot(d-1)}{2} \sim d^2$.

In concrete words, a node of degree 2,000 in the bipartite graph will induce almost two million links in the projection... A node of degree 20,000 will induce almost two hundred million links...

For more discussion of the problems with projections of bipartite graphs, you may have a look for instance at this paper: Basic Notions for the Analysis of Large Affiliation Networks / Bipartite Graphs

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.