Scalable way to calculate betweenness centrality for a graph in Spark

I have a use case that requires calculating the betweenness centrality of nodes. I have tried GraphX with spark-betweenness, but it is a very long-running job. Has anyone successfully calculated the betweenness centrality of a large network with around 10 million vertices and 100 million edges?

Topic: networkx, graphs, apache-spark, social-network-analysis, machine-learning

Category: Data Science


Sorry, I do not think you can compute the exact betweenness centrality of nodes in a graph of this size: the standard exact algorithm (Brandes') runs in $O(n \cdot m)$ time, where $n$ is the number of nodes and $m$ the number of edges.
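As a rough back-of-envelope check with the numbers from the question, that works out to

$$n \cdot m = 10^{7} \times 10^{8} = 10^{15}$$

elementary shortest-path steps, which is far beyond what you can expect to finish in a reasonable time even on a sizeable cluster.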

The good news is that you can approximate it, in a way that also benefits from parallel computation. Computing betweenness centrality relies on counting the shortest paths between every pair of nodes. Instead of doing this exhaustively, you can (randomly) select a sample of pivot nodes, compute the shortest paths from each pivot to all other nodes, and use the resulting counts to approximate the betweenness scores. The more pivots you select, the better the approximation, but it is empirically quite good even with a small sample. See the sketch below.
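Here is a minimal single-machine sketch of this pivot-sampling idea using networkx (one of the question's tags), whose `betweenness_centrality` function accepts a `k` parameter that does exactly this sampling. The graph is a small synthetic stand-in for your real data, and `k=256` is an arbitrary sample size, not a recommendation. Since each pivot only needs an independent single-source shortest-path computation, the same idea parallelizes naturally (for example, one pivot per Spark task), which is how it can be scaled to your graph size.

```python
import networkx as nx

# Build (or load) the graph; a small random graph stands in for the real data here.
G = nx.fast_gnp_random_graph(n=100_000, p=0.0001, seed=42)

# Exact betweenness is O(n*m), infeasible at 10M nodes / 100M edges.
# Approximate betweenness: sample k pivot nodes, run one single-source
# shortest-path computation per pivot, and scale the path counts up.
approx_bc = nx.betweenness_centrality(G, k=256, normalized=True, seed=42)

# Larger k -> better approximation, at the cost of k single-source computations.
top10 = sorted(approx_bc.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top10)
```

In a distributed setting the structure stays the same: broadcast the graph (or a partitioned representation of it), map over the sampled pivots so each worker handles the shortest-path computation for its pivots, and reduce the per-pivot dependency scores into the final approximate centralities.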
