Fraud risk propagation in large scale network

Question


Naveed

February 25, 2022 00:05

What's the best approach in Python to do graph analytics and risk propagation on a network where multiple accounts are connected through relationships, a few of the accounts are marked as bad, and the rest are unknown?

I tried using networkx, but it seems to run forever. I have about 8 million edges and 40K nodes.
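To make the setup concrete, here is a minimal sketch of one simple baseline: damped label propagation, a technique neither answer below proposes. It uses plain Python adjacency lists, so each round costs time proportional to the number of edges. The function name `propagate_risk`, the damping factor `alpha`, the fixed iteration count, and treating every relationship as undirected and unweighted are all assumptions for illustration.

```python
from collections import defaultdict


def propagate_risk(edges, bad_accounts, alpha=0.85, iterations=20):
    """Spread risk outward from known-bad accounts (hypothetical helper).

    Each round, an unknown account's score becomes alpha times the mean score
    of its neighbors, and known-bad accounts are clamped back to 1.0.
    One round touches every edge twice, so the cost is O(number of edges).
    """
    bad = set(bad_accounts)
    neighbors = defaultdict(list)
    for u, v in edges:  # assumption: relationships are undirected
        neighbors[u].append(v)
        neighbors[v].append(u)

    scores = {node: (1.0 if node in bad else 0.0) for node in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for node, nbrs in neighbors.items():
            if node in bad:
                new_scores[node] = 1.0  # labelled accounts keep their label
            else:
                new_scores[node] = alpha * sum(scores[n] for n in nbrs) / len(nbrs)
        scores = new_scores
    return scores


# Toy chain a - b - c - d with "a" marked bad: risk decays with distance.
scores = propagate_risk([("a", "b"), ("b", "c"), ("c", "d")], {"a"})
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Sorting the scores gives a ranked review list of unknown accounts; the values are relative rankings, not calibrated fraud probabilities.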

Topic networkx graphs python

Category Data Science

Kirill Fedyanin · Accepted Answer · May 1, 2020 19:26

As Victor proposed, you probably need graph convolutional networks. With 40K nodes, training on the whole graph at once is borderline for memory, so you could consider GraphSAGE-like approaches, which sample a subgraph around each target node and then run some sort of GCN or GAT (graph attention network) on it. You could use a library such as DGL or PyTorch Geometric for that.
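The key GraphSAGE idea is that each mini-batch only looks at a fixed number of sampled neighbors per hop, so memory depends on the batch and fan-outs rather than on the whole graph. Below is a minimal plain-Python sketch of that sampling step only; `sample_neighborhood` and its arguments are hypothetical names, and the learned aggregation layer that would consume the sample is omitted. In PyTorch Geometric, `NeighborLoader` (with its `num_neighbors` argument) performs this kind of sampling for you.

```python
import random


def sample_neighborhood(neighbors, seeds, fanouts, seed=0):
    """Sample up to `fanout` neighbors per node for each hop, GraphSAGE-style.

    neighbors: dict mapping node -> list of adjacent nodes
    seeds:     target nodes for this mini-batch
    fanouts:   e.g. [10, 5] keeps at most 10 neighbors at hop 1 and 5 at hop 2
    Returns (layers, edges): layers[k] holds the nodes reached at hop k
    (layers[0] is the seeds); edges holds the sampled (node, neighbor) pairs.
    """
    rng = random.Random(seed)
    layers = [list(seeds)]
    edges = []
    frontier = list(seeds)
    for fanout in fanouts:
        reached = {}  # dict used as an insertion-ordered set
        for node in frontier:
            nbrs = neighbors.get(node, [])
            # Keep every neighbor if there are few, otherwise a random subset.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                edges.append((node, nbr))
                reached[nbr] = None
        frontier = list(reached)
        layers.append(frontier)
    return layers, edges


graph = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}
layers, edges = sample_neighborhood(graph, seeds=[0], fanouts=[2, 1])
print(layers, edges)
```

With fan-outs `[f1, f2]`, a batch of `s` seeds yields at most `s*f1 + s*f1*f2` sampled edges, however dense the full graph is.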

Another notable approach is DeepWalk, which learns node embeddings from each node's neighborhood by sampling random walks. On the plus side, it preserves locality in the embedding. On the minus side, in my experience it does not scale as well, but you can give it a try.
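DeepWalk treats truncated random walks as "sentences" of node IDs and trains a skip-gram (word2vec) model on them, so the graph-specific part is walk generation. A minimal plain-Python sketch of that step follows; the function name and the default values of `walks_per_node` and `walk_length` are illustrative assumptions, not settings taken from the answer.

```python
import random


def deepwalk_walks(neighbors, walks_per_node=10, walk_length=40, seed=0):
    """Generate truncated uniform random walks, the input to DeepWalk's skip-gram step.

    neighbors: dict mapping node -> list of adjacent nodes
    Returns a list of walks; each walk is a list of node IDs.
    """
    rng = random.Random(seed)
    nodes = list(neighbors)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = neighbors.get(walk[-1], [])
                if not nbrs:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
walks = deepwalk_walks(graph, walks_per_node=2, walk_length=5)
print(walks[:2])
```

The walks (with node IDs converted to strings) could then be fed to a skip-gram implementation such as gensim's `Word2Vec` with `sg=1`, and the resulting vectors used as features for a classifier trained on the labelled accounts.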

Victor Ng · Accepted Answer · November 27, 2019 19:42

You could try applying a graph convolutional network (GCN) to do semi-supervised learning. See Kipf and Welling's paper "Semi-Supervised Classification with Graph Convolutional Networks". How well this works probably depends on how unbalanced your dataset is, though. If the dataset is too large, you could take a sample of it and train the GCN on that subset. I'd try to find some exemplar data points and build a training set from them.
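Kipf and Welling's propagation rule is H' = σ(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I; the two-layer model ends in a softmax, and the "semi-supervised" part is that the loss is computed only on labelled nodes. The NumPy sketch below shows a forward pass and that masked loss on a toy dense graph. The weights are random placeholders (no training loop), and the function names are hypothetical. A dense 40K x 40K adjacency would need about 12.8 GB in float64, so at the question's scale a sparse matrix or a library such as PyTorch Geometric or DGL would be needed.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency from Kipf & Welling."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, w1, w2):
    """Two-layer GCN: softmax(A_norm @ ReLU(A_norm @ X @ W1) @ W2), row-wise."""
    a_norm = normalized_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)
    logits = a_norm @ hidden @ w2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def masked_cross_entropy(probs, labels, mask):
    """Cross-entropy over labelled nodes only; unlabelled nodes add nothing."""
    idx = np.where(mask)[0]
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
x = rng.normal(size=(3, 4))                                     # node features
probs = gcn_forward(adj, x, rng.normal(size=(4, 8)), rng.normal(size=(8, 2)))
labels = np.array([1, 0, 0])           # class 1 = "bad" (hypothetical encoding)
mask = np.array([True, False, False])  # only node 0 is labelled
print(masked_cross_entropy(probs, labels, mask))
```

Because only masked nodes enter the loss, the labelled set should contain examples of both classes, which is where picking exemplar good accounts matters.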

