Are there any graph embedding algorithms like this already?

Question

Are there any graph embedding algorithms like this already?

monomonedula

2022年5月13日 13:00

I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is done in the readme file and the examples.

The question is: Am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embeddings generation?

Yes, I'm aware there are many algorithms for this based on random walks, but this one is pure deterministic linear algebra and it is quite simple, from my perspective.

In short, the algorithm is heavily inspired by PageRank. Every node is described by its proximity vector which contains the node's closeness number to every other node or some selected subset of the nodes. The "closeness" is not just the simple shortest distance.

Here's brief explanation from the repo's readme for directed/undirected unweighted graph (the idea generalizes to weighted graphs pretty intuitively):

Every node is assigned a vector. For node i the vector's j-th element is a number representing its closeness to node j (you can think of it as signal strength).
The closeness of the node i to the node j (the "central" node the signal flows from) is defined as a sum of the signal strengths of the neighboring nodes multiplied by the damping factor parameter. (The dumping factor works like a kind of distance penalty)
Signal a node "emits" to other connected nodes equal to the given signal's strength in the node itself divided by the number of edges it is able to emit to.
Initially, the node j is assigned a closeness (to itself) equal to 1.

The whole thing is computed by solving sparse linear systems.

"Signal" strengths from node 4 to other nodes. Note that the signal here travel in direction opposite to edge direction (like "follow" relation on Instagram)

Topic numpy representation embeddings graphs python

Category Data Science

Valentas · Accepted Answer · 2021年12月4日 10:21

I would not be so skeptical. Yes, there is lots of research on random walk based recommender/ranking systems, including those similar to PageRank. Yes, spectral methods and linear algebra are researched for such problems very well. And a sparse matrix is a natural data structure for this problem, so you'd find many implementations that exploit it.

But if this particular type of random walks has not been tried before (although it does look from your description that it might be the usual random walk that just selects an outgoing edge at random) and/or especially if it works practically to solve some real problem or improves on some benchmark, your algorithm might still be interesting. But to get any serious people to have a look at it you should analyze it, link to existing research and compare with other existing variations and present in a paper or a blog post.

A random set of references to start from, see also the papers they cite or the papers they are cited by:

Linyuan Lü and Tao Zhou, Link prediction in complex networks: A survey, Physica A: statistical mechanics and its applications 390 (2011), 1150-1170.

Chapter 14 of David Easley and Jon Kleinberg, Networks, crowds, and markets, Cambridge university press (2010).

Maurizio Ferrari Dacrema, Paolo Cremonesi and Dietmar Jannach, Are we really making much progress? A worrying analysis of recent neural recommendation approaches, In Proceedings of the 13th ACM Conference on Recommender Systems, 101-109 (2019).

Ryan · Accepted Answer · 2021年11月29日 20:36

So I think that it is important to realize that pagerank utilizes the eigenvalues of the nodes to speed up computation. The thing is that turns out to be equivalent to a random walk. Based on what you mentioned in your question it sounds like you are describing a random walk in your 4 part procedure. The fact that you are solving it using linear algebra rather than simulating the random walk is somewhat irrelevant. It doesn't seem to me that there is anything particularly unique doing this via linear algebra. In fact, it would surprise me if anyone is actually simulating the random walks vs. doing the equivalent linear algebra like you are doing, because it is much more efficient to do the linear algebra.

Are there any graph embedding algorithms like this already?

About