Are there any graph embedding algorithms like this already?
I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is done in the readme file and the examples.
The question is: Am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embeddings generation?
Yes, I'm aware there are many algorithms for this based on random walks, but this one is pure deterministic linear algebra and it is quite simple, from my perspective.
In short, the algorithm is heavily inspired by PageRank. Every node is described by its proximity vector which contains the node's closeness number to every other node or some selected subset of the nodes. The "closeness" is not just the simple shortest distance.
Here's brief explanation from the repo's readme for directed/undirected unweighted graph (the idea generalizes to weighted graphs pretty intuitively):
- Every node is assigned a vector. For node i the vector's j-th element is a number representing its closeness to node j (you can think of it as signal strength).
- The closeness of the node i to the node j (the "central" node the signal flows from) is defined as a sum of the signal strengths of the neighboring nodes multiplied by the damping factor parameter. (The dumping factor works like a kind of distance penalty)
- Signal a node "emits" to other connected nodes equal to the given signal's strength in the node itself divided by the number of edges it is able to emit to.
- Initially, the node j is assigned a closeness (to itself) equal to 1.
The whole thing is computed by solving sparse linear systems.
"Signal" strengths from node 4 to other nodes. Note that the signal here travel in direction opposite to edge direction (like "follow" relation on Instagram)
Topic numpy representation embeddings graphs python
Category Data Science