What are graph embeddings?

I recently came across graph embedding methods such as DeepWalk and LINE. However, I still do not have a clear idea of what is meant by graph embeddings or when to use them (applications)? Any suggestions are welcome!



Graph embedding learns a mapping from a network to a vector space, while preserving relevant network properties.

Vector spaces are more amenable to data science than graphs. A graph consists of nodes and edges, and those network relationships can be handled by only a narrow subset of mathematics, statistics, and machine learning, whereas vector spaces offer a much richer toolset from those domains. Additionally, vector operations are often simpler and faster than the equivalent graph operations.

One example is finding nearest neighbors. In a graph, you can perform "hops" from one node to another. In many real-world graphs, there is little meaningful information left after a couple of hops (e.g., recommendations from friends of friends of friends). In vector spaces, however, you can use distance metrics to get quantitative results (e.g., Euclidean distance or cosine similarity). If you have a quantitative distance metric in a meaningful vector space, finding nearest neighbors is straightforward.
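For illustration, here is a minimal sketch in Python (NumPy only); the four 2-D vectors are made-up stand-ins for learned node embeddings, not the output of any particular method:

```python
import numpy as np

# Hypothetical 2-D embeddings for four nodes (illustrative values only).
embeddings = {
    "alice": np.array([0.9, 0.1]),
    "bob":   np.array([0.8, 0.2]),
    "carol": np.array([0.1, 0.9]),
    "dave":  np.array([0.2, 0.8]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank the other nodes by cosine similarity to "alice".
query = "alice"
neighbors = sorted(
    (name for name in embeddings if name != query),
    key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    reverse=True,
)
print(neighbors)  # ['bob', 'dave', 'carol']
```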

"Graph Embedding Techniques, Applications, and Performance: A Survey" is an overview article that goes into greater detail.


The paper A central limit theorem for an omnibus embedding of random dot product graphs by Levin et al., which introduces a specific type of graph embedding (the omnibus embedding), defines graph embedding as a methodology "in which the vertices of a graph are mapped to vectors in a low-dimensional Euclidean space." Check the link for more information.


What are graph embeddings? "Graph embeddings" are a hot area in machine learning today. The term basically means finding a "latent vector representation" of a graph that captures its topology (in a very basic sense). We can make this vector representation richer by also considering vertex-vertex relationships, edge information, etc. There are roughly two levels of embedding in a graph (of course, we can always define more levels by logically dividing the whole graph into subgraphs of various sizes):

  • Vertex Embeddings - Here you find a latent vector representation of every vertex in the given graph. You can then compare vertices by plotting these vectors in space, and interestingly, "similar" vertices are plotted closer to each other than ones that are dissimilar or less related. This is the idea behind "DeepWalk" by Perozzi et al. (a minimal sketch follows this list).
  • Graph Embeddings - Here you find the latent vector representation of the whole graph itself. For example, suppose you have a group of chemical compounds and you want to check which compounds are similar to each other, how many types of compounds there are in the group (clusters), etc. You can use these vectors, plot them in space, and find all of the above information. This is the idea behind "Deep Graph Kernels" by Yanardag.
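For the vertex-embedding case, here is a minimal DeepWalk-style sketch (uniform random walks fed to skip-gram Word2Vec), not Perozzi et al.'s implementation; it assumes networkx and gensim 4.x are available, and the toy graph and all hyperparameters are purely illustrative:

```python
import random
import networkx as nx
from gensim.models import Word2Vec  # gensim 4.x API assumed

def random_walk(graph, start, length):
    """Uniform random walk of `length` steps, returned as node-name strings."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Toy graph: Zachary's karate club, shipped with networkx.
G = nx.karate_club_graph()

# Several walks per node; these play the role of "sentences" for Word2Vec.
walks = [random_walk(G, node, length=10) for node in G.nodes() for _ in range(5)]

# Train skip-gram Word2Vec on the walks (sg=1 selects the skip-gram model).
model = Word2Vec(walks, vector_size=32, window=3, min_count=0, sg=1, epochs=5)

# Embedding vector for vertex 0, and its nearest vertices in embedding space.
print(model.wv["0"])
print(model.wv.most_similar("0"))
```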

Applications - Looking carefully, embeddings are "latent" representations. If a graph has a |V| * |V| adjacency matrix where |V| = 1M, it is hard to use or process 1M * 1M numbers in an algorithm. A latent embedding of dimension d, where d << |V|, replaces that adjacency matrix with a |V| * d matrix that is far easier to work with. Another application: consider a simple scenario where we want to recommend products to people who have similar interests in a social network. With vertex embeddings (here, a vector representation of each person), we can find similar people by comparing their vectors, which makes recommendation easy. These are some applications, and there are others. You can refer to a nice survey paper - Graph Embedding Techniques, Applications, and Performance: A Survey.
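To make the size argument concrete, here is a back-of-the-envelope sketch; float32 entries, the dimensions, and the random stand-in embeddings are all assumptions for illustration:

```python
import numpy as np

# Dense adjacency matrix vs. embedding matrix,
# assuming float32 entries (4 bytes each).
V, d = 1_000_000, 128
print(f"adjacency:  {V * V * 4 / 1e12:.1f} TB")  # ~4.0 TB
print(f"embeddings: {V * d * 4 / 1e9:.2f} GB")   # ~0.51 GB

# Toy recommendation query over stand-in embeddings:
# find the people closest to person `i` in embedding space.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, d)).astype(np.float32)  # random stand-ins
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-normalize rows

i, k = 42, 5
scores = X @ X[i]                  # cosine similarity (rows are unit-norm)
scores[i] = -np.inf                # exclude the person themself
top_k = np.argsort(scores)[-k:][::-1]
print(f"top-{k} matches for person {i}: {top_k}")
```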

Where did it all come from? There has been a lot of work in this area, and almost all of it stems from the groundbreaking research in natural language processing: "Word2Vec" by Mikolov et al. If you want to get started with research on graph embeddings, I would recommend first understanding how Word2Vec works. You can find nice explanations in "Word2Vec Parameter Learning Explained" and the Stanford lecture. Then you can jump to the papers that you listed.
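For orientation, the objective skip-gram Word2Vec maximizes (from Mikolov et al.) is the average log-probability of the context words within a window of size $c$ around each word $w_t$; DeepWalk reuses this objective, treating random walks as sentences and vertices as words:

$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$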
