Combine multiple vector fields for approximate nearest neighbor search

I have multiple vector fields in one collection. My use case is to find similar sentences in similar contexts. The sentences and contexts are encoded as float vectors, so I have one vector for the sentence and another vector for the context (the surrounding text). I would like to take both vectors into consideration when searching for similar sentences. Unfortunately, most approximate nearest neighbor (ANN) search libraries only support searching over a single vector field. I have tried using PostgreSQL with the cube extension to filter by multiple vector similarities, but the number of vectors (100M) is too high for PostgreSQL.

Questions:

  1. Is it possible to combine multiple vector fields for approximate nearest neighbor search?
  2. Is it also possible to weight the relevance of each vector field for the search?

Topic ann vector-space-models nlp

Category Data Science


One alternative is to re-encode the sentences and context together into the same vector space. This can be done with doc2vec or StarSpace.

If the sentences and contexts are encoded into the same vector space, each item is represented by a single vector, so any approximate nearest neighbor (ANN) search library can be used.
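
If retraining an encoder is not an option, a different and simpler way to get one vector per item is to concatenate the two existing vectors. This is not the doc2vec/StarSpace approach described above. The sketch below assumes both fields are compared by cosine similarity: each field is L2-normalized, the indexed vector is the plain concatenation, and the per-field weights are applied to the query only. The inner product between query and indexed vector then equals `w_sentence * cos(sentence) + w_context * cos(context)`. All function names are hypothetical, and the brute-force `search` only stands in for a real ANN index.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def index_vector(sentence_vec, context_vec):
    """Vector to store: both fields normalized, then concatenated."""
    return l2_normalize(sentence_vec) + l2_normalize(context_vec)


def query_vector(sentence_vec, context_vec, w_sentence=0.5, w_context=0.5):
    """Vector to search with: each normalized field scaled by its weight."""
    s = [w_sentence * x for x in l2_normalize(sentence_vec)]
    c = [w_context * x for x in l2_normalize(context_vec)]
    return s + c


def search(query, index, k=1):
    """Exact inner-product search; a stand-in for an ANN library."""
    scored = [(dot(query, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Toy data: (sentence vector, context vector) per item.
items = {
    "a": ([1.0, 0.0], [0.0, 1.0, 0.0]),  # sentence matches the query, context does not
    "b": ([0.0, 1.0], [1.0, 0.0, 0.0]),  # context matches the query, sentence does not
    "c": ([1.0, 1.0], [1.0, 1.0, 0.0]),  # partial match on both fields
}
index = {item_id: index_vector(s, c) for item_id, (s, c) in items.items()}

q_sentence, q_context = [1.0, 0.0], [1.0, 0.0, 0.0]
sentence_heavy = search(query_vector(q_sentence, q_context, 0.8, 0.2), index, k=3)
context_heavy = search(query_vector(q_sentence, q_context, 0.2, 0.8), index, k=3)
```

Because the weights live only in the query, they can be changed per search without rebuilding the index. Every indexed vector also has the same norm (the square root of 2), so for a fixed query, ranking by inner product, cosine similarity, or Euclidean distance gives the same order, and whichever of those metrics the ANN library supports should work.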
