Combine multiple vector fields for approximate nearest neighbor search

Question

roemchine

February 4, 2022, 19:44

I have multiple vector fields in one collection. My use case is to find similar sentences in similar contexts. Both the sentences and the contexts are encoded as float vectors, so each record has one vector for the sentence and another for its context (the surrounding text). I would like to take both vectors into consideration when searching for similar sentences. Unfortunately, most approximate nearest neighbor (ANN) search libraries only support searching on a single vector field. I have tried PostgreSQL with the cube extension to filter by multiple vector similarities, but the number of vectors (100M) is too high for PostgreSQL.

Questions:

Is it possible to combine multiple vector fields in a single approximate nearest neighbor search?
Is it also possible to weight the relevance of each vector field for the search?

Topic ann vector-space-models nlp

Category Data Science

Brian Spiering · Accepted Answer · February 4, 2022, 19:44

One alternative is to re-encode each sentence together with its context into a single vector in one shared vector space. This can be done with doc2vec or StarSpace.

Once every sentence-plus-context pair is represented by a single vector in the same space, any approximate nearest neighbor (ANN) search library can be used.
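If re-training an encoder is not an option, a simpler way to end up with one vector per record is weighted concatenation. This is a different technique from doc2vec or StarSpace and is not what the answer describes; it is only a minimal sketch, assuming cosine similarity is the intended measure for both fields. The function names, the weights (0.7 and 0.3), and the random data are all hypothetical. Each field is L2-normalized, scaled by the square root of its weight, and the two are concatenated, so the inner product of two combined vectors equals the weighted sum of the per-field cosine similarities. The brute-force scoring at the end only stands in for whatever single-field ANN index would actually be used.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length; the clip guards against zero vectors.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def combine(sentence_vecs, context_vecs, w_sentence=0.7, w_context=0.3):
    # Hypothetical helper: the weights are illustrative and should sum to 1.
    s = l2_normalize(np.asarray(sentence_vecs, dtype=np.float32))
    c = l2_normalize(np.asarray(context_vecs, dtype=np.float32))
    # sqrt(weight) on both the indexed and the query side makes the inner
    # product equal w_sentence * cos(sentences) + w_context * cos(contexts).
    return np.hstack([np.sqrt(w_sentence) * s, np.sqrt(w_context) * c])

# Random stand-ins for real sentence and context embeddings.
rng = np.random.default_rng(0)
sent = rng.normal(size=(1000, 64))
ctx = rng.normal(size=(1000, 32))

index_vectors = combine(sent, ctx)   # what would be added to the ANN index
query = combine(sent[:1], ctx[:1])   # queries are encoded the same way

# Exact inner-product search, standing in for an ANN index lookup.
scores = index_vectors @ query[0]
top5 = np.argsort(-scores)[:5]
```

Because the weights sum to 1, every combined vector has unit length, so inner-product, cosine, and Euclidean indexes all return the same ranking. Changing the weights changes the stored vectors, which means the index has to be rebuilt for each weighting.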
