How is a textual search engine able to recognize subwords from words?

Question

Darrel

June 11, 2021, 06:45

I would like to know how information retrieval systems take relevant subwords of a search term into account when performing a keyword search. For example, the word wristband can be treated either as a single word or as wrist band. After word tokenization, these become [wristband] and [wrist, band] respectively. If I query with wristband, the tokens wrist and band do not match the query term, so they are ignored in the count vector.
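To make the mismatch concrete, here is a minimal sketch, assuming plain whitespace tokenization and a raw term-count score. The two example documents and the `tokenize` and `score` helpers are made up for illustration and are not taken from any real search engine.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (an assumed, very naive tokenizer)."""
    return text.lower().split()


def score(query, doc):
    """Sum the counts of each query token in the document's count vector."""
    counts = Counter(tokenize(doc))
    return sum(counts[term] for term in tokenize(query))


# Hypothetical documents: one uses the compound, the other the split form.
docs = ["wristband for running", "wrist band for running"]

for doc in docs:
    print(f"{doc!r}: score = {score('wristband', doc)}")
```

With exact token matching, only the first document scores above zero, even though both are relevant to the query.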

Yet common search engines are able to retrieve results that contain subwords of the main search word:

Search suggestions for wristband from Shopee, an e-commerce site.

Search suggestions for wristband from Amazon, an e-commerce site.

How is this done? I'm guessing they could brute-force every single character combination and keep the ones with the most hits, but that doesn't sound very efficient.
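A rough sketch of that brute-force guess, restricted to two-part splits for brevity, might look like the following. The `doc_freq` table and its counts are invented placeholders standing in for per-term hit counts from an index, and `candidate_splits` and `best_split` are hypothetical helper names. I am not claiming this is how Shopee or Amazon actually do it.

```python
def candidate_splits(word, min_len=2):
    """Yield every way to cut `word` into two parts of at least `min_len` characters."""
    for i in range(min_len, len(word) - min_len + 1):
        yield word[:i], word[i:]


def best_split(word, doc_freq):
    """Return the split whose rarer part has the most hits, or None if no split is known."""
    scored = []
    for left, right in candidate_splits(word):
        # Only keep splits where both parts are terms the index has seen.
        if left in doc_freq and right in doc_freq:
            scored.append((min(doc_freq[left], doc_freq[right]), (left, right)))
    return max(scored)[1] if scored else None


# Hypothetical term -> hit-count table; the numbers are made up.
doc_freq = {"wristband": 120, "wrist": 300, "band": 950}

print(best_split("wristband", doc_freq))
```

A word of length n has only about n two-part split points, but allowing any number of parts gives 2^(n-1) segmentations, which is where the efficiency concern comes from.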

I'm just looking for someone to point me in the right direction so that I can do more research myself.

Topic nlp information-retrieval

Category Data Science
