How is a textual search engine able to recognize subwords from words?
I would like to know how information retrieval systems are able to consider relevant subwords of a main search word when performing a keyword search. For example, the word "wristband" can either be considered as is, or as "wrist band". When word-tokenized, they appear as [wristband] and [wrist, band] respectively. If I query with "wristband", the tokens "wrist" and "band" will be ignored in the count vector, so a document that only contains "wrist band" does not match.
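To make the problem concrete, here is a minimal sketch of the mismatch described above. It assumes a plain whitespace tokenizer and raw term counts; the helper names (`tokenize`, `count_vector`, `dot`) and the two toy documents are made up for illustration and do not come from any particular search engine.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple word tokenizer)."""
    return text.lower().split()


def count_vector(tokens, vocabulary):
    """Return the term counts of `tokens`, ordered like `vocabulary`."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def dot(a, b):
    """Dot product of two equal-length count vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two toy documents: one uses the compound, the other the split form.
docs = ["wristband", "wrist band"]

# Vocabulary built from the documents, sorted for a stable column order.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})

doc_vectors = [count_vector(tokenize(doc), vocabulary) for doc in docs]
query_vector = count_vector(tokenize("wristband"), vocabulary)

# Score each document by its overlap with the query.
scores = [dot(query_vector, vec) for vec in doc_vectors]
print(vocabulary)   # ['band', 'wrist', 'wristband']
print(scores)       # [1, 0]
```

The second document scores 0 even though it is clearly relevant, because "wristband" and "wrist" / "band" occupy different dimensions of the vector.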
Yet common search engines are able to retrieve results that contain subwords of the main search word:
[Image: search suggestions for "wristband" from Shopee, an e-commerce site.]
[Image: search suggestions for "wristband" from Amazon, an e-commerce site.]
How is this done? I'm guessing they could brute force every single character combination and find ones with the most hits, but that doesn't sound very efficient.
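For reference, the brute-force idea could look roughly like the sketch below. It is only an illustration of the guess above, not a description of what Shopee or Amazon actually do. The `term_hits` table (term to number of matching documents) is entirely hypothetical, as are the function names and the choice to score a split by its weaker half.

```python
def candidate_splits(word, min_len=2):
    """Yield every two-part split of `word` whose parts have at least `min_len` characters."""
    for i in range(min_len, len(word) - min_len + 1):
        yield word[:i], word[i:]


def best_split(word, term_hits):
    """Return the split whose two parts are both known terms and have the most hits, else None."""
    best, best_score = None, 0
    for left, right in candidate_splits(word):
        if left in term_hits and right in term_hits:
            # Score a split by its weaker half (an arbitrary choice for this sketch).
            score = min(term_hits[left], term_hits[right])
            if score > best_score:
                best, best_score = (left, right), score
    return best


# Hypothetical hit counts taken from an imaginary index.
term_hits = {"wrist": 120, "band": 340, "wristband": 95, "ban": 10}

print(best_split("wristband", term_hits))  # ('wrist', 'band')
print(best_split("bracelet", term_hits))   # None
```

Restricted to two-part splits, a word of length n has only n - 1 split points, so the cost per query word is small; the expensive case is trying arbitrary multi-part splits or arbitrary character subsets.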
I'm just looking for someone to point me in the right direction so that I can do more research myself.