Reduce serving time complexity for real-time recommender systems

I am working on a real-time recommender system that predicts products for a user using deep learning techniques (such as Wide & Deep Learning, Deep & Cross Network, etc.). The product catalogue can be huge (thousands to a million items), and for a given user the model needs to be evaluated against each product in real time. Since scalability is an important concern, is there any way to reduce the serving-time complexity by tuning the model architecture?

Tags: time-complexity, deep-learning, recommender-system, machine-learning

Category: Data Science


You write "... the model needs to be evaluated against each product in real-time", which suggests that you are using a binary classification architecture (a sigmoid in the final layer) with negative sampling over user/item interactions when training your model.
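For concreteness, here is a minimal sketch of such a pointwise, sigmoid-output scorer (in PyTorch, with illustrative layer sizes, feature dimensions, and catalogue size, since the question does not give any). Its serving cost grows linearly with the catalogue, because every candidate item needs its own forward pass (or one large batched pass over all items):

```python
import torch
import torch.nn as nn

# Hypothetical pointwise ranking model: scores one (user, item) pair per forward pass.
# All dimensions below are illustrative assumptions, not taken from the question.
class PairwiseScorer(nn.Module):
    def __init__(self, user_dim=64, item_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(user_dim + item_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single logit -> sigmoid -> P(interaction)
        )

    def forward(self, user_feat, item_feat):
        return torch.sigmoid(self.mlp(torch.cat([user_feat, item_feat], dim=-1)))

model = PairwiseScorer()
num_items = 100_000                                   # illustrative catalogue size
user_feat = torch.randn(1, 64).expand(num_items, 64)  # same user repeated for every item
item_feats = torch.randn(num_items, 64)               # one feature row per catalogue item
scores = model(user_feat, item_feats)                 # serving cost scales with catalogue size
```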

Have you considered using multi-class classification instead? That way, you predict only once per user for the entire product catalogue and select the top-k candidates from the softmax layer, so you only need a single feed-forward pass through your neural net during inference.
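A minimal sketch of that alternative, again in PyTorch with assumed dimensions and an assumed catalogue size: the network emits one logit per catalogue item, and the top-k candidates are read directly off the output of a single forward pass.

```python
import torch
import torch.nn as nn

# Sketch of the multi-class alternative: map user features to a score for every
# item in one forward pass. num_items and layer sizes are illustrative assumptions.
class CatalogueClassifier(nn.Module):
    def __init__(self, user_dim=64, hidden=128, num_items=100_000):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Linear(user_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_items),  # one logit per catalogue item
        )

    def forward(self, user_feat):
        return self.tower(user_feat)

model = CatalogueClassifier()
user_feat = torch.randn(1, 64)           # features for a single user
logits = model(user_feat)                # one forward pass covers the whole catalogue
top_scores, top_items = logits.topk(10)  # indices of the top-k candidate items
```

The trade-off is that the final layer now scales with the catalogue size, but at inference time you pay for only one pass through the shared layers instead of one pass per item.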
