How does Amazon's "Reviews that mention" feature extract topics from reviews?

An Amazon product page contains a section called "Reviews that mention", which lists the main things users liked or disliked about the product. For example, see this page. How exactly does it work?

This could be done with topic modelling using LDA, but that approach has several drawbacks:

  • You need to choose the number of topics upfront, but on Amazon the number of topics varies from product to product. It is not the same even for products in the same category.

  • You need to give each topic a human-friendly name. With so many products, it is unlikely that Amazon does that manually.

What approach would be suitable for doing this in a completely unsupervised way, without the drawbacks mentioned above?

Tags: real-ml-usecase, topic-model, nlp



One possible approach I can see is as follows:

  • Based on its historical data (re-checked every X time period), Amazon maintains a set of frequent categories, i.e. labels in a classification setting.
  • On the product page you linked, you can see the categories it considers:

[Screenshot: the categories shown on the product page]

and the most frequent terms users have written in their reviews, which are used as filters:

[Screenshot: the frequent review terms used as filters]

  • By applying techniques such as word embeddings, you can build a classifier that determines which of the predefined category labels each of those terms belongs to.


  • New categories could be discovered with unsupervised clustering techniques.
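The steps above can be sketched end to end in plain Python. This is not Amazon's actual method, just a minimal illustration under stated assumptions: the stop-word list, the 3-d "embeddings", the predefined categories and the `threshold` value are all hypothetical stand-ins (a real system would use a trained embedding model and a proper tokenizer), and the clustering step uses a simple greedy "leader" clustering chosen because it does not need the number of clusters in advance.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "and", "but", "is", "it", "the", "this", "was"}

# Hypothetical 3-d "embeddings" so the sketch runs without a trained model.
# In practice these vectors would come from something like word2vec or fastText.
EMBEDDINGS = {
    "battery": (0.9, 0.1, 0.0),
    "charge": (0.8, 0.2, 0.1),
    "sound": (0.1, 0.9, 0.0),
    "bass": (0.2, 0.8, 0.1),
    "strap": (0.0, 0.1, 0.9),
    "buckle": (0.1, 0.0, 0.8),
}

# Hypothetical predefined category labels, each described by seed terms.
CATEGORIES = {"battery life": ["battery"], "sound quality": ["sound"]}


def frequent_terms(reviews, min_count=2):
    """Return {term: count} for non-stop-word tokens seen at least min_count times."""
    counts = Counter()
    for review in reviews:
        for token in review.split():
            token = token.strip(".,!?").lower()
            if token and token not in STOPWORDS:
                counts[token] += 1
    # most_common() orders by count; ties keep first-seen order.
    return {t: c for t, c in counts.most_common() if c >= min_count}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def centroid(terms):
    """Mean vector of the given terms' embeddings."""
    vectors = [EMBEDDINGS[t] for t in terms]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def assign_and_cluster(term_counts, threshold=0.8):
    """Assign terms to predefined categories, then cluster the leftovers."""
    centroids = {label: centroid(seeds) for label, seeds in CATEGORIES.items()}
    assigned = {label: [] for label in CATEGORIES}
    leftovers = []
    for term in term_counts:
        if term not in EMBEDDINGS:  # no vector for this term: skip it
            continue
        vector = EMBEDDINGS[term]
        # Classification step: pick the most similar category centroid.
        label = max(centroids, key=lambda lbl: cosine(vector, centroids[lbl]))
        if cosine(vector, centroids[label]) >= threshold:
            assigned[label].append(term)
        else:
            leftovers.append(term)

    # Greedy leader clustering: a term joins the first cluster whose first
    # term is similar enough, otherwise it starts a new cluster.
    clusters = []
    for term in leftovers:
        for cluster in clusters:
            if cosine(EMBEDDINGS[term], EMBEDDINGS[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])

    # Name each new category after its most frequent term.
    new_categories = {max(c, key=term_counts.get): c for c in clusters}
    return assigned, new_categories


reviews = [
    "Battery lasts long and charge is fast",
    "Battery died, slow charge",
    "Sound is rich and bass is deep",
    "Tinny sound, weak bass",
    "Strap and buckle feel cheap",
    "Buckle broke, strap frayed",
]
counts = frequent_terms(reviews)
assigned, new_categories = assign_and_cluster(counts)
print(assigned)        # {'battery life': ['battery', 'charge'], 'sound quality': ['sound', 'bass']}
print(new_categories)  # {'strap': ['strap', 'buckle']}
```

The number of new categories falls out of the data and the similarity threshold rather than being fixed upfront, and each one gets a readable label from its most frequent term, which addresses both LDA drawbacks raised in the question, at the cost of having to tune `threshold`.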
