What approach should I take for my product classification ML model with user feedback for improving result accuracy?

Julian Gherghel

October 7, 2021 11:43

I'm trying to implement a product categorization ML model on a dataset with the following structure: (image: Data sample)

I want my model to predict the correct category for a product based on its name and description.

However, I will be implementing this together with a GUI that accepts user input.

For example, a new product name with a description is added to the table: (image: New entry before feedback training)

The user will be presented with the following options (I made these numbers up) and has to select one:

Kitchen furniture - 65%

Home decorations - 29%

Kitchen appliances - 6%

The user clicks 'Home decorations', and this choice is fed back to the model. The next time the model encounters a similar entry, such as: (image: New entry after feedback training)

the user should be presented with more accurate predictions: the same options to choose from as before, but with different predicted probabilities:

Home decorations - 70%

Kitchen furniture - 20%

Kitchen appliances - 10%

In other words, the model has learned from that feedback and become more accurate. My research so far has pointed towards reinforcement learning. However, I couldn't find anything very similar, and I am not THAT skilled in ML, so please point me in the right direction: which Python libraries to use, which ML models to look at, and maybe even existing implementations.

Thanks!

Topic real-ml-usecase reinforcement-learning nlp machine-learning

Category Data Science

German C M · Accepted Answer · October 6, 2021 13:56

As a first approximation, this can be framed as a supervised classification problem: from the input texts (both name and description) you build a set of features and train your classification model on them.

One option is:

tokenize (split into words) your texts (both names and descriptions);
filter out (presumably) uninformative words such as prepositions and other so-called stop words (look at libraries like nltk for language processing);
select the most frequent words of your bag-of-words across all the categories you have so far; you can find these by looking at a frequency bar plot over all the words in your dataset;
count the occurrences of each selected word in each name-description sample, so that each row of your dataset looks something like:

kitchen	bathroom	storage	cooking	microwave	oven	...	CATEGORY_label
0	0	0	0	2	1	...	1
1	0	1	0	0	0	...	3
...

where label 1 is your kitchen appliances category, and so on.
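The steps above can be sketched with the standard library alone. This is a minimal illustration, not the answerer's code: the stop-word list, the sample products, the category ids and the helper names (`tokenize`, `clean_tokens`, `build_vocabulary`, `to_count_vector`) are all hypothetical. In practice, nltk's stop-word corpus or scikit-learn's `CountVectorizer` (with its `stop_words` and `max_features` parameters) would do the same job.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real project would use e.g. the nltk
# stop-word corpus or scikit-learn's built-in English list instead.
STOP_WORDS = {"a", "an", "and", "the", "for", "with", "of", "in", "on", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clean_tokens(name, description):
    """Tokenize name and description together and drop stop words."""
    tokens = tokenize(name) + tokenize(description)
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(docs, max_words):
    """Keep the max_words most frequent words across all documents."""
    totals = Counter(word for doc in docs for word in doc)
    return [word for word, _ in totals.most_common(max_words)]


def to_count_vector(doc, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(doc)
    return [counts[word] for word in vocabulary]


# Hypothetical sample products as (name, description) pairs.
products = [
    ("Microwave oven", "A compact microwave oven for the kitchen"),
    ("Storage shelf", "Bathroom storage shelf with hooks"),
]
labels = [1, 3]  # hypothetical category ids (CATEGORY_label column)

docs = [clean_tokens(name, desc) for name, desc in products]
vocabulary = build_vocabulary(docs, max_words=5)
# Each row: word counts for the vocabulary, followed by the category label.
rows = [to_count_vector(doc, vocabulary) + [label]
        for doc, label in zip(docs, labels)]
```

Each entry of `rows` has the same shape as a row of the table: one count per selected word, then the label.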

This results in a multi-class classification problem, since you are choosing among several possible categories.

The more new entries you accumulate, the more keywords you will have for each category.
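One way to let user-confirmed entries improve the ranked percentages shown in the GUI is to use a multi-class model that can be updated one example at a time. The sketch below assumes multinomial naive Bayes over word counts; that model choice is a swapped-in illustration, not something the answer prescribes. Each time the user picks a category, the entry is simply added as a new labelled example, which keeps to the supervised framing above rather than reinforcement learning. The class name, categories and token lists are hypothetical; scikit-learn's `MultinomialNB` implements the same model and offers `partial_fit` for incremental updates.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over word counts with Laplace smoothing.

    Training is just counting, so the model can be updated one labelled
    example at a time, e.g. whenever a user confirms a category.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing strength
        self.class_docs = Counter()              # examples seen per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocabulary = set()                  # every word seen so far

    def update(self, tokens, label):
        """Add one labelled example (initial data or user feedback)."""
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def predict_proba(self, tokens):
        """Return (category, probability) pairs, most likely first."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token not in self.vocabulary:   # ignore unseen words
                    continue
                score += math.log((counts[token] + self.alpha) / denominator)
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1 (softmax).
        best = max(log_scores.values())
        weights = {k: math.exp(v - best) for k, v in log_scores.items()}
        total = sum(weights.values())
        ranked = [(k, w / total) for k, w in weights.items()]
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical initial training data (already tokenized and cleaned).
clf = NaiveBayesTextClassifier()
clf.update(["kitchen", "table", "wood"], "Kitchen furniture")
clf.update(["kitchen", "wall", "cabinet"], "Kitchen furniture")
clf.update(["microwave", "oven", "kitchen"], "Kitchen appliances")
clf.update(["wall", "vase", "decor"], "Home decorations")

new_item = ["kitchen", "wall", "clock"]
before = clf.predict_proba(new_item)

# The user picks "Home decorations" in the GUI: feed it back as a label.
clf.update(new_item, "Home decorations")
after = clf.predict_proba(new_item)

for label, p in after:
    print(f"{label} - {p:.0%}")
```

In this toy run, "Kitchen furniture" is ranked first before the feedback and "Home decorations" is ranked first afterwards, which mirrors the behaviour described in the question.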

This is the easiest approach (based on word counts) to begin with; for natural language processing you can later move on to other approaches, such as TF-IDF instead of raw word counts, and more sophisticated ones like word embeddings.
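To illustrate the TF-IDF idea, here is a small standard-library sketch that re-weights word counts so that words appearing in almost every document (such as "kitchen" in a kitchen-heavy catalogue) count for less than rarer, more telling words. It assumes the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization of each row, which is what scikit-learn's `TfidfVectorizer` does by default; the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Return one L2-normalized TF-IDF vector per tokenized document.

    Uses the smoothed IDF ln((1 + n) / (1 + df)) + 1, where n is the
    number of documents and df the number of documents containing a word.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocabulary}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[w] * idf[w] for w in vocabulary]  # term count * IDF
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vectors


# Hypothetical tokenized name-description samples.
docs = [
    ["kitchen", "oven", "microwave"],
    ["kitchen", "table"],
    ["wall", "vase", "kitchen"],
]
vocabulary = ["kitchen", "oven", "table", "wall"]
vectors = tfidf_vectors(docs, vocabulary)
```

Because "kitchen" occurs in every sample, its IDF is the minimum value of 1, so within each row it ends up with a lower weight than words like "oven" or "table" that occur only once.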
