Creating training data
My task is to classify free text originating from customer complaints about our product.
I have created a taxonomy with around 10 different categories. I've realized that each category is associated with certain keywords.
Example:
"Customer doesn't understand how to use the product".
Keywords: understand, knowledge, know, aware.
Record:
Training, Customer doesn't understand how to use the product
I'm using the Google Prediction API. When training the model, I label the text above ("Customer doesn't understand how to use the product") with the category "Training".
How can I add keywords to the free text/training data so that the model performs better and returns a higher confidence level?
Data in training set:
Training, understand knowledge know aware
Training, Customer doesn't understand how to use the product
Right now, I'm adding the keywords to the same training data as a separate row, but I'm looking for a better approach.
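For concreteness, here is a minimal Python sketch of how the training rows shown above could be generated. It assumes the training file is a CSV with the category label in the first column and the text in the second. The names `TAXONOMY`, `build_rows` and `to_csv` are hypothetical and only for illustration; they are not part of the Google Prediction API.

```python
import csv
import io

# Hypothetical taxonomy: category label -> keywords (taken from the example above).
TAXONOMY = {
    "Training": ["understand", "knowledge", "know", "aware"],
}


def build_rows(records, taxonomy):
    """Return (label, text) rows: one keyword row per category,
    followed by the labelled free-text records."""
    rows = [(label, " ".join(keywords)) for label, keywords in taxonomy.items()]
    rows.extend(records)
    return rows


def to_csv(rows):
    """Serialize rows as CSV text, quoting every field (assumed file layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Labelled complaint text from the example above.
records = [("Training", "Customer doesn't understand how to use the product")]
rows = build_rows(records, TAXONOMY)
print(to_csv(rows))
```

This only reproduces what I'm doing now: the keywords end up as an extra training example under the same label, not as a separate feature.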
Topic google-prediction-api nltk nlp data-cleaning
Category Data Science