Text to Text classification
I am a newcomer to the field of data science and have been struggling with what looks like a simple classification problem. It seems generic enough that I suspect there must be a better way to frame or model it. I would appreciate any help.
Background
- In our system, we have millions of tickets (similar to JIRA tickets), where each ticket has attributes like title, description, tags, etc.
- A user can create a dashboard and add any number of these tickets to their dashboards. Each dashboard has a title and description.
- Currently there are ~100k tickets in ~3k dashboards.
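To make the setup concrete, here is a minimal sketch of the data described above and of how (ticket title, dashboard title) training pairs can be derived from it. The class names, field names (`ticket_id`, `ticket_ids`), the `training_pairs` helper, and the sample records are all hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: int  # hypothetical identifier, not named in the description above
    title: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


@dataclass
class Dashboard:
    title: str
    description: str = ""
    ticket_ids: set[int] = field(default_factory=set)  # tickets added by the user


def training_pairs(tickets, dashboards):
    """Return one (ticket title, dashboard title) pair per ticket-dashboard membership."""
    by_id = {t.ticket_id: t for t in tickets}
    return [
        (by_id[tid].title, d.title)
        for d in dashboards
        for tid in sorted(d.ticket_ids)
        if tid in by_id
    ]


# Made-up sample data, for illustration only.
tickets = [
    Ticket(1, "Login page times out", tags=["auth"]),
    Ticket(2, "Password reset email not sent", tags=["auth", "email"]),
    Ticket(3, "Invoice PDF is blank", tags=["billing"]),
]
dashboards = [
    Dashboard("Authentication issues", ticket_ids={1, 2}),
    Dashboard("Email delivery", ticket_ids={2}),
    Dashboard("Billing", ticket_ids={3}),
]

pairs = training_pairs(tickets, dashboards)
for ticket_title, dashboard_title in pairs:
    print(f"{ticket_title!r} -> {dashboard_title!r}")
```

In this toy data, ticket 2 shows up with two different labels. If a ticket can sit on more than one dashboard, the same input will appear with several targets in the training set.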
Problem Statement
- Given a new ticket, I want to suggest which dashboards it can be added to.
- Given a new dashboard, I want to suggest which tickets can be added to it.
My Attempts
In my first attempt, I tried multi-class text classification using Doc2Vec and logistic regression.
- Basically, I created vectors from ticket titles (using Doc2Vec) and then ran a logistic regression with these vectors as input and dashboard titles as labels.
- However, with this approach I only reached 2-3% accuracy.
- I think that is because logistic regression with ~3k labels is not a good choice.
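For context on the 2-3% figure, a quick back-of-the-envelope check compares it with uniform random guessing over ~3k labels. This is only a rough sketch: it assumes every dashboard is equally likely, which real data probably violates (always predicting the largest dashboard would likely score higher), and `chance_accuracy` is a hypothetical helper name.

```python
def chance_accuracy(num_labels: int) -> float:
    """Expected accuracy of guessing uniformly at random among num_labels classes."""
    return 1.0 / num_labels


NUM_DASHBOARDS = 3_000  # approximate number of dashboards (labels) from the background
baseline = chance_accuracy(NUM_DASHBOARDS)

# Compare the observed accuracies against the uniform-chance baseline.
for observed in (0.02, 0.03):
    ratio = observed / baseline
    print(f"observed {observed:.0%} vs. chance {baseline:.3%}: about {ratio:.0f}x")
```

Under that assumption, chance is roughly 0.03%, so 2-3% is well above random guessing even though it is far too low to be useful for suggestions.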
In my second attempt, I created two Doc2Vec vectors, one for the ticket title and one for the dashboard title, and trained a neural network with the ticket title vector as input and the dashboard title vector as output.
- As before, I only achieved 2% accuracy with this approach.
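Since this network outputs a vector rather than a class, measuring accuracy needs a decoding step. The sketch below shows one way such a model can be scored, assuming each predicted vector is mapped to the dashboard whose title vector is closest by cosine similarity. How accuracy was actually computed is not stated here, so this is an assumption, and the function names and toy 3-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_dashboard(predicted, dashboard_vectors):
    """Return the dashboard title whose vector is most similar to the predicted vector."""
    return max(dashboard_vectors, key=lambda title: cosine(predicted, dashboard_vectors[title]))


def accuracy(predictions, true_titles, dashboard_vectors):
    """Fraction of predictions whose nearest dashboard equals the true dashboard."""
    hits = sum(
        nearest_dashboard(p, dashboard_vectors) == t
        for p, t in zip(predictions, true_titles)
    )
    return hits / len(true_titles)


# Toy stand-ins for Doc2Vec dashboard title vectors (real ones would be learned).
dashboard_vectors = {
    "Authentication issues": [1.0, 0.0, 0.0],
    "Email delivery": [0.0, 1.0, 0.0],
    "Billing": [0.0, 0.0, 1.0],
}
# Toy stand-ins for the network's predicted vectors, one per ticket.
predictions = [[0.9, 0.2, 0.1], [0.6, 0.5, 0.0], [0.1, 0.1, 0.8]]
true_titles = ["Authentication issues", "Email delivery", "Billing"]

score = accuracy(predictions, true_titles, dashboard_vectors)
print(f"accuracy: {score:.2f}")
```

With this kind of scoring, a prediction counts as correct only if the single nearest dashboard is the true one, so a close second place (like the middle example) is still a miss.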
Question
- I would like to know from experts whether I am on the right track with these approaches. If so, should I continue tweaking my models to improve the accuracy?
- Or am I on a completely incorrect track? If so, are there better approaches to model such a classification problem? I am a bit lost and would appreciate any pointers.