Classification when the classification of the previous items matters

I have a classification problem that seems common, but I am struggling to find the name of this task and the best way to model it.

Suppose I have a series of events that are sequential in time.

2 Jan - I matched with a nice girl on Tinder - ACTION_TYPE = SOCIAL_EVENT
5 Jan - I met with her, it was nice - ACTION_TYPE = SOCIAL_EVENT
8 Jan - I just got accepted to a new job. I will meet my boss tomorrow - ACTION_TYPE = PROFESSIONAL_EVENT
10 Jan - I met with her, it was nice - ACTION_TYPE = PROFESSIONAL_EVENT

It is supervised learning: I have correctly tagged timelines to train on. During prediction, I have to tag every single event.

I started with a text classifier applied to each event's text on its own, but it cannot distinguish between the events on 5 Jan and 10 Jan, because their text is identical.

My instinct is to treat this as a sequence tagging problem, with a CRF layer at the end. But I would also like to hear about other possible solutions from the literature.

How would I model this problem? Is this problem known in the literature, and if so, how can I find it?



Since your text data is sequential in nature, the best option is sequence classification, where each event is labeled using the events around it as context.


In time series, you use data from the past to predict the future. Here, the text at time t is the one you need to classify, but the model input can also include lagged data (the texts of earlier events) or an aggregate computed over the past N points. In time series, common aggregates are the mean and standard deviation over a moving window.
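As a rough illustration of the moving-window aggregates mentioned above, here is a minimal sketch that computes the mean and standard deviation over the past N points of a numeric series. The function name `rolling_past_stats` and the sample values are made up for this example; the window only looks backwards, so the current point never leaks into its own features.

```python
from statistics import mean, pstdev

def rolling_past_stats(values, window=3):
    """Return (mean, std) of the `window` points before each position.

    Positions with no history get (None, None).
    """
    stats = []
    for t in range(len(values)):
        past = values[max(0, t - window):t]  # strictly earlier points only
        if past:
            stats.append((mean(past), pstdev(past)))
        else:
            stats.append((None, None))
    return stats

# Hypothetical numeric series, e.g. one numeric feature per event.
series = [2.0, 4.0, 6.0, 8.0, 10.0]
stats = rolling_past_stats(series, window=2)
```

For text, the same idea could be applied to any numeric feature derived from the earlier events, for instance averaged word-count vectors.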

For example, with lags, the input data for your model would be:

[current text, previous text, text before previous text]

Converted using bag of words:

[word 1 in text 1, ... ,word n in text 1, word 1 in text 2, ... , word n in text 2 ...] 

Here the position of each block matters, because it tells the model which text a word came from, but a neural network can learn this.
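The lagged bag-of-words input described above could be built roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the helper names (`lagged_texts`, `build_vocab`, `bow`, `lagged_bow_features`), the whitespace tokenization, and the empty-string padding for missing history are all choices made for this example, and the event texts are simplified from the question.

```python
from collections import Counter

def lagged_texts(texts, n_lags=2, pad=""):
    """For each position t, return [text_t, text_{t-1}, ..., text_{t-n_lags}]."""
    return [
        [texts[t - k] if t - k >= 0 else pad for k in range(n_lags + 1)]
        for t in range(len(texts))
    ]

def build_vocab(texts):
    """Map each distinct lowercase word to a fixed column index."""
    words = sorted({w for text in texts for w in text.lower().split()})
    return {w: i for i, w in enumerate(words)}

def bow(text, vocab):
    """Word-count vector for one text, in vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

def lagged_bow_features(texts, n_lags=2):
    """Concatenate the bag-of-words vectors of the current and lagged texts."""
    vocab = build_vocab(texts)
    return [
        [x for slot in row for x in bow(slot, vocab)]
        for row in lagged_texts(texts, n_lags)
    ]

# Simplified versions of the four events from the question.
events = [
    "matched with a nice girl on tinder",
    "met with her it was nice",
    "got accepted to a new job",
    "met with her it was nice",
]
features = lagged_bow_features(events, n_lags=2)
```

The second and fourth events have the same text, so the first block of their feature vectors is identical, but the lagged blocks differ. That difference is what gives a downstream classifier a chance to assign them different labels.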


You can frame the problem as classification. The features are the text and the date (day and month). The target is one of the discrete category labels (i.e., SOCIAL_EVENT or PROFESSIONAL_EVENT).
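A minimal sketch of that framing, assuming dates in the "5 Jan" style used in the question and a plain word-count encoding of the text. The names `MONTHS`, `date_features`, and `make_row` are hypothetical helpers introduced only for this example.

```python
# Month abbreviations as they appear in the question's dates.
MONTHS = {m: i + 1 for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}

def date_features(date_str):
    """Split a date such as '5 Jan' into (day, month_number)."""
    day, month = date_str.split()
    return int(day), MONTHS[month]

def make_row(date_str, text, vocab):
    """Feature row: [day, month, word counts in vocabulary order]."""
    day, month = date_features(date_str)
    words = text.lower().split()
    return [day, month] + [words.count(w) for w in vocab]

# Simplified versions of the four events from the question.
timeline = [
    ("2 Jan", "matched with a nice girl on tinder", "SOCIAL_EVENT"),
    ("5 Jan", "met with her it was nice", "SOCIAL_EVENT"),
    ("8 Jan", "got accepted to a new job", "PROFESSIONAL_EVENT"),
    ("10 Jan", "met with her it was nice", "PROFESSIONAL_EVENT"),
]
vocab = sorted({w for _, text, _ in timeline for w in text.lower().split()})
X = [make_row(d, t, vocab) for d, t, _ in timeline]
y = [label for _, _, label in timeline]
```

`X` and `y` can then be passed to any standard classifier. Note that in this encoding the two identical texts differ only in the day value, so the date is the only signal separating them.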
