What is the appropriate machine learning algorithm for this problem?

I have a dataset in which each sample contains a user id, a date, and the status associated with that particular user (active, expired, or deactivated). The dataset contains records for a full year, but several days are missing, and the goal of the project is to predict the status of a user on these missing days. Below is an example of one record:

{
    date: 20190101,
    user: 301727881,
    status: active
},

Because the value of the variable I want to predict is categorical and there are 3 possible values, my research led me to Multinomial Logistic Regression. I would like to solve this problem using Python, and I want to know whether I'm on the right track or not. Also any links to helpful material would be highly appreciated.

Topic machine-learning

Category Data Science


Since you are trying to predict 3 classes (active, expired and deactivated), you have a multiclass classification problem on your hands. You should go for algorithms that can handle multiclass classification, like Decision Trees, Random Forest, Gradient Boosting algorithms, Neural Nets etc. Plain binary Logistic Regression only separates two classes; the multinomial (softmax) variant you found, or a one-vs-rest scheme, extends it to three, so it can serve as a baseline, but I would advise not limiting yourself to it.

You can start with a simple DecisionTreeClassifier and see what results you get. Then you can gradually work your way up to Random Forests, Gradient Boosting, Neural Nets etc.
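As a rough sketch of that progression, the snippet below compares a decision tree with a random forest using scikit-learn. The data is synthetic, and the two features (`day_of_year` and `prev_status`, the user's last known status encoded as an integer) are only assumptions about how the records might be represented; they are not part of the original dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): one row per (user, day).
rng = np.random.default_rng(0)
n_rows = 300
day_of_year = rng.integers(1, 366, size=n_rows)
prev_status = rng.integers(0, 3, size=n_rows)  # 0=active, 1=expired, 2=deactivated

# Made-up labelling rule: the status usually equals the previous status,
# with about 10% of rows redrawn at random.
redraw = rng.random(n_rows) < 0.1
status_code = np.where(redraw, rng.integers(0, 3, size=n_rows), prev_status)
labels = np.array(["active", "expired", "deactivated"])

X = np.column_stack([day_of_year, prev_status])
y = labels[status_code]

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

Both classifiers accept the three string labels directly, so no special multiclass handling is needed. On real data the scores will depend entirely on which features you build.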

Here is a link to an article which gives examples of binary classification, multiclass classification and multilabel classification.

Cheers!


It depends on the data you have and the way you represent it.

Keep in mind that all of the algorithms come with some assumptions about the data.

For small datasets, a k-nearest neighbors (KNN) classifier is often a reasonable first choice.

Read this: Link. Just check the assumptions.
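To make the point about assumptions concrete, here is a minimal KNN sketch with scikit-learn. KNN relies on distances, so it implicitly assumes the features are on comparable scales; the pipeline below standardizes them first. The two numeric features are made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two numeric features on very different scales.
rng = np.random.default_rng(1)
n_rows = 240
status_code = rng.integers(0, 3, size=n_rows)
feature_a = status_code * 2.0 + rng.normal(0.0, 0.5, n_rows)     # small scale
feature_b = status_code * 200.0 + rng.normal(0.0, 50.0, n_rows)  # large scale

X = np.column_stack([feature_a, feature_b])
y = np.array(["active", "expired", "deactivated"])[status_code]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Standardize first so that neither feature dominates the distance computation.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

knn_accuracy = knn.score(X_test, y_test)
print(f"KNN test accuracy: {knn_accuracy:.3f}")
```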

It all depends on how you can represent your data and your problem. An algorithm only knows what it is fed, so you need to be absolutely clear about what you have to feed it.
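As one possible representation, the sketch below turns raw records of the shape shown in the question into flat feature rows using only the standard library. The chosen features (day of year, weekday, previous known status) and the helper name `to_features` are illustrative assumptions, not a recommendation from the original data.

```python
from datetime import datetime

# Hypothetical records in the shape shown in the question.
records = [
    {"date": 20190101, "user": 301727881, "status": "active"},
    {"date": 20190102, "user": 301727881, "status": "active"},
    {"date": 20190104, "user": 301727881, "status": "expired"},
]


def to_features(record, previous_status):
    """Turn one raw record into a flat feature dict (illustrative features)."""
    day = datetime.strptime(str(record["date"]), "%Y%m%d").date()
    return {
        "user": record["user"],
        "day_of_year": day.timetuple().tm_yday,
        "weekday": day.weekday(),  # Monday is 0
        "previous_status": previous_status,  # None for a user's first record
        "status": record["status"],  # prediction target
    }


rows = []
last_status = {}  # most recent known status per user
for record in sorted(records, key=lambda r: (r["user"], r["date"])):
    rows.append(to_features(record, last_status.get(record["user"])))
    last_status[record["user"]] = record["status"]

for row in rows:
    print(row)
```

Rows built this way can be encoded numerically and passed to whichever classifier you choose.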

Since you have started with LogisticRegression, look closely at its assumptions and its pros and cons. Then check whether your data is ready to use as is or whether you can engineer it to fit. Perfection is not necessary, but the closer you get, the better and more reliable your results will be.
