Can I perform a Logistic regression on this data?

I have the data below: I want to explain the relationship between 'Milieu' who has two factors, and 'DAM'. As you may notice, the blue population's included in the red population. Can I apply a logistic regression?

Topic logistic-regression

Category Data Science


Yes. If you have numeric features for a classification problem, you can apply logistic regression.

However, you are unlikely to see spectacular results for this data. Let's look at the classic example Iris data set that does perform well under logistic regression:

Iris dataset separability

This data set works well because the classes are largely linearly separable. Essentially, you could draw lines on that graph to separate the classes. Logistic regression is able to learn this and correctly classify most samples.

In the case of your data, logistic regression (and all other methods of classification) will struggle in the "overlapping" region because the features you have available simply don't provide enough information to correctly identify classes in the this region. You should still see some success outside of this region.

The best way to answer whether or not logistic regression will meet your needs is to run an experiment. Run it on your training data while holding back a test set and check performance on the held out data. If this gives performance that meets your needs, you're good to go. Otherwise, you'll likely need to explore additional features or come up with another way to solve this problem.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.