Why is my regression model always dominated by one feature?

I am working on a financial prediction problem, which means it is a time series prediction problem.

I have three features that are highly correlated (each pairwise correlation is about 0.6), and I fit a linear regression on them.

I assumed that the coefficients would be similar across these three features, but I get a coefficient vector like this:

[0.01, 0.15, 0.01]

which means the second feature has the biggest coefficient (the features are normalized) and dominates the prediction result.

I don't know why. I thought adding weak features could boost my prediction model, but the second feature dominates my model and the other features seem worthless.

Why can one feature dominate the model? Did I miss something?

Tags: normalization, regression, feature-selection, machine-learning

Category: Data Science


The first thing that comes to my mind is that you might not have normalized your features correctly. In general, a feature that spans a wider range of values than the others will have more influence on the model's output.

To mitigate this issue, one common practice is to transform each feature to have zero mean and unit variance. This puts all of your features on a comparable scale, so the coefficients become directly comparable.
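Here is a minimal sketch of that idea using scikit-learn; the feature names, scales, and true coefficients are made up purely to illustrate how mismatched scales distort raw coefficients and how standardization makes them comparable:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Synthetic data with deliberately different feature scales (illustration only).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "f1": rng.normal(0, 1, 500),      # roughly unit scale
    "f2": rng.normal(0, 100, 500),    # much larger scale
    "f3": rng.normal(0, 0.01, 500),   # much smaller scale
})
# Each feature contributes equally to y once scale is accounted for.
y = 0.5 * X["f1"] + 0.005 * X["f2"] + 50 * X["f3"] + rng.normal(0, 0.1, 500)

# Raw fit: coefficients differ wildly because of the feature scales.
raw = LinearRegression().fit(X, y)
print("raw coefficients:   ", raw.coef_)

# Standardize each feature to zero mean and unit variance, then refit.
X_scaled = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(X_scaled, y)
print("scaled coefficients:", scaled.coef_)  # now all roughly equal
```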

Other than that, it may simply be that the dominant feature really is more informative for your time series prediction, and your model has learned to base its predictions on that specific feature.
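One rough way to check whether that is the case is to regress the target on each feature by itself and compare the single-feature fits; a sketch, reusing the hypothetical `X` and `y` from the example above:

```python
from sklearn.linear_model import LinearRegression

# Fit one regression per feature and compare the individual R^2 scores.
for col in X.columns:
    single = LinearRegression().fit(X[[col]], y)
    print(col, round(single.score(X[[col]], y), 3))
```

If one feature alone explains most of the variance, its dominance in the full model is expected rather than an artifact of scaling.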
