Feature engineering: The more features I add, the better the RMSE I get?

I have a model with 7 features, and I'm trying to figure out whether I can improve its performance by adding more features, using the RMSE to measure the accuracy of my predictions. Going from 7 features up to 25, each time I add a new feature the RMSE gets slightly better (smaller). I find it hard to believe that all of these features improve the performance of my model, since some of them have very low correlation with the target.

My question, I guess, is: can I rely on the RMSE in this case to select/add features for my model?

Topic: rmse, feature-engineering, feature-selection, predictive-modeling, machine-learning

Category: Data Science


Changing the number of features can be used to handle two situations:

High bias (the common one): Adding features is one way to approach models with high bias, because additional features can increase the predictive power of your data. This is commonly done as part of feature engineering.
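
As a minimal sketch of what that can look like (using scikit-learn and synthetic stand-in data, since your actual features are not shown), new features are often derived from existing ones, e.g. interaction and polynomial terms:

```python
# Minimal sketch: deriving extra features from an existing 7-feature matrix.
# X and y are synthetic stand-ins for your data.
from sklearn.datasets import make_regression
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=0)

# Add all pairwise interactions and squares of the 7 original features.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = poly.fit_transform(X)
print(X.shape, "->", X_expanded.shape)  # (200, 7) -> (200, 35)
```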

High variance (the uncommon one): However, a large number of features can also cause overfitting, and in high-variance cases reducing the number of features may reduce the extent to which your model overfits. This is less common, because features are usually eliminated to get rid of irrelevant or redundant ones; reducing variance is typically not the goal (other techniques, such as reducing model complexity through regularization, are applied instead).
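
For the regularization route, here is a minimal sketch, assuming a linear model and using scikit-learn's Lasso (L1 regularization), which shrinks the coefficients of uninformative features toward zero rather than removing them by hand:

```python
# Minimal sketch: L1 regularization keeps a large feature set in check by
# driving coefficients of irrelevant features to exactly zero.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in: 25 features, only 7 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=25, n_informative=7,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
kept = (lasso.coef_ != 0).sum()
print(f"Lasso kept {kept} of {X.shape[1]} features")
```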

Therefore, it is not totally unexpected that your model's performance improves with more features. But it is important to check whether this performance increase generalizes (e.g. by applying cross-validation) and that you are not just overfitting.
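
A minimal sketch of that check, comparing in-sample RMSE with cross-validated RMSE as features are added (again synthetic data; the feature counts 7/15/25 are arbitrary illustration points):

```python
# Minimal sketch: train RMSE vs. cross-validated RMSE as features are added.
# Train RMSE tends to keep improving with more features; CV RMSE reveals
# whether the improvement actually generalizes.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=25, n_informative=7,
                       noise=10.0, random_state=0)

model = LinearRegression()
for k in (7, 15, 25):
    X_k = X[:, :k]
    # Train RMSE: fit and evaluate on the same data.
    train_pred = model.fit(X_k, y).predict(X_k)
    train_rmse = np.sqrt(np.mean((y - train_pred) ** 2))
    # Cross-validated RMSE: evaluated on held-out folds instead.
    cv_scores = cross_val_score(model, X_k, y, cv=5,
                                scoring="neg_root_mean_squared_error")
    print(f"{k:2d} features: train RMSE {train_rmse:6.2f}, "
          f"CV RMSE {-cv_scores.mean():6.2f}")
```

If the cross-validated RMSE stops improving (or gets worse) while the train RMSE keeps dropping, the extra features are fitting noise rather than signal.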
