Does R2 diverge because of a lack of input dimensions?

I am trying to improve the R2 score between my theoretical (predicted) and real output values. In the picture you can see two cases: the blue curve is an artificial case that I fully control, with 7 dimensions as input and 1 dimension as output; the orange curve is a real case, also with 7 inputs and 1 output.

As you can see, the blue curve behaves as expected: the more data I add, the better the prediction. With the orange case, however, it is the opposite. How can that be explained? Is it because an important input variable is missing?

My test set is composed of 200 evenly spaced points. Could the problem come from there?
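For reference, here is a minimal sketch of the diagnostic the question implies: compute an R2 learning curve on a fixed test set, once with all relevant inputs and once with one relevant input deliberately dropped. The data, weights, and the `learning_curve_r2` helper are all synthetic assumptions for illustration, not the asker's actual setup; scikit-learn's `LinearRegression` stands in for whatever model is actually used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data: the target depends on 8 inputs, but only 7 will be
# observed in the "partial" case, mimicking a missing important input.
n_total = 1000
X_full = rng.normal(size=(n_total, 8))
w = rng.normal(size=8)
w[-1] = 2.0  # make the hidden input's contribution sizable
y = X_full @ w + 0.1 * rng.normal(size=n_total)

X_all = X_full               # "blue" case: all relevant inputs observed
X_partial = X_full[:, :7]    # "orange" case: one relevant input dropped

def learning_curve_r2(X, y, sizes=(50, 100, 200, 400, 800), n_test=200):
    """R2 on a fixed test set as the training-set size grows."""
    X_test, y_test = X[:n_test], y[:n_test]
    X_pool, y_pool = X[n_test:], y[n_test:]
    scores = []
    for n in sizes:
        model = LinearRegression().fit(X_pool[:n], y_pool[:n])
        scores.append(r2_score(y_test, model.predict(X_test)))
    return scores

full_scores = learning_curve_r2(X_all, y)
partial_scores = learning_curve_r2(X_partial, y)
# With all inputs observed, R2 approaches 1 as training data grows.
# With a relevant input missing, R2 plateaus at a lower ceiling no
# matter how much training data is added.
```

If the orange curve in the question plateaus (or oscillates) well below the blue one even as data is added, that is consistent with the missing-input hypothesis: the unexplained variance contributed by the unobserved variable puts a hard cap on the achievable R2.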

Topic data-quality correlation deep-learning python

Category Data Science
