Drawing validation set from test set
I am building 3 neural network models on a dataset that is already separated into train and test sets. From my analysis, I found that the test set contains values that don't exist in the train set. This seems to put a ceiling on my neural network model(s): I cannot improve the accuracy no matter how I change the hyperparameters or the parameters of my models.
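For concreteness, this is roughly the kind of check I ran (a minimal sketch; the file names and the column name are placeholders for my actual data):

```python
import pandas as pd

# Placeholder file names / column name, standing in for my actual dataset.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

cat_col = "some_categorical_feature"

# Values that appear in the test set but never in the training set
unseen = set(test_df[cat_col].unique()) - set(train_df[cat_col].unique())
print(f"{len(unseen)} test-set values of '{cat_col}' never appear in training")
```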
I have created 3 neural network models and varied almost everything:
- Number of nodes/hidden layers,
- Input features (performed feature selection and space reduction),
- Activation functions and loss functions,
- Regularization, optimizer, and more.
When I average the predictions of the 3 models, I don't see any improvement. I've read in many places that varying such parameters should give me somewhat uncorrelated models, but this wasn't the case for me: whenever I compute the Pearson correlation between my models' predictions, they are strongly correlated.
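This is roughly how I do the averaging and the correlation check (a sketch with synthetic arrays standing in for my actual model predictions, which come from each model's `predict` on the same examples):

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic arrays standing in for my 3 models' predictions on the same
# validation examples (in my case these come from model.predict(...)).
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
preds_1 = base + rng.normal(scale=0.1, size=1000)
preds_2 = base + rng.normal(scale=0.1, size=1000)
preds_3 = base + rng.normal(scale=0.1, size=1000)

# Simple ensemble: average the 3 models' predictions
ensemble = np.mean([preds_1, preds_2, preds_3], axis=0)

# Pairwise Pearson correlation between the models' predictions
pairs = [("m1", "m2", preds_1, preds_2),
         ("m1", "m3", preds_1, preds_3),
         ("m2", "m3", preds_2, preds_3)]
for name_a, name_b, a, b in pairs:
    r, _ = pearsonr(a, b)
    print(f"Pearson r({name_a}, {name_b}) = {r:.3f}")
```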
After building all these models, I am pretty sure that the training set and test set are not drawn from the same distribution (i.e. they are not a random split of some full original dataset), which suggests that other features probably also have different distributions.
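This is the kind of check that led me to that conclusion: a two-sample Kolmogorov-Smirnov test per numeric feature, comparing train and test (continuing from the placeholder `train_df`/`test_df` above; the significance threshold is arbitrary):

```python
from scipy.stats import ks_2samp

# For each numeric feature, compare the train and test distributions with a
# two-sample KS test; small p-values suggest the two samples differ.
numeric_cols = train_df.select_dtypes(include="number").columns
for col in numeric_cols:
    stat, p_value = ks_2samp(train_df[col].dropna(), test_df[col].dropna())
    if p_value < 0.01:  # arbitrary cut-off, just to flag suspicious features
        print(f"{col}: KS statistic = {stat:.3f}, p = {p_value:.2e}")
```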
Some have proposed that I merge the train and test sets, but I don't want to do that, since the dataset was deliberately released with this split. Instead, I would like to draw my validation set from the test set. Is this possible? Can I use a validation set randomly drawn from the test set to tune my models?
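Concretely, what I have in mind is something like this (sketch using sklearn's `train_test_split` on the placeholder `test_df` above; the split ratio and random seed are arbitrary):

```python
from sklearn.model_selection import train_test_split

# Split the provided test set into a validation part (for tuning) and a
# held-out part (for the final evaluation). Ratio and seed are arbitrary.
val_df, final_test_df = train_test_split(test_df, test_size=0.5, random_state=42)

print(len(val_df), len(final_test_df))
```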
Topic ensemble-modeling correlation cross-validation neural-network python
Category Data Science