Are my classification results (accuracy, precision, recall, etc.) significant and valid for new, unseen data?
So I have a dataset, let's say of shape (2000, 11), and I want to perform a binary classification based on these eleven features. There is a class imbalance between the two categories, so I balance the classes using random oversampling so that the classifier generalizes better. The data is split into train and test sets, then I create a pipeline with the following steps: (StandardScaler(), SelectKBest(f_classif), SVC()). I then run GridSearchCV on the training set with 5-fold cross-validation. Next, I evaluate on the held-out test set, which preserves the original class distribution, and get the confusion matrix and results. Everything looks great, but there is no way to find out whether my results are valid on another dataset of a similar nature, because I do not have any other data to try the model on at the moment.
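For reference, here is a minimal, self-contained sketch of the setup. The synthetic data and the parameter grid are placeholder assumptions, not my actual values, and I'm assuming imbalanced-learn's RandomOverSampler applied only to the training portion so the test set keeps its original distribution:

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the real (2000, 11) dataset with class imbalance.
X, y = make_classification(n_samples=2000, n_features=11,
                           weights=[0.8, 0.2], random_state=0)

# Hold out a test set that preserves the original class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Balance only the training data with random oversampling.
X_train, y_train = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("svc", SVC()),
])

# Hypothetical search space; the real grid may differ.
param_grid = {
    "select__k": [5, 8, 11],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate on the untouched, imbalanced test set.
y_pred = search.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```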
Now my question is: what do you do in such a situation, where you only have one dataset to build and test the model, and no other (e.g. real-life) data to verify how well your ML model performs?
Are there any metrics or checks you can use to ensure your model works before deploying it?
Note that I have already made sure that the model is neither overfitting nor underfitting the available data.
Topic generalization cross-validation scikit-learn machine-learning
Category Data Science