How to prepare data for cross-validation on the MNIST dataset?

How do I use k-fold cross-validation for the MNIST dataset? I read the documentation on scikit-learn; in their example they used the whole iris dataset for cross-validation.

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
scores

For example, when importing the MNIST dataset in Keras:

from keras.datasets import mnist
(Xtrain, Ytrain), (Xtest, Ytest) = mnist.load_data()

This dataset is already divided into train and test sets, so to apply cross-validation on the entire dataset, do we need to combine Xtrain and Xtest into one entity to exploit the whole data?

Topic mnist keras cross-validation scikit-learn

Category Data Science


You can either validate your results on the test set, or, if you want to use KFold, you can first concatenate the train and test sets and then use KFold splitting to evaluate your results (a sketch of this is shown below). Hope it helps!
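A minimal sketch of that approach, assuming MNIST is loaded from keras.datasets and the images are flattened so a scikit-learn estimator can consume them; LogisticRegression is just a stand-in classifier for illustration, not part of the original answer:

import numpy as np
from keras.datasets import mnist
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Merge the predefined train/test split into one dataset
(Xtrain, Ytrain), (Xtest, Ytest) = mnist.load_data()
X = np.concatenate([Xtrain, Xtest])
X = X.reshape(len(X), -1) / 255.0          # flatten 28x28 images and scale to [0, 1]
y = np.concatenate([Ytrain, Ytest])

# 5-fold cross-validation over the combined 70,000 samples
clf = LogisticRegression(max_iter=1000)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=kf)
print(scores.mean())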


For the MNIST data, what you need to do is apply cross-validation on your training data to check the performance of your model. Then, if you are satisfied with the model's performance, you can train it on the whole training set. After that, you use the trained model to make predictions on the test dataset. A sketch of this workflow is shown below.
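A minimal sketch of that workflow, assuming flattened images and the linear SVM from the scikit-learn example; the classifier choice is only illustrative, and a kernel SVM on all 60,000 images is slow, so you may want to subsample or swap in a faster model:

from keras.datasets import mnist
from sklearn import svm
from sklearn.model_selection import cross_val_score

(Xtrain, Ytrain), (Xtest, Ytest) = mnist.load_data()
Xtrain = Xtrain.reshape(len(Xtrain), -1) / 255.0   # flatten 28x28 images, scale to [0, 1]
Xtest = Xtest.reshape(len(Xtest), -1) / 255.0

clf = svm.SVC(kernel='linear', C=1)

# 1. Estimate performance with cross-validation on the training data only
scores = cross_val_score(clf, Xtrain, Ytrain, cv=5)
print(scores.mean())

# 2. If satisfied, refit on the whole training set and evaluate once on the held-out test set
clf.fit(Xtrain, Ytrain)
print(clf.score(Xtest, Ytest))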


from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
scores

They are not really fitting on the whole dataset at once; cross_val_score splits the data internally, so it only looks that way. From the scikit-learn documentation:

When the cv argument is an integer, cross_val_score uses the KFold or StratifiedKFold strategies by default, the latter being used if the estimator derives from ClassifierMixin.

So the k-fold splitting is automated inside the call.
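To make that concrete, passing an integer cv is roughly equivalent to building the splitter yourself; here is a sketch using the iris example from the docs, where the explicit StratifiedKFold mirrors what cv=5 does for a classifier:

from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# cv=5 with a classifier uses stratified 5-fold splitting under the hood;
# spelling it out explicitly gives the same evaluation
skf = StratifiedKFold(n_splits=5)
scores = cross_val_score(clf, iris.data, iris.target, cv=skf)
print(scores)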

Check this Kaggle kernel link.
