Fashion MNIST: Is there an easy way to extract only 1% of the data to do a minimal gridsearch?
I am trying to implement several models on Fashion-MNIST. I have imported the data according to the tf.keras tutorial:
import tensorflow as tf
from tensorflow import keras
import sklearn
import numpy as np
f_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = f_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# train_images.shape: (60000, 28, 28)
# test_images.shape: (10000, 28, 28)
# Need to concatenate as GridSearchCV takes the entire set as input
all_images = np.concatenate((train_images, test_images))
all_labels = np.concatenate((train_labels, test_labels))
# all_images.shape: (70000, 28, 28)
The 10 labels are equally distributed in both the training and the test set.
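A quick way to check this kind of balance is to count the labels per class. The sketch below uses stand-in label arrays (train_labels_demo and test_labels_demo are hypothetical names, built to mimic a balanced 60 000 / 10 000 split) so that it runs without downloading anything; with the real data one would pass train_labels and test_labels instead.

```python
import numpy as np

# Stand-in label arrays (assumption: 10 balanced classes);
# with the real data these would be train_labels and test_labels
train_labels_demo = np.repeat(np.arange(10), 6000)
test_labels_demo = np.repeat(np.arange(10), 1000)

for name, labels in [("train", train_labels_demo), ("test", test_labels_demo)]:
    # np.unique with return_counts=True gives each class and how often it occurs
    classes, counts = np.unique(labels, return_counts=True)
    print(name, dict(zip(classes.tolist(), counts.tolist())))
```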
Since this is only for practice, I would like to implement a minimal grid search, but instead of using the entire set of 70 000 samples I'd like to extract only, say, 1% and run the grid search on that.
That way I can learn how it works without spending too much time on the computation.
However, the tutorials I have seen only use GridSearchCV (from sklearn.model_selection import GridSearchCV), which takes the entire set as input:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
# Splitting the entire set into train and test
X_train, X_test, y_train, y_test = train_test_split(all_images, all_labels,
                                                    test_size=0.3, random_state=101)
# Candidate hyperparameters for the RBF-kernel SVC
parameters_grid = {'C': [0.001, 0.01, 0.1, 1, 10],
                   'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                   'kernel': ['rbf']}
grid = GridSearchCV(SVC(), parameters_grid, refit=True, verbose=3)
So far the only workaround I could think of is to use only the set of test_images, as it is smaller. But I guess it would still run for a while, given that it contains 10 000 images...
I also thought about changing the split so that just a smaller portion is used for training, like so:
# Splitting the entire set into train and test
X_train, X_test, y_train, y_test = train_test_split(test_images, test_labels, test_size=0.99, random_state = 101)
That way I'd use only test_images, which holds only 10 000 samples. I think this would lead to the models being trained on only 1% of the 10 000, with the rest used just for testing.
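To make that idea concrete, here is a minimal sketch of what I have in mind. It runs on random stand-in arrays with the same shapes as test_images and test_labels (images and labels are hypothetical names), and it makes two assumptions that are not in my code above: it passes stratify=labels so that the 1% keeps the class balance, and it flattens the images because, as far as I understand, scikit-learn estimators such as SVC expect 2D input.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same shapes as test_images / test_labels
rng = np.random.default_rng(101)
images = rng.integers(0, 256, size=(10000, 28, 28), dtype=np.uint8)
labels = np.repeat(np.arange(10), 1000)  # assumption: 1000 samples per class

# Keep 1% of the samples; stratify so every class keeps the same share
X_small, _, y_small, _ = train_test_split(
    images, labels, train_size=0.01, stratify=labels, random_state=101
)

# Flatten each 28x28 image into a row of 784 features
X_small = X_small.reshape(len(X_small), -1)

print(X_small.shape)         # (100, 784)
print(np.bincount(y_small))  # number of samples kept per class
```

The small X_small / y_small pair could then be passed to grid.fit instead of the full set.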
Is there a better, more Pythonic way to extract only 1% of all_images or test_images together with the corresponding all_labels or test_labels?
Obviously I would build the final model by feeding it all 60 000 training samples and subsequently test it on the 10 000 test samples.
I googled and talked to colleagues but no hits or answers.
Topic grid-search keras scikit-learn machine-learning
Category Data Science