Perfect scores for multiclass classification
I am working on a multiclass classification problem with 3 (1, 2, 3) classes being perfectly distributed. (70 instances of each class resulting in (210, 8) dataframe). Now my data has all the 3 classes distributed in order i.e first 70 instances are class1, next 70 instances are class 2 and last 70 instances are class 3. I know that this kind of distribution will lead to good score on train set but poor score on test set as the test set has classes that the model has not seen. So I used stratify
parameter in train_test_split
. My code:-
# SPLITTING
train_x, test_x, train_y, test_y = train_test_split(data2, y, test_size = 0.2, random_state =
69, stratify = y)
cross_val_model = cross_val_score(pipe, train_x, train_y, cv = 5,
n_jobs = -1, scoring = 'f1_macro')
s_score = cross_val_model.mean()
def objective(trial):
model__n_neighbors = trial.suggest_int('model__n_neighbors', 1, 20)
model__metric = trial.suggest_categorical('model__metric', ['euclidean', 'manhattan',
'minkowski'])
model__weights = trial.suggest_categorical('model__weights', ['uniform', 'distance'])
params = {'model__n_neighbors' : model__n_neighbors,
'model__metric' : model__metric,
'model__weights' : model__weights}
pipe.set_params(**params)
return np.mean( cross_val_score(pipe, train_x, train_y, cv = 5,
n_jobs = -1, scoring = 'f1_macro'))
knn_study = optuna.create_study(direction = 'maximize')
knn_study.optimize(objective, n_trials = 10)
knn_study.best_params
optuna_gave_score = knn_study.best_value
pipe.set_params(**knn_study.best_params)
pipe.fit(train_x, train_y)
pred = pipe.predict(test_x)
c_matrix = confusion_matrix(test_y, pred)
c_report = classification_report(test_y, pred)
Now the problem is that I am getting perfect scores on everything. The f1 macro score from performing cv is 0.898. Below are my confusion matrix and classification report:-
14 0 0
0 14 0
0 0 14
Classification Report:-
precision recall f1-score support
1 1.00 1.00 1.00 14
2 1.00 1.00 1.00 14
3 1.00 1.00 1.00 14
accuracy 1.00 42
macro avg 1.00 1.00 1.00 42
weighted avg 1.00 1.00 1.00 42
Am I overfitting or what?
Topic f1score overfitting multiclass-classification cross-validation
Category Data Science