Query regarding surprising spike in accuracy of ML model

I implemented all the major ML models (Logistic Regression, Naive Bayes, SVM, KNN, Decision Tree, Random Forest, AdaBoost, XGBoost) on my dataset. My stratified cross-validation scores are between 70% and 80%. When I tuned my models using grid search, the accuracies shot up to between 90% and 95%. Is this drastic increase in accuracy abnormal or fishy?
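For context, the 70-80% baseline figures come from plain stratified cross-validation with default hyperparameters, roughly along these lines (a minimal sketch; my real preprocessing is omitted and the synthetic data below stands in for my dataset):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data; in the real workflow these are my scaled features and labels.
scaled_inputs, targets = make_classification(n_samples=1000, n_classes=2, random_state=43)

# Baseline: default hyperparameters, 10-fold stratified cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=43)
baseline_scores = cross_val_score(LogisticRegression(random_state=43), scaled_inputs, targets,
                                  scoring='accuracy', cv=cv)
print('Baseline stratified CV accuracy: %.3f +/- %.3f' % (baseline_scores.mean(), baseline_scores.std()))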

My GridSearchCV code for Logistic Regression --

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# n_samples = total number of samples in the generated (stand-in) dataset
scaled_inputs, targets = make_classification(n_samples=1000, n_classes=2, random_state=43)
x_train, x_test, y_train, y_test = train_test_split(scaled_inputs, targets, test_size=0.25, random_state=43)

# NB: the default 'lbfgs' solver only supports the 'l2' penalty, so the 'l1'
# candidates below will fail unless solver='liblinear' (or 'saga') is used.
parameter_grid = {'C': [0.001, 0.01, 0.1, 1, 10],
                  'penalty': ['l1', 'l2']
                  }

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=43)
estimator = GridSearchCV(estimator=lr, param_grid=parameter_grid,
                         scoring='accuracy', cv=10, n_jobs=-1)


estimator.fit(x_train, y_train)

print(estimator.best_params_)     # best hyperparameter combination found
print(estimator.best_estimator_)  # model refit on x_train with those parameters
print(estimator.best_score_)      # mean cross-validated accuracy of that combination

**Output --**

{'C': 0.1, 'penalty': 'l2'}
LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=43, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
0.9279999999999999
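(Note: best_score_, the 0.928 above, is the mean cross-validated accuracy over the 10 folds for the best parameter combination, not a score on the held-out test set. The per-combination results can be inspected via cv_results_, for example, a small sketch assuming pandas is available:)

import pandas as pd

results = pd.DataFrame(estimator.cv_results_)
# Mean and spread of the 10-fold accuracy for every C/penalty combination tried.
print(results[['param_C', 'param_penalty', 'mean_test_score', 'std_test_score']]
      .sort_values('mean_test_score', ascending=False))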

best_penalty = estimator.best_params_['penalty']
best_C = estimator.best_params_['C']

clf_lr = LogisticRegression(penalty=best_penalty, C=best_C, random_state=43)
clf_lr.fit(x_train, y_train)

predictions = clf_lr.predict(x_test)
from sklearn.metrics import accuracy_score
print('Accuracy', accuracy_score(y_test, predictions))

**Output --** Accuracy 0.932
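(Side note: with refit=True, which is the default, GridSearchCV already refits the best model on all of x_train, so the manual refit above is optional. A minimal sketch of the shorter route, reusing the fitted estimator and accuracy_score from above:)

# The fitted GridSearchCV object already holds the best model refit on x_train,
# so it can predict on the test set directly instead of refitting by hand.
grid_predictions = estimator.predict(x_test)
print('Accuracy', accuracy_score(y_test, grid_predictions))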

Topic: grid-search, gridsearchcv, cross-validation, accuracy

Category: Data Science


This comes down to the definition of grid search. Grid search looks for the hyperparameters of a model that yield the most 'accurate' predictions, so an improvement over the untuned baseline is expected; I don't think there is any abnormality in the final prediction. To be sure the comparison is fair, score the tuned model with the same stratified cross-validation protocol you used for the 70-80% figures, as sketched below.
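A minimal sketch of that check, assuming the same scaled_inputs/targets as in the question and the best parameters reported above (C=0.1, penalty='l2'):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Tuned model evaluated with the same 10-fold stratified protocol as the baseline.
tuned = LogisticRegression(C=0.1, penalty='l2', random_state=43)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=43)
scores = cross_val_score(tuned, scaled_inputs, targets, scoring='accuracy', cv=cv)
print('Tuned stratified CV accuracy: %.3f +/- %.3f' % (scores.mean(), scores.std()))

If the tuned model stays well above the untuned one under identical splits, the jump is genuinely due to the tuning rather than to a change in evaluation protocol.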

However, accuracy is not the only metric for evaluating a classification model. Use a confusion matrix to evaluate your model as well:

from sklearn.metrics import confusion_matrix

# 'predictions' is the array returned by clf_lr.predict(x_test) in the question's code.
print('Confusion Matrix : \n' + str(confusion_matrix(y_test, predictions)))

After running the code above, also import classification_report from sklearn.metrics. It gives you a detailed per-class report (precision, recall, F1-score and support), so you can verify whether there is actually something fishy or not.
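A minimal sketch, reusing y_test and the predictions from the question's code:

from sklearn.metrics import classification_report

# Per-class precision, recall, F1-score and support on the held-out test set.
print(classification_report(y_test, predictions))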
