Query regarding a surprising spike in ML model accuracy
I implemented all the major ML models (Logistic Regression, Naive Bayes, SVM, KNN, Decision Tree, Random Forest, AdaBoost, XGBoost) on my dataset. My stratified cross-validation scores are between 70% and 80%. When I re-ran my models using grid search, my accuracies shot up: they now lie between 90% and 95%. Is this drastic increase in accuracy abnormal/fishy?
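(For reference, my 70-80% baseline numbers come from plain stratified cross-validation with default hyperparameters, roughly along these lines -- a sketch only, with make_classification standing in for my actual dataset:)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
# Sketch: default-hyperparameter baseline under stratified 10-fold CV.
# make_classification is a stand-in for my real dataset.
X, y = make_classification(n_samples=1000, n_classes=2, random_state=43)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=43)
scores = cross_val_score(LogisticRegression(random_state=43), X, y, cv=skf, scoring='accuracy')
print(scores.mean())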
My GridSearchCV code for Logistic Regression --
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
scaled_inputs, targets = make_classification(n_samples=1000, n_classes=2, random_state=43)
# n_samples = total number of records in the synthetic dataset (not per-fold test records)
x_train, x_test, y_train, y_test = train_test_split(scaled_inputs, targets, test_size=0.25, random_state=43)
parameter_grid = {'C': [0.001, 0.01, 0.1, 1, 10],
                  'penalty': ['l1', 'l2']}
from sklearn.linear_model import LogisticRegression
# Note: the default 'lbfgs' solver supports only the 'l2' penalty, so the 'l1'
# candidates fail during the search; use solver='liblinear' or 'saga' if 'l1'
# should actually be tried.
lr = LogisticRegression(random_state=43)
estimator = GridSearchCV(estimator=lr, param_grid=parameter_grid,
                         scoring='accuracy', cv=10, n_jobs=-1)
estimator.fit(x_train, y_train)
print(estimator.best_params_)
print(estimator.best_estimator_)
print(estimator.best_score_)
**Output -- {'C': 0.1, 'penalty': 'l2'}
LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='auto', n_jobs=None, penalty='l2',
random_state=43, solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False)
0.9279999999999999**
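Note that best_score_ is itself a mean cross-validated accuracy: it averages the 10 fold scores obtained on x_train for the winning parameters, so the 0.928 above is not a single train/test score. The per-fold scores can be inspected from the fitted search object like this:
i = estimator.best_index_
fold_scores = [estimator.cv_results_[f'split{k}_test_score'][i] for k in range(10)]
print(sum(fold_scores) / len(fold_scores))  # equals estimator.best_score_
I then refit with the best parameters and score on the held-out test set: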
best_penalty = estimator.best_params_['penalty']
best_C = estimator.best_params_['C']
# Refit on the full training split with the best hyperparameters
clf_lr = LogisticRegression(penalty=best_penalty, C=best_C, random_state=43)
clf_lr.fit(x_train, y_train)
predictions = clf_lr.predict(x_test)
from sklearn.metrics import accuracy_score
print(f'Accuracy {accuracy_score(y_test, predictions)}')
**Output --Accuracy 0.932**
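Side note: since refit=True is the GridSearchCV default, the search object has already refit the best model on all of x_train, so the manual refit above is optional and this should give the same predictions:
# `estimator` already wraps the best model refit on the full training split
search_predictions = estimator.predict(x_test)
print(f'Accuracy {accuracy_score(y_test, search_predictions)}')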
Tags: grid-search, gridsearchcv, cross-validation, accuracy
Category: Data Science