How to train with cross-validation, and which F1 score to choose?
I got similar results from two models built with similar algorithms.
Model 1, with cv=10, has a micro-averaged F1 of 0.941 (see code below). Model 2, trained with only a train/test split (no CV), has a micro-averaged F1 of 0.953.
Now here is my understanding problem. Earlier I ran a grid search to find the best hyperparameters. Now I would like to run just a cross-validation to train on the dataset, like the part marked in red in the picture. The grid search is still present in the code below.
Question 1: Is this code doing what I want? That is, is it a cross-validation over the training set?
Question 2: When I have two models like in the picture, model 1 with cross-validation (marked in red) and model 2 with train/validation/test data, what are the reasons to choose model 1 with cross-validation, and why?
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

X = df.drop('columnA', axis=1)
y = df.columnA

# Hold out 20% as the test set, then split off a validation set.
# Note: X_val/y_val are never used below, because GridSearchCV
# performs its own internal cross-validation on the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.20, random_state=42)

# Single-point "grid": effectively just 10-fold CV of one fixed configuration
xgb_params = {
    'max_depth': [6],
    'learning_rate': [0.3]}

xgb_clf = GridSearchCV(XGBClassifier(random_state=42),
                       xgb_params,
                       cv=10,
                       n_jobs=-1,
                       verbose=True)
xgb_clf.fit(X_train, y_train)

# Evaluate on the held-out test set; sklearn metrics expect (y_true, y_pred)
xgb_pred = xgb_clf.predict(X_test)
print(accuracy_score(y_test, xgb_pred))
print(f1_score(y_test, xgb_pred, average='micro'))
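If I only wanted plain cross-validation without the grid search, is something like the following what that would look like? (A minimal sketch of my understanding, reusing the X_train/y_train and imports from above.)

from sklearn.model_selection import cross_val_score

# 10-fold CV of one fixed configuration, no hyperparameter search;
# returns one micro-F1 score per fold
scores = cross_val_score(XGBClassifier(max_depth=6, learning_rate=0.3, random_state=42),
                         X_train, y_train,
                         cv=10,
                         scoring='f1_micro',
                         n_jobs=-1)
print(scores.mean(), scores.std())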
I'm sorry if my point of view is strange, but I lack knowledge here and I'm confused about cross-validation and KFold and how to use them.
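To make my confusion concrete: I think cross_val_score is roughly equivalent to writing the KFold loop out by hand like this (again just a sketch of my understanding, reusing the data and imports from above), but I'm not sure:

from sklearn.model_selection import KFold
import numpy as np

# Manual 10-fold cross-validation: train on 9 folds, score on the held-out fold
# (sklearn actually defaults to stratified folds for classifiers)
kf = KFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []
for train_idx, val_idx in kf.split(X_train):
    model = XGBClassifier(max_depth=6, learning_rate=0.3, random_state=42)
    model.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])
    pred = model.predict(X_train.iloc[val_idx])
    fold_scores.append(f1_score(y_train.iloc[val_idx], pred, average='micro'))
print(np.mean(fold_scores))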
Topic ensemble-learning ensemble ensemble-modeling cross-validation
Category Data Science