How to train with cross-validation, and which F1 score to choose?

I got similar results from 2 models which consist of similar algorithms.

Model 1, with cv=10, has a micro-averaged F1 of 0.941 (see code below). Model 2, with only a train/test split (no CV), has a micro-averaged F1 of 0.953.

Now here is my understanding problem. Earlier I did a grid search to find the best hyperparameters. Now I would like to do just a cross-validation to train on the dataset, like the part marked in red in the picture. In the code the grid search is still included.

Question 1: Is this code doing what I want? (Is that a cross-validation to train on the dataset?)

Question 2: When I have 2 models like in the picture, model 1 with cross-validation (marked in red) and model 2 with train/validation/test data, what are the reasons to choose model 1 with cross-validation, and why?

# Imports assumed by this snippet
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

# Features and target
X = df.drop('columnA', axis=1)
y = df.columnA

# Hold out a test set, then carve a validation set out of the training data.
# Note: with GridSearchCV's internal cross-validation, the separate validation
# split is not actually used below.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.20, random_state=42)

# Hyperparameter grid (a single combination here)
xgb_params = {
    'max_depth': [6],
    'learning_rate': [0.3]}

# Grid search with 10-fold cross-validation on the training data
xgb_clf = GridSearchCV(XGBClassifier(random_state=42),
                       xgb_params,
                       cv=10,
                       n_jobs=-1,
                       verbose=True)

xgb_clf.fit(X_train, y_train)

# Evaluate the best estimator on the held-out test set
# (scikit-learn's convention is metric(y_true, y_pred))
xgb_pred = xgb_clf.predict(X_test)
print(accuracy_score(y_test, xgb_pred))
print(f1_score(y_test, xgb_pred, average='micro'))

I'm sorry if my point of view is strange, but I lack knowledge here and I'm confused about cross-validation and k-fold and how to use them.

Topic ensemble-learning ensemble ensemble-modeling cross-validation

Category Data Science


I think you have confused some technical terms. Cross-validation is the name of the general procedure, and it has specific techniques or approaches such as k-fold cross-validation; a single train/test (hold-out) split is another approach. All of these are techniques to measure the performance of a model.
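To make the distinction concrete, here is a minimal sketch of both approaches applied to the same estimator, assuming X and y are already prepared as in your code; cross_val_score runs the k-fold loop for you:

# Two ways to estimate performance for the same model (sketch).
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

model = XGBClassifier(max_depth=6, learning_rate=0.3, random_state=42)

# 1) 10-fold cross-validation: the data is split into 10 folds, the model is
#    trained and scored 10 times, and the mean score is the CV estimate.
cv_scores = cross_val_score(model, X, y, cv=10, scoring='f1_micro')
print('10-fold CV f1_micro:', cv_scores.mean())

# 2) Hold-out (single train/test split): train once, evaluate once.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=42)
model.fit(X_tr, y_tr)
print('hold-out f1_micro:', f1_score(y_te, model.predict(X_te), average='micro'))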

In your case, the first model is assessed using 10-fold cross-validation and has an F1-score of 0.941, and the second model is assessed using the train/test-split approach and has an F1-score of 0.953. In this case, choosing the better model depends on what you want to prioritize; in other words, whether you focus more on false predictions or true predictions, or on false negatives or false positives. For this purpose, check the confusion matrix below.

[Figure: confusion matrix of True/False Positives and True/False Negatives]
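As a sketch, you can build that confusion matrix directly from your hold-out predictions, assuming y_test and xgb_pred from the code in the question:

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, xgb_pred)  # rows: true labels, columns: predicted labels
print(cm)
ConfusionMatrixDisplay(cm).plot()        # optional visualisation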

There are several scenarios, the main ones being (see the code sketch after this list):

  1. Using Recall metric

You can use it if False Negatives are more important to you. In medical analysis, for example, False Negatives are minimized and the model's performance is measured mainly with Recall, because it is more acceptable to predict that a person has cancer when that person is actually healthy than to label a person as healthy when that person actually has cancer.

  2. Using Precision metric

It is preferred when False Positives are more important than False Negatives.

  3. Using F1-score

It helps when incorrectly classified samples matter most. In other words, False Negatives and False Positives are both given more importance, in a balanced way.

  4. Using Accuracy score

It is mostly used when True Positives and True Negatives are prioritized.
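To make the list concrete, here is a short sketch computing each of these metrics from the hold-out predictions, again assuming y_test and xgb_pred as in the question. Note that with 'micro' averaging on single-label multiclass data, precision, recall, and F1 all coincide with accuracy, so 'macro' averaging is used here to show the differences:

from sklearn.metrics import recall_score, precision_score, f1_score, accuracy_score

print('Recall   :', recall_score(y_test, xgb_pred, average='macro'))
print('Precision:', precision_score(y_test, xgb_pred, average='macro'))
print('F1-score :', f1_score(y_test, xgb_pred, average='macro'))
print('Accuracy :', accuracy_score(y_test, xgb_pred))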

So back to your question: you should not simply choose the model that performs best on a given test; you should choose the model that best fits your demands, and the evaluation metric must be chosen according to those demands. If you want balanced False Negatives and False Positives, use the F1-score to choose a model, which would be model 2 in the case described above. However, if you focus on a different target, make your choice according to the corresponding performance metric.
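If you want to compare candidate models on several metrics at once under cross-validation, a sketch along these lines can help; it uses scikit-learn's cross_validate and assumes X, y are defined as above, with a hypothetical estimator using the hyperparameters found by your grid search:

from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

model = XGBClassifier(max_depth=6, learning_rate=0.3, random_state=42)

scoring = ['accuracy', 'precision_macro', 'recall_macro', 'f1_micro']
results = cross_validate(model, X, y, cv=10, scoring=scoring, n_jobs=-1)
for metric in scoring:
    print(metric, results['test_' + metric].mean())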
