Does it make sense to repeat calculating AUC in logistic regression?

I have a question about logistic regression models and testing their skill. I am not quite sure I understand correctly how the ROC curve is established.

When calculating the ROC curve, is a train-test split performed, so that a model fitted on the training split is tested on the test split? Or is a model fitted on the ENTIRE data simply tested on the ENTIRE data?

If the former, would it make sense to do repeated random train-test splits and average the area under the curve? Would that provide any additional certainty about the model's skill?

Thank you.

Tags: auc, validation, roc, logistic-regression



Calculating the ROC curve, and the AUC based on that curve, is simply a comparison of your model's predictions (here, the predicted probabilities from the logistic regression) against the actual values on some set of data. This can be done with predictions on the training set or on a test set. Best practice is to make this comparison on a test set, as that best represents the model's performance on new data.
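As a minimal sketch of that workflow with scikit-learn (the synthetic dataset and all variable names here are illustrative, not from your data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; substitute your own X and y.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out a test set so the AUC reflects performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The ROC curve compares predicted probabilities against actual labels,
# so we score the test set with predict_proba, not hard class predictions.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {test_auc:.3f}")
```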

Averaging the AUC over repeated train-test splits, retraining the logistic regression on each training set and computing the ROC curve and AUC on the corresponding test set, can give a better estimate of the model's performance. In addition, the distribution of the AUC across the splits gives a sense of how stable that performance is.
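One possible sketch of that idea, again with illustrative data: repeat the split with different random seeds and look at both the mean and the spread of the AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

aucs = []
for seed in range(30):
    # A fresh random split and a freshly fitted model for each repeat.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# The mean estimates the model's skill; the standard deviation
# indicates how stable that estimate is across splits.
print(f"AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```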

Repeating train-test splits in a systematic way is called cross-validation, most commonly k-fold cross-validation: you split the data into k sets, use one as the test set and the rest as the training set, then repeat the process so that each of the k sets serves once as the test set.
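scikit-learn wraps this whole procedure in `cross_val_score`; a short sketch with the same illustrative data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 5-fold cross-validation: each fold serves once as the test set,
# and scoring="roc_auc" computes the AUC on that held-out fold.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc"
)
print(f"AUC per fold: {scores}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```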
