Does it make sense to repeat calculating AUC in logistic regression?

I have a question about logistic regression models and testing their skill. I am not quite sure I understand correctly how the ROC curve is established.

When calculating the ROC curve, is a train-test split performed, so that a model fitted on the training split is tested on the test split? Or is a model fitted on the ENTIRE data simply tested on the ENTIRE data?

If the former, would it make sense to do repeated random train-test splits and average the area under the curve? Would that provide any additional certainty about the model's skill?

Thank you.

Tags: auc, validation, roc, logistic-regression



Calculating the ROC curve, and the AUC based on that curve, is simply a comparison of your model's predictions (here, the predicted probabilities from the logistic regression) against the actual values on some set of data. This can be done with predictions on the training set or on a test set. Best practice is to make this comparison on a test set, as that best represents the model's performance on new data.
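As a minimal sketch of that workflow with scikit-learn (the synthetic dataset and all variable names here are illustrative, not from your data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; substitute your own X and y.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out a test set so the AUC reflects performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The ROC curve compares predicted probabilities against actual labels,
# so we score the test set with predict_proba, not hard class predictions.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {test_auc:.3f}")
```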

Averaging the AUC over repeated train-test splits, retraining the logistic regression on each training set and computing the ROC curve and AUC on the corresponding test set, can give a better estimate of the model's performance. In addition, the distribution of the AUC across the splits gives a sense of how stable that performance is.
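One possible sketch of that idea, again with illustrative data: repeat the split with different random seeds and look at both the mean and the spread of the AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

aucs = []
for seed in range(30):
    # A fresh random split and a freshly fitted model for each repeat.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# The mean estimates the model's skill; the standard deviation
# indicates how stable that estimate is across splits.
print(f"AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```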

Repeating train-test splits in a systematic way is called cross-validation, most commonly k-fold cross-validation: you split the data into k sets, use one as the test set and the rest as the training set, then repeat the process so that each of the k sets serves once as the test set.
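scikit-learn wraps this whole procedure in `cross_val_score`; a short sketch with the same illustrative data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 5-fold cross-validation: each fold serves once as the test set,
# and scoring="roc_auc" computes the AUC on that held-out fold.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc"
)
print(f"AUC per fold: {scores}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```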
