Stratified K Fold Cross Validation in Orange: python script

Question


Emma Bartholomeeusen

December 15, 2020, 14:27

I am using Orange to predict customer churn and compare different learners based on accuracy, F1, etc.

As my problem is imbalanced (10% churn, 90% no churn), I want to oversample. However, Orange does not let me do the oversampling inside the cross-validation (the Test and Score widget).

Therefore, I want to first generate 10 stratified folds from my input data, so that the 10% churn / 90% no-churn distribution is preserved in each fold. Then I want to oversample within each fold to get a 50/50 distribution, and add the fold number to each instance as a feature. Lastly, in the Test and Score widget, I would do cross-validation by feature, using the fold number. I think I have to implement this myself with a Python script. Could anyone help me do this?

Thank you! Emma
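
The plan described in the question could be sketched roughly as follows. This is only a minimal illustration in plain Python, not Orange's own API: the function names `stratified_folds` and `oversample_within_folds` are hypothetical, the class labels are assumed to be available as a simple list, and the oversampling is assumed to be random duplication of existing minority rows. Copying the resulting rows and fold numbers back into an Orange table inside the Python Script widget is left out.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return a fold number (0..k-1) per row, keeping class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the rows of each class round-robin over the k folds.
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % k
    return fold_of


def oversample_within_folds(labels, fold_of, seed=0):
    """Return (row_index, fold) pairs with classes balanced inside each fold.

    Smaller classes are resampled with replacement until they match the
    largest class in the same fold. Duplicates never leave their fold.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: defaultdict(list))
    for idx, (label, fold) in enumerate(zip(labels, fold_of)):
        groups[fold][label].append(idx)

    rows = []
    for fold in sorted(groups):
        target = max(len(members) for members in groups[fold].values())
        for members in groups[fold].values():
            extra = [rng.choice(members) for _ in range(target - len(members))]
            rows.extend((idx, fold) for idx in members + extra)
    return rows
```

With 100 churn and 900 no-churn rows and `k=10`, each fold starts with 10 churn and 90 no-churn rows and ends with 90 of each after oversampling. The second element of each pair is the fold number to attach as the feature for cross-validation by feature. Because duplicates stay inside their own fold, no copy of a test row ends up in the training folds; the test folds are balanced too, so the scores describe a 50/50 distribution rather than the original 10/90 one.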

Topic imbalance orange cross-validation python

Category Data Science
