CART classification for imbalanced datasets with R

Hey guys, I need your help with a university project. The main task is to analyze the effects of over-/under-sampling on an imbalanced dataset. But before we can even start with that, our task sheet says that we 1) have to find or create imbalanced datasets and 2) fit them with a binary classification model such as CART. So my questions are: where do I find such imbalanced datasets? How do I fit those datasets with CART, and how does that help with regard to over-/under-sampling?
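If no suitable real dataset is at hand, an imbalanced one can also be created by simulation. The sketch below is a minimal example under stated assumptions: the helper `make_imbalanced_data()` is hypothetical, the two classes are drawn from normal distributions with shifted means, and the columns are named `x`, `y` and `label` to match the columns used in the `mutate()` call further down. Another common approach is to take an existing balanced dataset and randomly drop most rows of one class.

```r
# Hypothetical helper: simulate an imbalanced two-class dataset.
# Majority class "0" and minority class "1" are drawn from normal
# distributions with different means (an assumption for illustration).
make_imbalanced_data <- function(n = 1000, minority_share = 0.05, seed = 123) {
  set.seed(seed)
  n_min <- round(n * minority_share)  # number of minority-class rows
  n_maj <- n - n_min                  # number of majority-class rows
  majority <- data.frame(x = rnorm(n_maj, mean = 0),
                         y = rnorm(n_maj, mean = 0),
                         label = "0")
  minority <- data.frame(x = rnorm(n_min, mean = 1.5),
                         y = rnorm(n_min, mean = 1.5),
                         label = "1")
  out <- rbind(majority, minority)
  out$label <- factor(out$label, levels = c("0", "1"))
  out <- out[sample(nrow(out)), ]     # shuffle the rows
  rownames(out) <- NULL
  out
}

df <- make_imbalanced_data()
table(df$label)  # class counts: 950 rows of "0", 50 rows of "1"
```

Changing `minority_share` makes it easy to create several datasets with different degrees of imbalance for the comparison.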

That's my whole first try:

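 # CART - load the dataset
 library(dplyr)   # mutate()
 library(caTools) # sample.split()
 library(rpart)   # rpart(), printcp(), plotcp(), prune()
 setwd("C:\\Users\\..\\Dropbox\\Uni\\Präsentation\\Datensätze")
 add <- "data1.csv"
 df <- read.csv(add)
 head(df) # first 6 rows
 nrow(df) # number of rows in the dataset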

 # CART - prepare the relevant columns
 df <- mutate(df, x = as.numeric(x), y = as.numeric(y), label = factor(label))
 set.seed(123)
 # stratify the 70/30 split on the class label so train and test keep the same class ratio
 in_train <- sample.split(df$label, SplitRatio = 0.70)
 train <- subset(df, in_train)
 test  <- subset(df, !in_train)

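 # grow the tree: predict the class label from all other columns
 fit <- rpart(label ~ ., data = train, method = "class")
 printcp(fit)  # complexity-parameter table with cross-validated error (xerror)
 plotcp(fit)
 summary(fit)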

 # plot the tree and label its nodes
 plot(fit, uniform = TRUE, main = "Classification Tree")
 text(fit, use.n = TRUE, all = TRUE, cex = 0.8)

 # prune the tree to avoid overfitting: use the cp with the lowest cross-validated error
 pfit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"])
 plot(pfit, uniform = TRUE,
      main = "Pruned Classification Tree for Us")
 text(pfit, use.n = TRUE, all = TRUE, cex = 0.8)
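On an imbalanced test set, overall accuracy can look good even if the tree hardly ever predicts the minority class, so it helps to look at the confusion matrix and the minority-class recall as well. The following self-contained sketch assumes simulated data (950 vs. 50 rows), a stratified 70/30 split done in base R instead of `caTools::sample.split()`, and class `"1"` as the minority class; it is an illustration, not the required evaluation for the project.

```r
library(rpart)

set.seed(123)
# Simulated imbalanced data (assumption): 950 majority "0" vs. 50 minority "1"
n_maj <- 950
n_min <- 50
df <- data.frame(
  x     = c(rnorm(n_maj, mean = 0), rnorm(n_min, mean = 1.5)),
  y     = c(rnorm(n_maj, mean = 0), rnorm(n_min, mean = 1.5)),
  label = factor(rep(c("0", "1"), times = c(n_maj, n_min)), levels = c("0", "1"))
)

# Stratified 70/30 split: draw 70 % of the row indices within each class
idx_by_class <- split(seq_len(nrow(df)), df$label)
train_idx <- unlist(lapply(idx_by_class,
                           function(i) sample(i, round(0.7 * length(i)))))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Fit the classification tree and predict classes for the test set
fit  <- rpart(label ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix: rows = true class, columns = predicted class
cm <- table(truth = test$label, predicted = pred)
print(cm)

accuracy        <- sum(diag(cm)) / sum(cm)
recall_minority <- cm["1", "1"] / sum(cm["1", ])  # share of true "1"s that were found
cat(sprintf("accuracy = %.3f, minority recall = %.3f\n",
            accuracy, recall_minority))
```

A high accuracy combined with a low minority recall is exactly the situation that over-/under-sampling tries to improve.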

Why do I need to build such a decision tree, and how does it help with over-/under-sampling?
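One way to read the task: CART is the baseline classifier, and over-/under-sampling are applied to the training set before fitting, so that their effect on the tree's predictions can be compared. The sketch below assumes simulated data and uses simple random undersampling (drop majority rows) and random oversampling (duplicate minority rows with replacement); the helpers `undersample()`, `oversample()` and `minority_recall()` are hypothetical and assume a binary target. Resampling is applied only to the training set; the test set keeps its original class ratio so the comparison stays fair. Packages such as ROSE (e.g. `ovun.sample()`) provide ready-made versions of these resampling steps.

```r
library(rpart)

set.seed(123)
# Simulated imbalanced data (assumption): 950 majority "0" vs. 50 minority "1"
n_maj <- 950
n_min <- 50
df <- data.frame(
  x     = c(rnorm(n_maj, mean = 0), rnorm(n_min, mean = 1.5)),
  y     = c(rnorm(n_maj, mean = 0), rnorm(n_min, mean = 1.5)),
  label = factor(rep(c("0", "1"), times = c(n_maj, n_min)), levels = c("0", "1"))
)
idx_by_class <- split(seq_len(nrow(df)), df$label)
train_idx <- unlist(lapply(idx_by_class,
                           function(i) sample(i, round(0.7 * length(i)))))
train <- df[train_idx, ]
test  <- df[-train_idx, ]   # the test set is never resampled

# Random undersampling: keep all minority rows plus an equally large
# random subset of majority rows (the other majority rows are dropped)
undersample <- function(data, target = "label") {
  counts   <- table(data[[target]])
  minority <- names(which.min(counts))
  min_rows <- which(data[[target]] == minority)
  maj_rows <- which(data[[target]] != minority)
  data[c(min_rows, sample(maj_rows, length(min_rows))), ]
}

# Random oversampling: keep all rows and duplicate randomly chosen
# minority rows (with replacement) until both classes are equally large
oversample <- function(data, target = "label") {
  counts   <- table(data[[target]])
  minority <- names(which.min(counts))
  min_rows <- which(data[[target]] == minority)
  maj_rows <- which(data[[target]] != minority)
  extra    <- sample(min_rows, length(maj_rows) - length(min_rows), replace = TRUE)
  data[c(maj_rows, min_rows, extra), ]
}

# Fit CART on a (possibly resampled) training set and return the
# recall of the minority class "1" on the untouched test set
minority_recall <- function(train_data, test_data) {
  fit  <- rpart(label ~ ., data = train_data, method = "class")
  pred <- predict(fit, newdata = test_data, type = "class")
  cm   <- table(truth = test_data$label, predicted = pred)
  cm["1", "1"] / sum(cm["1", ])
}

res <- sapply(list(original = train,
                   under    = undersample(train),
                   over     = oversample(train)),
              minority_recall, test_data = test)
print(res)
```

Undersampling throws information away, while oversampling only repeats existing minority rows and can make the tree overfit them; comparing the recall values (ideally alongside precision) is one way to make those effects visible.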

Any help is much appreciated.

Topic cart imbalanced-learn r

Category Data Science
