CART classification for imbalanced datasets with R
Hey guys, I need your help with a university project. The main task is to analyze the effects of over-/under-sampling on an imbalanced dataset. But before we can even start with that, our task sheet says that we 1) have to find/create imbalanced datasets and 2) fit them with a binary classification model like CART. So my questions are: where do I find such imbalanced datasets? How do I fit those datasets with CART, and how does that help with regard to over-/under-sampling?
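For the "create" part of the task, one option might be to simulate the data. This is only a minimal sketch: the column names x, y and label mirror my CSV, while the helper make_imbalanced, the 95/5 class ratio and the class means are arbitrary assumptions of mine, not something from the task sheet.

```r
# Hypothetical synthetic data: two Gaussian clouds with very different sizes.
# Class sizes and means are assumptions chosen only for illustration.
set.seed(123)
n_major <- 950   # majority class size (assumed)
n_minor <- 50    # minority class size (assumed)

make_imbalanced <- function(n_major, n_minor) {
  major <- data.frame(x = rnorm(n_major, mean = 0),
                      y = rnorm(n_major, mean = 0),
                      label = "0")
  minor <- data.frame(x = rnorm(n_minor, mean = 2),
                      y = rnorm(n_minor, mean = 2),
                      label = "1")
  out <- rbind(major, minor)
  out$label <- factor(out$label, levels = c("0", "1"))
  out[sample(nrow(out)), ]   # shuffle the rows
}

sim <- make_imbalanced(n_major, n_minor)
table(sim$label)              # class counts
prop.table(table(sim$label))  # class proportions
```

Changing n_major and n_minor would give datasets with different degrees of imbalance.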
That's my whole first try at the CART part:
# CART - load the dataset
library(dplyr)    # mutate()
library(caTools)  # sample.split()
library(rpart)    # rpart(), prune()
setwd("C:\\Users\\..\\Dropbox\\Uni\\Präsentation\\Datensätze")
add <- "data1.csv"
df <- read.csv(add)
head(df) # first 6 rows
nrow(df) # number of rows in the dataset
# CART - select and convert the relevant columns
df <- mutate(df, x = as.numeric(x), y = as.numeric(y), label = factor(label))
set.seed(123)
# split on the class label so train and test keep roughly the same class ratio
in_train <- sample.split(df$label, SplitRatio = 0.70)
train <- subset(df, in_train == TRUE)
test <- subset(df, in_train == FALSE)
# grow the tree
# label is the class to predict; x and y are the predictors
fit <- rpart(label ~ ., data = train, method = "class")
printcp(fit)
plotcp(fit)
summary(fit)
# plot tree
plot(fit, uniform = TRUE, main="Bla Bla Bla")
# text(fit, use.n=TRUE, all=TRUE, cex=.08)
# prune the tree to avoid overfitting the data
pfit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"])
plot(pfit, uniform = TRUE,
     main = "Pruned Classification Tree for Us")
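What the code above does not do yet is check the tree on held-out data. A minimal sketch of how that might look, using a simulated stand-in for data1.csv (the data, the names df_sim, train_sim, test_sim and the 950/50 class sizes are all assumptions for illustration):

```r
library(rpart)

set.seed(123)
# Hypothetical stand-in for data1.csv: 950 majority rows vs. 50 minority rows
df_sim <- data.frame(
  x = c(rnorm(950, 0), rnorm(50, 2)),
  y = c(rnorm(950, 0), rnorm(50, 2)),
  label = factor(rep(c("0", "1"), times = c(950, 50)))
)

# Simple random 70/30 split (not stratified)
idx <- sample(nrow(df_sim), size = round(0.7 * nrow(df_sim)))
train_sim <- df_sim[idx, ]
test_sim <- df_sim[-idx, ]

fit_sim <- rpart(label ~ ., data = train_sim, method = "class")
pred <- predict(fit_sim, newdata = test_sim, type = "class")

# Confusion matrix: rows = true class, columns = predicted class
cm <- table(truth = test_sim$label, predicted = pred)
print(cm)

# Overall accuracy and the share of true minority rows that were found
accuracy <- sum(diag(cm)) / sum(cm)
recall_minority <- cm["1", "1"] / sum(cm["1", ])
```

On imbalanced data the accuracy can look good even when recall_minority is poor, so it seems worth looking at both.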
Why do I need to build such a decision tree, and how does it help with over-/under-sampling?
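My guess is that the tree fitted on the original data is meant as a baseline, and the same model is then refitted on resampled training data for comparison. A rough sketch of plain random under- and over-sampling in base R, assuming simulated training data (train_sim and the 700/35 class sizes are hypothetical; this is not a specific method from the task sheet):

```r
library(rpart)

set.seed(123)
# Hypothetical imbalanced training data: 700 majority vs. 35 minority rows
train_sim <- data.frame(
  x = c(rnorm(700, 0), rnorm(35, 2)),
  y = c(rnorm(700, 0), rnorm(35, 2)),
  label = factor(rep(c("0", "1"), times = c(700, 35)))
)

major_idx <- which(train_sim$label == "0")
minor_idx <- which(train_sim$label == "1")

# Random under-sampling: keep only as many majority rows as there are minority rows
under <- train_sim[c(sample(major_idx, length(minor_idx)), minor_idx), ]

# Random over-sampling: draw minority rows with replacement up to the majority size
over <- train_sim[c(major_idx,
                    sample(minor_idx, length(major_idx), replace = TRUE)), ]

table(under$label)
table(over$label)

# Same CART model on the original and on the two resampled training sets
fit_orig  <- rpart(label ~ ., data = train_sim, method = "class")
fit_under <- rpart(label ~ ., data = under, method = "class")
fit_over  <- rpart(label ~ ., data = over, method = "class")
```

Only the training data is resampled here; the three trees would then be compared on the same untouched test set.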
Any help is much appreciated.
Topic cart imbalanced-learn r
Category Data Science