Difficulties creating a confusion matrix in R for Yes/No values

I am new to regression and confusion matrices, and I am trying to create a confusion matrix from a binary logistic regression model. The matrix should be built from the Yes/No values in the Survived column. I am using the Titanic dataset, and I received an error when calling confusionMatrix.

The contents of the Titanic dataset can be found here: Titanic Content

Here is my R code:

example$Class <- as.factor(example$Class)
example$Sex <- as.factor(example$Sex)
example$Age <- as.factor(example$Age)
example$Survived <- as.factor(example$Survived)

trainRowNum <- createDataPartition(example$Survived, #The outcome variable
                                   #proportion of example to form the training set
                                   p=0.3,
                                   #Don't store the result in a list
                                   list=FALSE);
# Step 2: Create the training mydataset
trainData <- example[trainRowNum,]
# Step 3: Create the test mydataset
testData <- example[-trainRowNum,]

mod.surv.lg <- glm(Survived~., family=binomial(), data=trainData);
#Provide a summary of the model
summary(mod.surv.lg)

p <- predict(mod.surv.lg, testData,type="response")
p_class <- ifelse(p > 0.5,"Yes","No")
table(p_class)
p_class
table(p_class, testData[["Survived"]])
confusionMatrix(p_class, testData$Survived);

I received the following output when calling the confusionMatrix function:

[1] No  Yes
<0 rows> (or 0-length row.names)
Warning message:
In Ops.factor(predictedScores, threshold) : ‘<’ not meaningful for factors

Topic rstudio regression logistic-regression confusion-matrix r

Category Data Science


You didn't provide your dataset, so I used the titanic package. I tried to make the data similar to yours, but there might be differences (by the way, this is why you should provide reproducible code).


# Added to make this a reproducible code
library(caret)
library(titanic)

# select columns used by OP
# caution: different column name than OP for "Class"
example <- titanic_train[,c('Pclass','Sex', 'Age', 'Survived')]
# remove rows with NAs
example <- example[complete.cases(example),]
# changing class to "Yes"/"No" instead of 1/0 in this version of titanic data to make it like OP's data
example$Survived <- ifelse(example$Survived==1,"Yes","No")

# not needed
#example$Class<- as.factor(example$Pclass)
#example$Sex<- as.factor(example$Sex)
# this was an error, age should be a numerical variable
#example$Age<- as.factor(example$Age)

example$Survived<- as.factor(example$Survived)

trainRowNum <- createDataPartition(example$Survived, #The outcome variable
                                   #proportion of example to form the training set
                                   p=0.3,
                                   #Don't store the result in a list
                                   list=FALSE)

# Step 2: Create the training mydataset
trainData <- example[trainRowNum,] 
# Step 3: Create the test mydataset
testData <- example[-trainRowNum,]

mod.surv.lg <- glm(Survived~., family=binomial(), data=trainData)
p <- predict(mod.surv.lg, testData,type="response")

p_class <- ifelse(p > 0.5,"Yes","No")

# ensure compatible factors in both the gold and predicted vectors:
all.as.factors <- as.factor(c(as.character(testData$Survived),as.character(p_class)))
gold.as.factors <- head(all.as.factors, length(testData$Survived))
pred.as.factors <- tail(all.as.factors, length(p_class))

# pass the two factors built above (same levels), not the character vector p_class
confusionMatrix(pred.as.factors, gold.as.factors)

I think the problem was the missing as.factor() conversion for p_class. Factors in R are not very intuitive; I still get confused by them frequently myself.
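To make the factor issue concrete, here is a minimal base-R sketch that does not need caret or the Titanic data. The vectors `actual` and `prob` are made-up stand-ins for testData$Survived and the predicted probabilities p. They are assumptions for illustration only, not values from the real model.

```r
# Hypothetical stand-ins for testData$Survived and the predicted probabilities p
actual <- factor(c("No", "Yes", "Yes", "No", "Yes", "No"))
prob   <- c(0.20, 0.80, 0.40, 0.10, 0.90, 0.70)

# ifelse() returns a plain character vector, not a factor
pred_chr <- ifelse(prob > 0.5, "Yes", "No")

# Reuse the levels of the reference so both factors match, even if
# one class never shows up among the predictions
pred <- factor(pred_chr, levels = levels(actual))

# Base-R confusion table: rows are predictions, columns are actual values
cm <- table(Predicted = pred, Actual = actual)
print(cm)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Setting the levels explicitly is a bit more direct than concatenating both vectors, and it keeps the table square even when the model predicts only one class. With caret loaded, confusionMatrix(pred, actual) should accept these two vectors, since it expects factors with the same levels.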
