Binary logistic regression in R on the Titanic dataset
I am new to R and to machine learning algorithms. I am trying to fit a binary logistic regression on a training set drawn from the Titanic dataset that ships with R, using the variable Survived, which takes the values Yes and No, as the outcome. I split the dataset into two sets: training (40%) and test (60%). The data look like this: [image: Titanic Data]
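For reference, here is a minimal sketch of how the data can be inspected in text form. It assumes only base R; the name `titanic_df` is introduced here for illustration and is not used in my code below.

```r
# Titanic ships with base R (package 'datasets') as a 4-dimensional table
data(Titanic)

# Flatten the table: one row per Class x Sex x Age x Survived combination,
# with Freq holding the number of people in that combination
titanic_df <- as.data.frame(Titanic)

str(titanic_df)        # column types (four factors plus the numeric Freq)
head(titanic_df)       # first few rows
nrow(titanic_df)       # number of combinations, not number of passengers
sum(titanic_df$Freq)   # total number of people counted across all rows
```

Each row is an aggregated cell rather than an individual passenger, which matters for how the rows are split and modelled.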
#Binary Logistic Regression
#Import dataset, Titanic
data(Titanic)
#Load the data into example as a data.frame
example <- as.data.frame(Titanic)
#Add a new column, Country, to record where each passenger was born
example['Country'] <- NA
#Declare a vector of unique countries
countryunique <- c("Africa", "USA", "Japan", "Australia", "Sweden", "UK", "France")
#Declare an empty vector
new_country <- c()
#Loop through the column, Country
for (loopitem in example$Country)
{
  #Randomly select one value from countryunique
  loopitem <- sample(countryunique, 1)
  #Append the new value to the vector
  new_country <- c(new_country, loopitem)
}
#Overwrite the Country column with the new data
example$Country <- new_country
#Convert the columns to factors, but Freq to numeric
example$Class <- as.factor(example$Class)
example$Sex <- as.factor(example$Sex)
example$Age <- as.factor(example$Age)
example$Survived <- as.factor(example$Survived)
example$Country <- as.factor(example$Country)
example$Freq <- as.numeric(example$Freq)
#Split the dataset into training and test sets
set.seed(20)
sample_size <- floor(0.6 * nrow(example))
test_index <- sample(seq_len(nrow(example)), size = sample_size)
#Load 60 percent of the data into test
test <- example[test_index, ]
#Load the remaining 40 percent into training
training <- example[-test_index, ]
#Fit the logistic regression model
mod.lg <- glm(Survived ~ ., family = binomial(), data = training)
#Print the summary of the model
summary(mod.lg)
The summary of the model is shown below.
Call:
glm(formula = Survived ~ ., family = binomial(), data = training)
Deviance Residuals:
            1             4             5             7            10
-0.0000040454 -0.0000024660 -0.0000104674 -0.0000024921 -0.0000107568
           12            15            16            21            22
-0.0000000211 -0.0000000211 -0.0000053423  0.0000107568  0.0000041004
           23            26            30
 0.0000005560  0.0000103920  0.0000024086
Coefficients: (1 not defined because of singularities)
                 Estimate   Std. Error z value Pr(>|z|)
(Intercept)       48.8492  822876.6829   0.000        1
Class2nd         -43.8783 1592352.1656   0.000        1
Class3rd         -39.2030  351691.5041   0.000        1
ClassCrew        -75.3682  822888.6960   0.000        1
SexFemale        -24.5969  819055.3208   0.000        1
AgeAdult          76.0607  827305.0519   0.000        1
Freq              -0.6793    1165.4986  -0.001        1
CountryAustralia -74.3782  849754.8545   0.000        1
CountryFrance     24.4715  895175.9026   0.000        1
CountrySweden    -47.8800  115169.7337   0.000        1
CountryUK         53.9582 1576877.4347   0.000        1
CountryUSA             NA           NA      NA       NA
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 17.3232395027816 on 12 degrees of freedom
Residual deviance: 0.0000000005291 on 2 degrees of freedom
AIC: 22
Number of Fisher Scoring iterations: 25
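The degrees of freedom in this output can be cross-checked against the split sizes. The following is a small arithmetic sketch, assuming `as.data.frame(Titanic)` has 32 rows (4 classes x 2 sexes x 2 ages x 2 outcomes); the names `n_rows`, `n_test`, `n_train` and `n_coef` are introduced here for illustration only.

```r
n_rows  <- 32                     # rows in as.data.frame(Titanic)
n_test  <- floor(0.6 * n_rows)    # rows sampled into the test set
n_train <- n_rows - n_test        # rows left over for training

# Coefficients actually estimated in the summary above
# (CountryUSA is NA, so it is not counted)
n_coef <- 11

n_train            # number of training rows (13 deviance residuals are listed)
n_train - 1        # degrees of freedom for the null deviance
n_train - n_coef   # degrees of freedom for the residual deviance
```

So the model estimates 11 coefficients from only 13 training rows. A model that close to saturated can reproduce the training outcomes almost exactly, which would be consistent with the near-zero residual deviance, the very large standard errors, and the 25 Fisher scoring iterations.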
I want to know whether I am on the right path in implementing binary logistic regression on the Titanic dataset. I noticed that the summary of the model contains many 0.000 values in the third column (z value). How can I fix this issue, and how should I interpret the summary of the model?
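One direction I am considering, stated as an assumption rather than a confirmed fix: since Freq is a count of people per row and not a passenger attribute, it could be passed to `glm()` through its `weights` argument instead of being used as a predictor. The sketch below leaves out the random Country column and the train/test split for brevity, and `titanic_df` and `mod_weighted` are names made up for this example.

```r
data(Titanic)
titanic_df <- as.data.frame(Titanic)

# Assumption: Freq is the number of people in each Class/Sex/Age/Survived
# cell, so it is supplied as case weights rather than as a predictor
mod_weighted <- glm(Survived ~ Class + Sex + Age,
                    family = binomial(),
                    data = titanic_df,
                    weights = Freq)

# Coefficients are on the log-odds scale for Survived == "Yes"
summary(mod_weighted)
```

With this setup every counted person contributes to the fit, so the model has six coefficients to estimate rather than eleven from 13 rows.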
Thank you.
Topic rstudio regression logistic-regression r
Category Data Science