What does it mean to have 1 degree of freedom in an ANOVA test?
I used Python to run a multi-factor ANOVA on a data set. I first fit a model with ols(...).fit() and then passed it to the anova_lm function. I noticed that each variable I am analyzing has 1 degree of freedom. Does that mean only one value from my data is extracted and used for the calculation? And why is the residual df so high?
import pandas as pd
from statsmodels.multivariate.manova import MANOVA
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
import matplotlib.pyplot as plt
from scipy import stats
# Some quick df transformation
# We want to analyze the candy sales numbers with respect to the flavors
formula = 'Sales ~ Mango + Raspberry + Chocolate + Dark_Chocolate + Ice_cream + Cherry'
model = ols(formula, data=df).fit()
# Pass the fitted model defined above to anova_lm
aov_table = anova_lm(model, typ=2)
print(aov_table)
****ANOVA Results****
                  df    sum_sq   mean_sq          F    PR(>F)
Mango            1.0  0.008512  0.008512   2.325284  0.130999
Raspberry        1.0  0.006025  0.006025   1.645954  0.202998
Chocolate        1.0  0.049506  0.049506  13.524418  0.000412
Dark_Chocolate   1.0  0.007233  0.007233   1.976095  0.163447
Ice_cream        1.0  0.018032  0.018032   4.926093  0.029117
Cherry           1.0  0.024460  0.024460   6.682116  0.011444
Residual        85.0  0.311140  0.003660        NaN       NaN
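For reference, here is a minimal sketch of how the df column in a table like this is usually accounted for. It assumes each flavor column is numeric or a 0/1 indicator (so it adds one coefficient) and that the model has an intercept. The helper name anova_df and the row count of 92 are assumptions for illustration; 92 is inferred from 85 residual df + 6 terms + 1 intercept, not taken from the data set itself.

```python
def anova_df(n_rows, levels_per_term):
    """Return (term_df, residual_df) for a main-effects model with an intercept.

    levels_per_term maps each term to its number of levels. A numeric or
    0/1 column is counted as 2 levels here because it adds one coefficient.
    """
    # Each term uses (levels - 1) degrees of freedom.
    term_df = {name: k - 1 for name, k in levels_per_term.items()}
    # The intercept uses 1; whatever the terms do not use is left for the residual.
    residual_df = n_rows - 1 - sum(term_df.values())
    return term_df, residual_df


flavors = ["Mango", "Raspberry", "Chocolate", "Dark_Chocolate", "Ice_cream", "Cherry"]

# Assumption: 92 rows, inferred from the table (85 + 6 + 1), not from the raw data.
term_df, residual_df = anova_df(92, {flavor: 2 for flavor in flavors})
print(term_df["Mango"], residual_df)  # 1 85
```

Under these assumptions, a df of 1 counts the parameters estimated for a term, not the number of observations used. All rows enter the fit, and the residual df is the number of rows left over after the intercept and the six terms are accounted for.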