Question on ANOVA and Correlation/Association

I've been working on examining statistical relationships between variables:

  1. Pearson's, Spearman's for continuous variables
  2. Kendall's Tau, Cramer's V for ordinal/nominal variables.

I know there are many more ways. Recently I read about ANOVA and hypothesis testing. It seems similar to measuring correlation and association. In fact, I can't tell if it is just another way of doing the same thing, or if it is something entirely different. Most explanations of ANOVA seem a bit more complicated than most explanations of correlation or association.

For example, I know that Pearson's r is the covariance of two variables scaled by the product of their standard deviations. And ANOVA stands for Analysis Of Variance. So it appears to me that it's the same sort of thing. But I can't tell 100% for sure.

Will someone please shed some light on this technique, what it is used for, and how it contrasts with measuring correlation?

Topic anova statsmodels correlation statistics

Category Data Science


  • About what ANOVA is used for: it answers whether the differences between the mean values of the data samples (groups) you have could be due to randomness or are statistically significant. It is a significance test that gives you an idea of whether your group means are (statistically significantly) different or not. A drawback is that it does not tell you which data sample(s) differ from the rest, or by how much. You can think of the process as follows (as described in Practical Statistics for Data Scientists):
  1. Combine all the data together in a single box
  2. Shuffle and draw out n resamples of m values each (where n is the number of data samples and m the number of data points in each sample)
  3. Record the mean of each of the n groups
  4. Record the variance among the n group means
  5. Repeat steps 2–4 many times (say 1,000). What proportion of the time did the resampled variance exceed the observed variance? This is the p-value.
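  • On the other hand, a direct measure of correlation gives you a number that tells you how strongly two variables vary together (linearly, in the case of Pearson's r). So ANOVA compares the means of a numeric variable across the groups of a categorical variable, while correlation measures how two numeric (or ordinal) variables move together.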
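
The resampling steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the three groups are synthetic data made up for the example, `permutation_anova` is a hypothetical helper name (not a library function), and SciPy's `f_oneway` and `pearsonr` are included only to contrast the classical F-test and a correlation coefficient with the permutation approach.

```python
import numpy as np
from scipy import stats


def permutation_anova(groups, n_resamples=1000, seed=0):
    """Permutation test on the variance among group means.

    Hypothetical helper following the five steps above; returns the
    observed variance of the group means and the permutation p-value.
    """
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    split_points = np.cumsum(sizes)[:-1]

    # Step 1: combine all the data together in a single box.
    pooled = np.concatenate(groups)

    # Observed variance among the original group means.
    observed = np.var([np.mean(g) for g in groups])

    exceed = 0
    for _ in range(n_resamples):
        # Step 2: shuffle and draw resamples with the original group sizes.
        shuffled = rng.permutation(pooled)
        resampled = np.split(shuffled, split_points)
        # Step 3: record the mean of each resampled group.
        means = [np.mean(r) for r in resampled]
        # Step 4: compare the variance among those means to the observed one.
        if np.var(means) > observed:
            exceed += 1

    # Step 5: proportion of resamples whose variance exceeded the observed one.
    return observed, exceed / n_resamples


# Synthetic example data (an assumption for illustration only):
# two groups share a mean, the third is shifted.
data_rng = np.random.default_rng(42)
group_a = data_rng.normal(10.0, 1.0, 30)
group_b = data_rng.normal(10.0, 1.0, 30)
group_c = data_rng.normal(12.0, 1.0, 30)
groups = [group_a, group_b, group_c]

observed_var, p_perm = permutation_anova(groups)
f_stat, p_f = stats.f_oneway(*groups)  # classical one-way ANOVA F-test

print("permutation p-value:", p_perm)
print("F-test p-value:", p_f)

# Correlation, by contrast, relates two numeric variables to each other.
x = data_rng.normal(size=30)
y = 2.0 * x + data_rng.normal(size=30)
r, p_r = stats.pearsonr(x, y)
print("Pearson's r:", r)
```

Both ANOVA variants answer "do the group means differ?" with a p-value, whereas `pearsonr` returns a coefficient between -1 and 1 describing how strongly `x` and `y` move together linearly.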
