Pearson correlation on two categorical variables
I am using the fourth-corner method in one of my papers (for those who need the name). The method was developed to test associations between variables in two datasets. In my case, the datasets contain traits of species (e.g. the trait Size with modalities 'small', 'medium', 'large'). The method recognizes the data type of each variable and then applies the appropriate statistic.
When the data types are specified correctly:
- If both variables are quantitative, the fourth-corner method calculates a Pearson correlation.
- If both variables are qualitative (factors), the method calculates a Chi2.
- If the variables are mixed (one quantitative, one qualitative), it calculates a pseudo-F.
However, in my study and in a study I criticize, we had to convert factorial data into categorical binary data. In my case, instead of having ONE trait column containing small, medium, large, I have three columns (small, medium, large), each coded as a factor with levels yes/no. In the other study, the columns were coded as 0/1. If coded as 0/1, the data are first standardized.
If categorical data are coded as 0/1 instead of as factors (e.g. biological traits of species coded as presence/absence), the method calculates Pearson correlations instead of a Chi2. Or rather, it returns Pearson correlations when the variable type is wrongly specified. That is what they did, and those correlations are what they analyze in the study.
I am trying to justify why this is wrong and why one should code categorical data as yes/no and compute a Chi2, but I am not entirely sure how to explain it, or how to show that correlations calculated from the 0/1 coding do not mean what one thinks.
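The type-based choice described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual fourth-corner implementation: `is_quantitative` and `choose_statistic` are hypothetical names, and the rule "numbers are quantitative, everything else is a factor" is my assumption about how the type is recognized.

```python
from numbers import Number

def is_quantitative(values):
    """Assumed rule: a column is quantitative if every entry is a number."""
    return all(isinstance(v, Number) and not isinstance(v, bool) for v in values)

def choose_statistic(x, y):
    """Pick the statistic from the types of the two variables."""
    qx, qy = is_quantitative(x), is_quantitative(y)
    if qx and qy:
        return "pearson"
    if not qx and not qy:
        return "chi2"
    return "pseudo-F"

# The same trait modality, coded two ways (made-up values).
small_yes_no = ["yes", "no", "no", "yes"]
small_01 = [1, 0, 0, 1]
thin_yes_no = ["no", "no", "yes", "yes"]
thin_01 = [0, 0, 1, 1]

print(choose_statistic(small_yes_no, thin_yes_no))  # factor vs factor
print(choose_statistic(small_01, thin_01))          # number vs number
print(choose_statistic(small_01, thin_yes_no))      # mixed
```

The point of the sketch is that identical information ends up with a different statistic purely because of how it was coded.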
Suppose we have two categorical variables, height and weight, split into classes such as:
height: short, medium, tall (0/1)
weight: thin, large (0/1)
To my understanding: when you calculate the correlation between short and thin, you are really calculating the correlation between short/not short and thin/not thin, since the original variables are not dichotomous.
How would you justify that?
Tables look like this:

For their data:

Short | Medium | Tall |
---|---|---|
0 | 1 | 0 |
1 | 0 | 0 |
0 | 0 | 1 |

Thin | Large |
---|---|
0 | 1 |
0 | 1 |
1 | 0 |

For my data:

Short | Medium | Tall |
---|---|---|
no | yes | no |
yes | no | no |
no | no | yes |

Thin | Large |
---|---|
no | yes |
no | yes |
yes | no |
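
To make the short/not short versus thin/not thin reading concrete, here is a minimal sketch in plain Python. The 12 species and their counts are made up for illustration, and `pearson`, `chi2_stat` and `crosstab` are my own helper names, not part of the fourth-corner method. For two 0/1 columns the Pearson correlation is the phi coefficient of the collapsed 2x2 table, so n times r squared equals the Chi2 of that 2x2 table, which is in general not the Chi2 of the full height by weight table.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def chi2_stat(table):
    """Pearson chi-squared statistic of a contingency table (no continuity correction)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return sum((obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
               for i, row in enumerate(table) for j, obs in enumerate(row))

def crosstab(a, b, a_levels, b_levels):
    """Count co-occurrences of the levels of a (rows) and b (columns)."""
    return [[sum(1 for u, v in zip(a, b) if u == i and v == j) for j in b_levels]
            for i in a_levels]

# Hypothetical data: 12 species, one height class and one weight class each.
height = ["short"] * 4 + ["medium"] * 4 + ["tall"] * 4
weight = ["thin", "thin", "thin", "large",
          "thin", "thin", "large", "large",
          "thin", "large", "large", "large"]
n = len(height)

# 0/1 dummy columns, as in the study I criticize.
short = [1 if h == "short" else 0 for h in height]
thin = [1 if w == "thin" else 0 for w in weight]

r_short_thin = pearson(short, thin)

# 2x2 table implied by the dummies: short/not short by thin/not thin.
collapsed = crosstab(short, thin, [1, 0], [1, 0])
# Full 3x2 table of the original categorical variables.
full = crosstab(height, weight, ["short", "medium", "tall"], ["thin", "large"])

chi2_collapsed = chi2_stat(collapsed)
chi2_full = chi2_stat(full)

print("r(short, thin)   =", round(r_short_thin, 3))
print("n * r^2          =", round(n * r_short_thin ** 2, 3))
print("Chi2, 2x2 table  =", round(chi2_collapsed, 3))
print("Chi2, full table =", round(chi2_full, 3))
```

On these made-up counts, n times r squared and the Chi2 of the collapsed table coincide, while the Chi2 of the full table is different: medium and tall have been merged into "not short", so the correlation answers a narrower question than the association between height and weight.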