Slightly different results between scipy.stats.spearmanr and manual calculation
I have the following dataset.
When I calculate the Spearman correlation coefficient with `scipy.stats.spearmanr`, it returns 0.718182.
```python
import pandas as pd
import numpy as np
from scipy.stats import spearmanr

df = pd.DataFrame(
    [
        [7, 3],
        [6, 5],
        [5, 4],
        [3, 2],
        [6, 4],
        [8, 9],
        [9, 7]
    ],
    columns=['Set of A', 'Set of B'])

correlation, pval = spearmanr(df)
print(f'correlation={correlation:.6f}, p-value={pval:.6f}')
```
It returns this:

```
correlation=0.718182, p-value=0.069096
```
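One quick way to see what `spearmanr` does with raw input is to rank each column with `scipy.stats.rankdata` (which gives tied values their average rank by default) and pass those ranks to `scipy.stats.pearsonr`. This sketch assumes SciPy's documented definition of Spearman's rho as the Pearson correlation of the ranks; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Raw data from the question
a = np.array([7, 6, 5, 3, 6, 8, 9])
b = np.array([3, 5, 4, 2, 4, 9, 7])

# rankdata uses method="average" by default: tied values share the mean of their ranks
rank_a = rankdata(a)
rank_b = rankdata(b)

rho, _ = spearmanr(a, b)          # Spearman on the raw values
r_ranks, _ = pearsonr(rank_a, rank_b)  # Pearson on the computed ranks

# Both lines should show 0.718182
print(f"spearmanr on raw data:      {rho:.6f}")
print(f"pearsonr on rankdata ranks: {r_ranks:.6f}")
```

If both numbers match, `spearmanr` is ranking the raw values itself before correlating them.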
However, when I calculate it manually, ranking the values by hand:
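```python
import numpy as np
import pandas as pd

# Ranks of 'Set of A' and 'Set of B', assigned by hand
df_rank = pd.DataFrame(
    [
        [5, 2],
        [3.5, 4],
        [2, 4],
        [1, 1],
        [3.5, 4],
        [6, 7],
        [7, 6]
    ],
    columns=['Rank of A', 'Rank of B'])

# Pearson correlation of the ranks: cov(A, B) / (std(A) * std(B)).
# np.cov and DataFrame.std both divide by n - 1 by default.
cov_rank = np.cov(df_rank.iloc[:, 0], df_rank.iloc[:, 1])[0, 1]
std = df_rank.std()
print(cov_rank / (std.iloc[0] * std.iloc[1]))
```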
It returns a different value:

```
0.7105597124064275
```
The two results agree to two decimal places but differ after that, and I do not know why.
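For reference, the manual expression is Pearson's correlation coefficient applied to the two rank columns. Because `np.cov` and `DataFrame.std` both divide by $n-1$, that factor cancels, leaving (with $a_i$, $b_i$ the ranks of A and B, and $\bar a = \bar b = 4$ for all rank vectors below, since seven ranks always sum to 28):

$$
r = \frac{\sum_i (a_i - \bar a)(b_i - \bar b)}{\sqrt{\sum_i (a_i - \bar a)^2}\,\sqrt{\sum_i (b_i - \bar b)^2}}
$$

With $a = (5, 3.5, 2, 1, 3.5, 6, 7)$ and the hand-entered $b = (2, 4, 4, 1, 4, 7, 6)$:

$$
r = \frac{19}{\sqrt{27.5 \cdot 26}} \approx 0.710560
$$

With the same $a$ and the average ranks of the raw Set B values, $b = (2, 5, 3.5, 1, 3.5, 7, 6)$:

$$
r = \frac{19.75}{\sqrt{27.5 \cdot 27.5}} = \frac{19.75}{27.5} \approx 0.718182
$$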
My question is whether `scipy.stats.spearmanr` expects the data to be ranked already, or whether it ranks the data itself.
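To check the hand-entered ranks against computed ones, the sketch below ranks the raw columns with pandas `DataFrame.rank(method="average")` (tied values get the mean of the ranks they span), prints the rows where the two rank tables disagree, and then runs `spearmanr` on both the raw and the pre-ranked data. Re-ranking average ranks leaves them unchanged, so if `spearmanr` ranks its input internally, both calls should give the same coefficient. The names `ranked`, `hand_ranked` and `mismatch` are illustrative.

```python
import pandas as pd
from scipy.stats import spearmanr

# Raw data from the question
df = pd.DataFrame(
    {"Set of A": [7, 6, 5, 3, 6, 8, 9],
     "Set of B": [3, 5, 4, 2, 4, 9, 7]}
)

# Rank each column independently; ties share their average rank
ranked = df.rank(method="average")

# Hand-entered ranks from the question, relabelled to match df's columns
hand_ranked = pd.DataFrame(
    {"Set of A": [5, 3.5, 2, 1, 3.5, 6, 7],
     "Set of B": [2, 4, 4, 1, 4, 7, 6]}
)

# Rows where the computed and hand-entered ranks disagree in any column
mismatch = ranked.ne(hand_ranked).any(axis=1)
side_by_side = pd.concat(
    [df, ranked.add_prefix("rank "), hand_ranked.add_prefix("hand ")], axis=1
)
print(side_by_side[mismatch])

# spearmanr on raw values vs. on already-ranked values
rho_raw, _ = spearmanr(df)
rho_ranked, _ = spearmanr(ranked)
print(f"raw: {rho_raw:.6f}  pre-ranked: {rho_ranked:.6f}")
```

With this data, only the `Set of B` column differs, in the rows holding the values 5 and 4.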