Slightly different results between scipy.stats.spearmanr and manual calculation

Question

shin

February 15, 2020, 15:51

I have the following dataset. When I calculate the Spearman correlation coefficient with scipy.stats.spearmanr, it returns 0.718182.

import pandas as pd
import numpy as np
from scipy.stats import spearmanr

df = pd.DataFrame(
    [
        [7,3],
        [6,5],
        [5,4],
        [3,2],
        [6,4],
        [8,9],
        [9,7]
    ],
    columns=['Set of A','Set of B'])

correlation, pval = spearmanr(df)
print(f'correlation={correlation:.6f}, p-value={pval:.6f}')

It returns this:

correlation=0.718182, p-value=0.069096

However, when I tried to calculate it manually:

df_rank = pd.DataFrame(
    [
        [5,2],
        [3.5,4],
        [2,4],
        [1,1],
        [3.5,4],
        [6,7],
        [7,6]
    ],
    columns=['Rank of A','Rank of B'])
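cov_rank = np.cov(df_rank.iloc[:, 0], df_rank.iloc[:, 1])[0][1]

# Pearson correlation of the ranks: covariance divided by the product of the standard deviations
cov_rank / (df_rank.std().iloc[0] * df_rank.std().iloc[1])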

It returns a different value.

0.7105597124064275

The two results differ from the third decimal place onward, and I do not know why.

My question is whether scipy.stats.spearmanr expects the data to already be ranked.

Topic spearmans-rank-correlation

Category Data Science

Sean Owen · Accepted Answer · February 15, 2020, 15:51

I think you have a small error in your manual calculation. In Set of B, you assign rank 4 to the values 4, 4, and 5. The two 4s are tied and should share rank 3.5, and the 5 should have rank 5. With that correction, your calculation gives the same answer, 0.7181818181818181.
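One way to avoid hand-ranking mistakes is to let a library compute the ranks. The sketch below is only an illustration, not part of the original calculation: it uses scipy.stats.rankdata, whose default method='average' gives tied values the mean of their ranks, and then takes the Pearson correlation of the ranks with np.corrcoef. The variable names (rank_a, rank_b, rho_manual, rho_scipy) are made up for this example. It also suggests an answer to the question about input: spearmanr is given the raw, unranked values here and agrees with the manual result, so it appears to do the ranking itself.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata, spearmanr

df = pd.DataFrame(
    [[7, 3], [6, 5], [5, 4], [3, 2], [6, 4], [8, 9], [9, 7]],
    columns=['Set of A', 'Set of B'])

# method='average' is the default: tied values share the mean of the ranks they span
rank_a = rankdata(df['Set of A'])
rank_b = rankdata(df['Set of B'])
print(rank_b)  # the two 4s get 3.5 each, the 5 gets rank 5

# Manual Spearman: Pearson correlation computed on the ranks
rho_manual = np.corrcoef(rank_a, rank_b)[0, 1]

# spearmanr on the raw (unranked) values
rho_scipy, pval = spearmanr(df['Set of A'], df['Set of B'])

print(f'manual={rho_manual:.6f}, scipy={rho_scipy:.6f}')
```

Both values should print as 0.718182, matching the output in the question.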
