Optimizing rolling Pearson correlation

Question

Дмитрий Сажнев

February 8, 2022, 09:13

I have a pandas DataFrame with many columns (3000 or more), each holding a time series. The index consists of dates.

            |id1    id2     id3  
-------------------------------
2021-01-06  | 27    29      5
2021-01-07  | 24    20      9   ...
2021-01-08  | 21    13      14
2021-01-09  | 10    6       24
                ...

I need to compute a rolling-window Pearson correlation for each pair of columns. I'm using multiprocessing with the regular pandas.DataFrame.corr() function, and the calculation takes days to complete. Is it possible to speed this up without resorting to cloud computing?
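To make the computation concrete, here is a minimal sketch of what is being described: one full correlation matrix per window position. It is not the code from the question. The helper name `rolling_corr_matrices`, the window length of 3, and the use of `numpy.corrcoef` in place of `DataFrame.corr()` are assumptions chosen for illustration. The sample values are the four rows shown in the table above.

```python
import numpy as np
import pandas as pd


def rolling_corr_matrices(df, window):
    """Yield (last_date_in_window, correlation_matrix) for each full window.

    Hypothetical helper for illustration; `window` is a number of rows.
    """
    values = df.to_numpy(dtype=float)
    for end in range(window, len(df) + 1):
        block = values[end - window:end]
        # rowvar=False: each column is a variable, each row an observation.
        yield df.index[end - 1], np.corrcoef(block, rowvar=False)


# The four sample rows from the question.
dates = pd.date_range("2021-01-06", periods=4, freq="D")
df = pd.DataFrame(
    {
        "id1": [27, 24, 21, 10],
        "id2": [29, 20, 13, 6],
        "id3": [5, 9, 14, 24],
    },
    index=dates,
)

# Window length 3 is an assumed value; the question does not state one.
results = dict(rolling_corr_matrices(df, window=3))
```

Each value in `results` is a square matrix with one row and one column per series, keyed by the last date of its window. With 3000 columns that is a 3000 x 3000 matrix per window position, which is where the cost comes from.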

Topic pearsons-correlation-coefficient pandas python

Category Data Science
