Optimizing rolling Pearson correlation
I have a Pandas DataFrame with many columns (3000 or more), each holding a time series; the index consists of dates.
           | id1  id2  id3
-------------------------------
2021-01-06 |  27   29    5
2021-01-07 |  24   20    9  ...
2021-01-08 |  21   13   14
2021-01-09 |  10    6   24
...
I need to compute the rolling-window Pearson correlation for every pair of columns. I'm using multiprocessing with the regular pandas.DataFrame.corr() function, and the calculation takes days to complete.
Is it possible to speed this up without resorting to cloud computing?
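To make the computation concrete, here is a minimal sketch of what I mean by a rolling pairwise correlation, written with plain NumPy matrix operations instead of calling DataFrame.corr() once per window. It is only an illustration under assumptions: the function name rolling_corr_matrices and the window parameter are hypothetical, the data is assumed to contain no missing values, and I have not benchmarked it on the full 3000-column frame.

```python
import numpy as np
import pandas as pd


def rolling_corr_matrices(df, window):
    """Yield (label of the window's last row, correlation matrix) per full window.

    Hypothetical helper for illustration; assumes df has no NaN values.
    """
    values = df.to_numpy(dtype=np.float64)
    n_rows = values.shape[0]
    for end in range(window, n_rows + 1):
        block = values[end - window:end]          # rows belonging to this window
        centered = block - block.mean(axis=0)     # subtract each column's mean
        norms = np.sqrt((centered ** 2).sum(axis=0))
        # Columns that are constant within the window have zero norm -> NaN.
        with np.errstate(divide="ignore", invalid="ignore"):
            normalized = centered / norms
        # One matrix product gives all pairwise Pearson correlations.
        yield df.index[end - 1], normalized.T @ normalized


# Small example built from the sample rows above.
sample = pd.DataFrame(
    {"id1": [27, 24, 21, 10], "id2": [29, 20, 13, 6], "id3": [5, 9, 14, 24]},
    index=pd.date_range("2021-01-06", periods=4),
)
for label, corr in rolling_corr_matrices(sample, window=3):
    print(label.date(), corr.shape)
```

The generator yields one matrix at a time because, with 3000 columns, a single float64 correlation matrix already takes about 72 MB, so keeping every window in memory at once is probably not feasible.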
Topic pearsons-correlation-coefficient pandas python
Category Data Science