Memory error: Python N-th order Markovian transition matrix from a given sequence

Ok. What is wrong with my code?

I am trying to calculate transition probabilities for each leg.

The code works for a small array, but for the actual dataset I get a memory error. I have the 64-bit version of Python and have maximized the memory usage, so I believe I need help making the code more efficient.

import numpy as np

# sequence with 3 states - 0, 1, 2

arr = [0, 1, 0, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 2]


def transition_matrix(arr, n=1):
    """
    Computes the transition matrix from a Markov chain sequence of order `n`.

    :param arr: Discrete Markov chain state sequence in discrete time with states in 0, ..., N
    :param n: Transition order
    """
    # Count observed transitions i -> j between consecutive states.
    M = np.zeros(shape=(max(arr) + 1, max(arr) + 1))
    for (i, j) in zip(arr, arr[1:]):
        M[i, j] += 1
    # Normalize each row so that it sums to 1.
    T = (M.T / M.sum(axis=1)).T
    return np.linalg.matrix_power(T, n)

transition_matrix(arr=arr, n=1)  # n is the transition order

Again, the code works like a charm on small inputs, but a memory error occurs when the array has more than 200K elements.



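Ok. I found the problem. I was using very large numbers as state IDs. Because the matrix is allocated with max(arr) + 1 rows and columns, its size depends on the largest ID and not on the number of distinct states, so large IDs produce a huge matrix. I replaced the IDs with consecutive integers starting from 0, and now the above code works like a charm with no memory problem.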
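The relabeling step described above can be sketched as follows. This is only a minimal illustration, not the exact code I used: the helper names `relabel_states` and `transition_matrix_compact` are made up for this example, and it assumes the IDs fit in a NumPy array. It uses `np.unique` with `return_inverse=True` to map arbitrary IDs to 0, ..., k-1, so the matrix is k x k, where k is the number of distinct states.

```python
import numpy as np


def relabel_states(seq):
    """Map arbitrary state IDs to consecutive integers 0..k-1 (hypothetical helper)."""
    # labels: sorted distinct IDs; codes: position of each element of seq in labels
    labels, codes = np.unique(np.asarray(seq), return_inverse=True)
    return labels, codes.ravel()


def transition_matrix_compact(seq, n=1):
    """Same computation as transition_matrix, but on relabeled states."""
    labels, codes = relabel_states(seq)
    k = len(labels)
    counts = np.zeros((k, k))
    # Count transitions between consecutive states; np.add.at handles repeated index pairs.
    np.add.at(counts, (codes[:-1], codes[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption for this sketch: states with no outgoing transition keep an all-zero row.
    T = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return labels, np.linalg.matrix_power(T, n)


# Large IDs, but only three distinct states, so the matrix is 3 x 3.
ids = [1000000007, 5, 1000000007, 5, 5, 900]
labels, T = transition_matrix_compact(ids, n=1)
print(labels)
print(T)
```

Row i of the result corresponds to the original ID `labels[i]`, so the probabilities can still be looked up by the original IDs.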
