Parallelization of a MIMO linear filter
I would like to implement a Multi-Input Multi-Output (MIMO) filtering operation that runs as fast as possible on batches of data. Here is my current implementation:
import numpy as np
import scipy.signal

def lfilter_mimo(b, a, u_in):
    batch_size, seq_len, in_ch = u_in.shape  # [B, T, I]
    out_ch, _, _ = a.shape
    y_out = np.zeros_like(u_in, shape=(batch_size, seq_len, out_ch))
    for out_idx in range(out_ch):
        for in_idx in range(in_ch):
            # Filter each input channel with the (out_idx, in_idx) SISO filter
            # and accumulate the contribution on the corresponding output channel.
            y_out[:, :, out_idx] += scipy.signal.lfilter(
                b[out_idx, in_idx, :], a[out_idx, in_idx, :],
                u_in[:, :, in_idx], axis=-1)
    return y_out  # [B, T, O]
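For reference, a minimal usage sketch; the dimensions B, T, I, O and the filter order below are hypothetical, chosen only for illustration (the denominators are made monic FIR-style so the filters are trivially stable):

    # Hypothetical dimensions, for illustration only.
    B, T, I, O, order = 32, 256, 2, 3, 4
    rng = np.random.default_rng(0)

    u = rng.standard_normal((B, T, I))
    b = rng.standard_normal((O, I, order + 1)) * 0.1  # numerator coefficients
    a = np.zeros((O, I, order + 1))
    a[:, :, 0] = 1.0                                   # monic denominators

    y = lfilter_mimo(b, a, u)
    print(y.shape)  # (32, 256, 3)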
For another use case I also need the individual components of the I/O response:
def lfilter_mimo_components(b, a, u_in):
    batch_size, seq_len, in_ch = u_in.shape  # [B, T, I]
    out_ch, _, _ = a.shape
    y_comp_out = np.zeros_like(u_in, shape=(batch_size, seq_len, out_ch, in_ch))
    for out_idx in range(out_ch):
        for in_idx in range(in_ch):
            # Keep each input->output contribution separate instead of summing.
            y_comp_out[:, :, out_idx, in_idx] = scipy.signal.lfilter(
                b[out_idx, in_idx, :], a[out_idx, in_idx, :],
                u_in[:, :, in_idx], axis=-1)
    return y_comp_out  # [B, T, O, I]
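As a sanity check (reusing the hypothetical b, a, u from the sketch above), summing the components over the input-channel axis should reproduce the full MIMO output:

    y_comp = lfilter_mimo_components(b, a, u)
    assert np.allclose(y_comp.sum(axis=-1), lfilter_mimo(b, a, u))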
The implementations above are vectorized over the batch dimension B, but they still require an explicit Python loop over the input and output channels. I tried to compile the loops with tools like Numba's @jit, without success. I also tried to parallelize the individual lfilter calls with multiprocessing.pool.ThreadPool and joblib.Parallel (sketched below), but these introduce overhead and only seem to pay off for fairly long time sequences (T > 512).
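For context, the joblib attempt looked roughly like this sketch (one task per (output, input) channel pair; the n_jobs value and the thread backend are just my choices for illustration, and b, a, u are the arrays from the sketch above):

    from joblib import Parallel, delayed

    def _single_pair(out_idx, in_idx):
        return out_idx, in_idx, scipy.signal.lfilter(
            b[out_idx, in_idx, :], a[out_idx, in_idx, :],
            u[:, :, in_idx], axis=-1)

    # Threads avoid pickling the input arrays for every task.
    results = Parallel(n_jobs=4, prefer="threads")(
        delayed(_single_pair)(o, i)
        for o in range(b.shape[0]) for i in range(b.shape[1]))

    y_par = np.zeros((u.shape[0], u.shape[1], b.shape[0]))
    for o, i, y_oi in results:
        y_par[:, :, o] += y_oi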
Can I do better than that? Should I write my own MIMO version of lfilter? Could I also exploit GPU acceleration?
Thanks!