Parallelization of a MIMO linear filter

I would like to implement a Multi-Input Multi-Output (MIMO) filtering operation that runs as fast as possible on batches of data. Here is my current implementation:

import numpy as np
import scipy.signal


def lfilter_mimo(b, a, u_in):
    # b, a: filter coefficients, shape [O, I, N] (O output channels, I input channels)
    # u_in: input sequences, shape [B, T, I]
    batch_size, seq_len, in_ch = u_in.shape  # [B, T, I]
    out_ch, _, _ = a.shape
    y_out = np.zeros_like(u_in, shape=(batch_size, seq_len, out_ch))
    for out_idx in range(out_ch):
        for in_idx in range(in_ch):
            # filter input channel in_idx with the (out_idx, in_idx) SISO filter
            # and accumulate the result into output channel out_idx
            y_out[:, :, out_idx] += scipy.signal.lfilter(b[out_idx, in_idx, :], a[out_idx, in_idx, :],
                                                         u_in[:, :, in_idx], axis=-1)
    return y_out  # [B, T, O]
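
For concreteness, here is roughly how I call it; the shapes and values below are made up just to illustrate the layout (random coefficients are of course not stable filters):

u = np.random.randn(32, 256, 2)   # B=32 sequences, T=256 samples, I=2 input channels
b = np.random.randn(3, 2, 4)      # numerator coefficients, shape [O, I, N] with O=3 outputs
a = np.random.randn(3, 2, 4)      # denominator coefficients, same shape
a[:, :, 0] = 1.0                  # normalize so that each a[out_idx, in_idx, 0] == 1
y = lfilter_mimo(b, a, u)         # -> shape (32, 256, 3)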

For another use case I also need the individual components of the I/O response:

def lfilter_mimo_components(b, a, u_in):
    batch_size, seq_len, in_ch = u_in.shape  # [B, T, I]
    out_ch, _, _ = a.shape
    y_comp_out = np.zeros_like(u_in, shape=(batch_size, seq_len, out_ch, in_ch))
    for out_idx in range(out_ch):
        for in_idx in range(in_ch):
            # keep each SISO contribution separate instead of summing it
            y_comp_out[:, :, out_idx, in_idx] = scipy.signal.lfilter(b[out_idx, in_idx, :],
                                                                     a[out_idx, in_idx, :],
                                                                     u_in[:, :, in_idx], axis=-1)
    return y_comp_out  # [B, T, O, I]
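
For what it's worth, the two functions are consistent with each other: summing the components over the last (input-channel) axis gives back the full MIMO output, so, up to floating-point accumulation order, one could also write:

y_out = lfilter_mimo_components(b, a, u_in).sum(axis=-1)  # [B, T, O, I] -> [B, T, O]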

The implementations above are already vectorized over the batch index B (lfilter handles that through its axis argument), but they still require an explicit Python loop over the input and output channels. I tried to compile them with Numba's @jit, without success (apparently scipy.signal.lfilter cannot be used inside nopython-mode functions). I also tried to parallelize the individual lfilter calls with tools like multiprocessing.pool.ThreadPool and joblib.Parallel, but they introduce enough overhead that this approach only seems to pay off for fairly long time sequences (roughly T > 512).
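
For reference, the threaded attempt looked roughly like the sketch below (the function name and the number of workers are just placeholders; my actual code may have differed in the details):

from multiprocessing.pool import ThreadPool

def lfilter_mimo_threaded(b, a, u_in, n_workers=4):  # n_workers chosen arbitrarily here
    batch_size, seq_len, in_ch = u_in.shape
    out_ch, _, _ = a.shape
    y_out = np.zeros_like(u_in, shape=(batch_size, seq_len, out_ch))

    def run_one(pair):
        out_idx, in_idx = pair
        # each task filters one (output, input) channel pair over the whole batch
        return out_idx, scipy.signal.lfilter(b[out_idx, in_idx, :], a[out_idx, in_idx, :],
                                             u_in[:, :, in_idx], axis=-1)

    pairs = [(o, i) for o in range(out_ch) for i in range(in_ch)]
    with ThreadPool(n_workers) as pool:
        for out_idx, y_partial in pool.map(run_one, pairs):
            y_out[:, :, out_idx] += y_partial  # accumulate in the main thread to avoid races
    return y_out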

Can I do better than that? Should I write my own MIMO version of lfilter? Could I also exploit GPU acceleration?

Thanks!

Tags: numpy, scipy, parallel

Category: Data Science
