What ML architecture fits fixed-length signal regression?
My problem is a regression problem:
How can I estimate a fish's weight from a fixed-length signal (80 data points) of the change in resistance as the fish swims through a gate with electrodes (basically 4 seconds of the fish passing, logged at 20 Hz)?
It is a spike-shaped signal; its height and width depend on the size of the fish, its speed, its proximity to the edges of the gate, and probably other things like water salinity and temperature.
I have a data set of 15 different weights, each with 20-110 samples, and each sample has 2 spikes, one for each of the 2 sets of electrodes I use for measurement (using 2 sets can help determine where the fish is heading).
Here is an example of resistance readout of a 340-gram fish experiment:
And here is an example of the extracted spikes from the same 340-gram fish experiment:
As you can see, there is significant variance, which led me to look for a neural network approach that can take such a signal as input and estimate the fish weight.
Do you know of a state-of-the-art network that does this?
What would you try?
Maybe a different ML technique?
Thanks!
Edit:
The data presented is already post-processed: I extract the spikes using the Python code below, so some of the noise is cleaned. I'm not sure how to clean it any better, since the experimenter didn't record when a fish went through the gate; all we have is the electrode signal to deduce that a fish passed through.
import numpy as np

# extracting the spikes: fixed-length windows centered on local maxima of electrode 1
def get_spikes(data_series_elc1, data_series_elc2, signal_meta):
    window_size = int(signal_meta['freq']) * 4  # 4 seconds of samples (80 at 20 Hz)
    half_window = window_size // 2

    std = np.std(data_series_elc1)
    # detection threshold: the 0.9 quantile of electrode 1 (despite the name p10)
    p10 = np.quantile(data_series_elc1, 0.9)
    spikes = []
    i = 0
    while i < len(data_series_elc1) - half_window:
        if data_series_elc1[i] > p10:
            # find the next max and use it as the window center
            max_indx = np.argmax(data_series_elc1[i:i+window_size])
            start = i + max_indx - half_window
            end = i + max_indx + half_window
            spike_list = [[data_series_elc1[start:end]], [data_series_elc2[start:end]]]
            # keep only full-length windows
            if len(spike_list[0][0]) == window_size:
                spikes.append(spike_list)

            # skip past the window that was just examined
            i = i + max_indx + half_window
        else:
            i = i + 1
    print('Number of Spikes: ', len(spikes))
    return spikes
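For reference, a minimal sketch of how the nested list returned by get_spikes could be stacked into a single NumPy array before modelling. The helper name spikes_to_array and the synthetic data are hypothetical and only illustrate the [[elc1_window], [elc2_window]] layout used above; it also assumes at least one spike was found.

```python
import numpy as np

def spikes_to_array(spikes):
    """Stack get_spikes-style output into shape (n_spikes, 2, window)."""
    # Each entry is assumed to look like [[elc1_window], [elc2_window]],
    # so np.asarray yields shape (n_spikes, 2, 1, window); the reshape
    # drops the singleton axis.
    arr = np.asarray(spikes, dtype=float)
    return arr.reshape(arr.shape[0], 2, -1)

# Synthetic stand-in data (not real measurements): 3 spikes of 80 samples.
rng = np.random.default_rng(0)
fake_spikes = [[[rng.normal(size=80)], [rng.normal(size=80)]] for _ in range(3)]

X = spikes_to_array(fake_spikes)
print(X.shape)  # (3, 2, 80)
```

Keeping both electrodes as separate channels leaves the option of feeding either one channel (80 values) or both (2 x 80 values) to a model.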
Also, I extract features like max, width, integral, and a Gaussian fit, but a linear regression model only gets me R^2 ≈ 0.6, i.e. a mean error of ~103 grams over all fish weights [100., 144., 200., 275., 339., 340., 370., 390., 400., 404., 480., 500., 526., 700., 740., 800., 840.], which is quite a large error.
A vanilla fully connected neural network gets about the same.
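To make the feature baseline concrete, here is a rough sketch of that kind of pipeline in plain NumPy. It is not my exact code: the function names, the median baseline, the half-maximum definition of width, and the synthetic Gaussian spikes are all assumptions for illustration, the Gaussian-fit feature is left out, and "mean error" is taken to be the mean absolute error.

```python
import numpy as np

def spike_features(spike, freq=20.0):
    """Return [max, width at half maximum (s), integral] for one 1-D spike."""
    spike = np.asarray(spike, dtype=float)
    s = spike - np.median(spike)      # assumption: the median approximates the baseline
    peak = s.max()
    width = np.count_nonzero(s >= peak / 2.0) / freq  # seconds at or above half maximum
    integral = s.sum() / freq         # rectangle-rule area under the spike
    return np.array([peak, width, integral])

def fit_linear(features, weights):
    """Ordinary least squares with an intercept term; returns the coefficients."""
    A = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(A, weights, rcond=None)
    return coef

def predict(coef, features):
    A = np.column_stack([features, np.ones(len(features))])
    return A @ coef

def r2_and_mae(y_true, y_pred):
    """Coefficient of determination and mean absolute error."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.mean(np.abs(y_true - y_pred))

# Synthetic stand-in data (not real measurements): Gaussian spikes whose
# height grows with weight, plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(80) / 20.0
weights = rng.choice([100.0, 340.0, 500.0, 840.0], size=200)
spikes = np.array([
    (w / 100.0) * np.exp(-0.5 * ((t - 2.0) / 0.3) ** 2) + rng.normal(0, 0.05, t.size)
    for w in weights
])

X = np.array([spike_features(s) for s in spikes])
coef = fit_linear(X, weights)
r2, mae = r2_and_mae(weights, predict(coef, X))
print(f"R^2 = {r2:.2f}, mean absolute error = {mae:.1f} g")
```

The scores printed here are computed on the same synthetic data used for fitting, so they only show the mechanics and say nothing about the real measurements.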
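# imports assume the Keras bundled with TensorFlow
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(keras.Input(shape=(80,)))  # one 80-sample spike per example
model.add(layers.Dense(40, activation='relu'))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1))  # single linear output: the estimated weight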
So I'm looking to improve these results. Any ideas?

