Is it wise to always `StandardScaler()` features? [SOLVED]

My current investigation suggests that sklearn.preprocessing.StandardScaler() is not always the right choice for certain types of feature extraction for neural networks.

Suppose I want to classify sound events based on spectrogram data. A second of such data could look like this:

Visible here is a sine wave of around 1 kHz over one second. The settling of the low bands is specific to the feature extraction and not part of the question. The data is an (n, 28, 40) array of dBFS values, i.e. logarithmic energy levels relative to the maximum digital level (full scale) of the wav file.

If StandardScaler is applied, the visual representation of the sound looks like this:

... which essentially removes the defining features and amplifies the noise, exactly what is NOT wanted. Would a level-based scaler be the better choice here, or does StandardScaler() simply not benefit the system in this specific case of a sine wave?

Note: I am a student and do not have years of experience, so if the question lacks quality, please suggest an improvement before downvoting. Thank you.

Tags: feature-scaling, feature-extraction, python

Category: Data Science


By the looks of the image, you seem to be fitting the scaler to each line of the graph separately.

Almost every line of the first graph keeps the same value along the 0-1 s duration. If you run the scaler on each line separately, you end up with (nearly) 0 for every pixel, because almost all pixels share the same value, which is also the mean. The only pixel that stands out is the leftmost one, since it is the one that differs most from that mean.
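A minimal sketch of that effect (illustrative values only, not your actual data): fitting a StandardScaler on a single, nearly constant line collapses the constant part to roughly the same value near zero and amplifies the one outlier pixel.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One nearly constant spectrogram line over the 1 s window (illustrative values).
line = np.full((40, 1), -60.0)   # 40 time frames for a single frequency band
line[0] = -50.0                  # the settling pixel on the far left

scaled = StandardScaler().fit_transform(line)

print(scaled[1:4].ravel())  # constant pixels collapse to (nearly) the same value near 0
print(scaled[0])            # the single differing pixel is amplified instead
```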

What you should do instead is line up all the pixels of the matrix in one single column, scale the whole data set at once, and then reshape the column back to your matrix form. Then you won't see much of a difference between the two images.
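A minimal sketch of that idea, assuming the (n, 28, 40) dBFS array from the question (the array here is random placeholder data, just to show the reshaping):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the (n, 28, 40) batch of dBFS spectrograms (random placeholder data).
X = np.random.uniform(-80.0, 0.0, size=(100, 28, 40))

# Line up every pixel in one single column so a single mean/std is computed
# for the whole data set, then reshape back to the original form.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X.reshape(-1, 1)).reshape(X.shape)

print(X_scaled.shape)                   # (100, 28, 40)
print(X_scaled.mean(), X_scaled.std())  # ~0 and ~1 over the whole data set
```

Because one mean and one standard deviation are shared by all pixels, the relative structure of the spectrogram is preserved; only the overall offset and scale change.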


I haven't worked with this kind of data before, but are you applying the StandardScaler to logarithmic data? In the second image the color range for dBFS values goes from 6 to -6, which makes no sense, because dBFS uses 0 as the maximum level (as seen in the color scale of the first image).

I believe you would first need to transform the data into absolute values, e.g. a dBFS-to-16-bit conversion, making your data linear so that you can use the StandardScaler. Note, however, that this scaler produces both negative and positive values, so going back to dBFS notation afterwards doesn't make much sense, because that notation is for absolute values only.

About the conversion: since dBFS can only represent absolute values, the result is unsigned data ranging from 0 to 65535. If you want the standard signed 16-bit representation, you would then need to rescale the data to the range -32768 to 32767.
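A minimal sketch of that conversion, assuming 0 dBFS corresponds to full scale (the exact mapping to 16-bit ranges here is an assumption for illustration, not something stated in the original post):

```python
import numpy as np

# Illustrative dBFS values (0 dBFS = full scale, more negative = quieter).
db = np.array([0.0, -6.0, -20.0, -60.0])

# dBFS -> linear amplitude relative to full scale (0 dBFS maps to 1.0).
linear = 10.0 ** (db / 20.0)

# Unsigned 16-bit range, 0..65535 ...
unsigned16 = linear * 65535.0

# ... or rescaled to the signed 16-bit range -32768..32767.
signed16 = unsigned16 - 32768.0

print(linear)
print(unsigned16)
print(signed16)
```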
