Why does normalization kill my accuracy?
I have a binary sound classifier working on a set of 48 features extracted from audio. My model (a multi-layer neural network) reaches around 90% accuracy on both the test and validation sets, without any normalization or standardization.
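For context, a minimal sketch of this kind of setup (the random placeholder data and the scikit-learn `MLPClassifier` here are assumptions standing in for my actual data and network):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: 48 audio features per clip, binary labels
X = np.random.randn(2000, 48)
y = np.random.randint(0, 2, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A small multi-layer network trained directly on the raw (unscaled) features
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```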
Looking at the data, most feature values fall roughly in [-10, +10], but a few features have means around 4000. Seeing how disproportionate the scales are across features, I thought feature scaling might improve things. So, using scikit-learn tools, I tried the following (a rough sketch of the scaling calls is shown after the list):
- Simply removing the means from features
- Normalizer
- MinMaxScaler
- RobustScaler
All of these ended up dropping my accuracy to roughly 50% (100% recall, 50% precision)!
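A minimal sketch of how these scikit-learn scalers can be applied (the `X_train` / `X_test` arrays are placeholders standing in for my real 48-feature data, and the fit-on-train / transform-on-test pattern is shown as the usual usage, not necessarily my exact code):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler, RobustScaler

# Placeholder arrays standing in for the real 48-feature train/test splits
X_train = np.random.randn(1000, 48)
X_test = np.random.randn(200, 48)

scalers = {
    "mean removal": StandardScaler(with_std=False),  # only subtracts the per-feature mean
    "Normalizer": Normalizer(),                      # rescales each sample (row) to unit L2 norm
    "MinMaxScaler": MinMaxScaler(),                  # maps each feature to [0, 1]
    "RobustScaler": RobustScaler(),                  # centers/scales with median and IQR
}

for name, scaler in scalers.items():
    # Fit on training data only, then apply the same transform to the test data
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    print(name, X_train_scaled.shape, X_test_scaled.shape)
```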
How is this possible? And what is the correct way to normalize my data?
Tags: audio-recognition, normalization, scikit-learn, classification
Category: Data Science