Why does normalization kill my accuracy?
I have a binary sound classifier working on a set of 48 features extracted from audio. My model (a multi-layer neural network) reaches around 90% accuracy on both the test and validation sets, without any normalization or standardization.
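For context, a minimal sketch of this kind of setup (the random placeholder data and the scikit-learn `MLPClassifier` here are assumptions standing in for my actual data and network):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: 48 audio features per clip, binary labels
X = np.random.randn(2000, 48)
y = np.random.randint(0, 2, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A small multi-layer network trained directly on the raw (unscaled) features
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```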
Looking at the data, most feature values fall roughly in [-10, +10], but a few features have means around 4000. Seeing how disproportionate the scales are across features, I thought feature scaling might improve things. So, using scikit-learn tools, I tried the following (a rough sketch of the scaling calls is shown after the list):
- Simply removing the means from features
- Normalizer
- MinMaxScaler
- RobustScaler
All of these ended up dropping my accuracy to roughly 50% (100% recall, 50% precision)!
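A minimal sketch of how these scikit-learn scalers can be applied (the `X_train` / `X_test` arrays are placeholders standing in for my real 48-feature data, and the fit-on-train / transform-on-test pattern is shown as the usual usage, not necessarily my exact code):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler, RobustScaler

# Placeholder arrays standing in for the real 48-feature train/test splits
X_train = np.random.randn(1000, 48)
X_test = np.random.randn(200, 48)

scalers = {
    "mean removal": StandardScaler(with_std=False),  # only subtracts the per-feature mean
    "Normalizer": Normalizer(),                      # rescales each sample (row) to unit L2 norm
    "MinMaxScaler": MinMaxScaler(),                  # maps each feature to [0, 1]
    "RobustScaler": RobustScaler(),                  # centers/scales with median and IQR
}

for name, scaler in scalers.items():
    # Fit on training data only, then apply the same transform to the test data
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    print(name, X_train_scaled.shape, X_test_scaled.shape)
```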
How is this possible? And what is the correct way to normalize my data?
Tags: audio-recognition, normalization, scikit-learn, classification
Category: Data Science