What parameters to use when normalising training, validation, and testing data?

I know a similar post was made here, but I wanted to ask some follow-up questions. I am conducting a cross-validation search to find values for a set of hyper-parameters and need to normalise the data.

If we split up the data as follows:

  1. Split the full dataset into a 'training' set (call this set 'A' for now) and a testing set
  2. Split 'A' into a smaller training set (call this set 'B' for now) and a validation set

what parameters should be used when normalising the datasets?
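
Concretely, the split I have in mind looks something like this (a sketch assuming scikit-learn's train_test_split; the data and split ratios are just placeholders):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Placeholder data: 100 samples, 5 features
    X = np.random.rand(100, 5)
    y = np.random.rand(100)

    # Step 1: hold out the final test set; 'A' is everything that remains
    X_A, X_test, y_A, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Step 2: split 'A' into the actual training set 'B' and a validation set
    X_B, X_val, y_B, y_val = train_test_split(X_A, y_A, test_size=0.25, random_state=0)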

Am I correct in thinking that:

  1. We compute the means and standard deviations on dataset 'B' and use them to normalise 'B'
  2. We then normalise the validation set using those same parameters obtained from set 'B'
  3. Once we have used the validation set to find the hyper-parameters with cross-validation, we normalise set 'A' and extract its parameters
  4. Use the parameters from set 'A' to normalise the testing set

Is this correct, or have I misunderstood something? I know this is basic, but I can't seem to find a straightforward answer to this anywhere.
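
In code, I imagine the procedure looking roughly like this (continuing from the split above; StandardScaler is just my stand-in for the normaliser):

    from sklearn.preprocessing import StandardScaler

    # Steps 1-2: fit the scaler on 'B' only, then apply it to 'B' and the validation set
    scaler_B = StandardScaler().fit(X_B)         # means/stds estimated on 'B'
    X_B_norm = scaler_B.transform(X_B)
    X_val_norm = scaler_B.transform(X_val)

    # ... hyper-parameter search with cross-validation happens here ...

    # Steps 3-4: with hyper-parameters fixed, refit the scaler on all of 'A'
    scaler_A = StandardScaler().fit(X_A)         # means/stds estimated on 'A'
    X_A_norm = scaler_A.transform(X_A)
    X_test_norm = scaler_A.transform(X_test)     # test set normalised with 'A' parameters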

Tags: training, normalization, cross-validation, python


I am not exactly sure what you mean by "what parameters should be used when normalizing datasets."

However, it is important to note:

Normalization is a preprocessing step applied to some or all of the input features before the model is built; its "parameters" are the statistics (such as per-feature means and standard deviations) estimated from the data, not parameters of the model itself.
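
As a minimal illustration (with made-up numbers), the "parameters" of standardization are just the per-feature means and standard deviations:

    import numpy as np

    X_train = np.array([[1.0, 200.0],
                        [2.0, 300.0],
                        [3.0, 400.0]])

    # The normalization parameters: one mean and one std per feature (column)
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)

    X_train_norm = (X_train - mu) / sigma    # each column now has mean 0, std 1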

But in answer to your question:

You always normalize the train and the test set using the same parameters, and those parameters are computed on the training data (otherwise, how would you be able to compare the results?).
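
In scikit-learn terms, that means fitting the scaler on the training data only and reusing it, unchanged, on the test data (a sketch with placeholder arrays):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.random.rand(80, 5)          # placeholder training features
    X_test = np.random.rand(20, 5)           # placeholder test features

    scaler = StandardScaler().fit(X_train)   # parameters come from the training set only
    X_train_norm = scaler.transform(X_train)
    X_test_norm = scaler.transform(X_test)   # the same parameters are reused on the test set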
