What parameters to use when normalising training, validation, and testing data?
I know a similar post was made here, but I wanted to ask some follow-up questions. I am running a cross-validation search to find values for a set of hyper-parameters, and I need to normalise the data.
If we split up the data as follows:
- 'Training' (call this set 'A' for now) and testing data
- Split the 'training' into training (call this set 'B' for now) and validation sets
what parameters should be used when normalising the datasets?
Am I correct in thinking that:
- We compute the means and standard deviations on set 'B' and normalise 'B' with them
- We then normalise the validation set using those parameters obtained from set 'B'
- Once we have used the validation set to find the hyper-parameters with cross-validation, we compute the means and standard deviations on set 'A' and normalise 'A' with them
- Use the parameters from set 'A' to normalise the testing set
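
To make the steps listed above concrete, here is a minimal sketch of what I have in mind. It uses only the Python standard library and a single made-up feature; the helper names `fit_scaler` and `transform`, the synthetic data, and the 80/20 and 60/20 split sizes are all assumptions for illustration, not part of any library. With k-fold cross-validation, the 'B'/validation part would presumably be repeated inside each fold.

```python
import random
import statistics


def fit_scaler(values):
    """Estimate the mean and standard deviation from `values` only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        std = 1.0  # avoid dividing by zero for a constant feature
    return mean, std


def transform(values, mean, std):
    """Standardise `values` with parameters estimated elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical data: 100 samples of one feature.
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(100)]

# Split into set 'A' ('training') and the testing set.
set_a, test = data[:80], data[80:]

# Split 'A' into set 'B' (training) and the validation set.
set_b, validation = set_a[:60], set_a[60:]

# Hyper-parameter search: parameters come from 'B' only.
mean_b, std_b = fit_scaler(set_b)
b_scaled = transform(set_b, mean_b, std_b)
val_scaled = transform(validation, mean_b, std_b)

# Final model: parameters come from the whole of 'A'.
mean_a, std_a = fit_scaler(set_a)
a_scaled = transform(set_a, mean_a, std_a)
test_scaled = transform(test, mean_a, std_a)
```

In this sketch the validation and testing sets are never passed to `fit_scaler`, so their statistics do not influence the parameters used to normalise them.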
Is this correct, or have I misunderstood something? I know this is basic, but I can't seem to find a straightforward answer to this anywhere.