How to normalize test data according to the training data if the normalization on the training data is performed row wise?

Question

jerry

April 26, 2022 08:06

I have read in several places about normalizing features for machine learning. However, I normalize my training data row-wise, as shown in the following code (only two training samples are shown). My question: when normalizing the test data, should I use the minimum and maximum of each test sample to normalize that sample, or should I use the minimum and maximum values from the training data? To explain the layout: in the first row, -3 is the first feature, -2 the second, 0 the third, 2 the fourth, and 3 the fifth. The second row is the second sample, also with 5 features (-4, -1, 0, 3, 2). As with other machine learning algorithms, each row corresponds to one sample consisting of 5 features.

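import numpy as np

data = np.array([[-3, -2, 0, 2, 3], [-4, -1, 0, 3, 2]])
print(data)

print(data.shape)
new_range = 2  # width of the target interval [-1, 1]
new_min = -1   # lower bound of the target interval
for i in range(len(data)):
    print("i:", i)
    # Min-max scale row i using that row's own minimum and maximum
    old_range = np.amax(data[i]) - np.amin(data[i])
    data_norm = ((data[i] - np.amin(data[i])) / old_range) * new_range + new_min
    print(data_norm)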

Result

[-1.         -0.66666667  0.          0.66666667  1.        ]
[-1.         -0.14285714  0.14285714  1.          0.71428571]

Topic: normalization, machine-learning

Category: Data Science

Viktor · Accepted Answer · July 16, 2021 07:37

You usually want to normalize per feature, as you also pointed out. For tabular data, almost every machine learning implementation expects features as columns and observations as rows. In your case, if a row actually holds a single feature, you may want to transpose the data; if the values in a row are not the same feature, then you may need to apply a different transformation to each.
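To make the difference in orientation concrete, here is a minimal NumPy sketch of per-feature (column-wise) min-max scaling on the two training samples from the question. The helper name `minmax_columns` and the guard for constant columns are assumptions added for illustration; they are not part of the original code.

```python
import numpy as np

# Rows are samples, columns are features (the two samples from the question).
train = np.array([[-3, -2, 0, 2, 3],
                  [-4, -1, 0, 3, 2]], dtype=float)

def minmax_columns(x, new_min=-1.0, new_max=1.0):
    """Scale each column (feature) of x to [new_min, new_max]."""
    col_min = x.min(axis=0)               # one minimum per feature
    col_range = x.max(axis=0) - col_min   # one range per feature
    # Constant columns have zero range; use 1 to avoid division by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / col_range * (new_max - new_min) + new_min

train_norm = minmax_columns(train)
print(train_norm)
```

If the features were stored along rows instead, calling the same helper on `train.T` would be one way to apply the transpose mentioned above.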

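If you normalize per feature, you have to apply the same transformation to the test data that you fitted on the training data, that is, reuse the training minimum and maximum rather than recomputing them on the test set. (If you use, e.g., scikit-learn implementations, they will take care of this for you.)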
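As a rough sketch of that fit-on-train, apply-to-test pattern, the following uses scikit-learn's `MinMaxScaler` with the same [-1, 1] target range as the question. The `test` array is a made-up sample for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Training samples from the question: rows are samples, columns are features.
train = np.array([[-3, -2, 0, 2, 3],
                  [-4, -1, 0, 3, 2]], dtype=float)
# Hypothetical test sample, not taken from the original question.
test = np.array([[-3.5, -1.5, 0.0, 2.5, 2.5]])

scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(train)                    # learns per-feature min and max from train only
train_norm = scaler.transform(train)
test_norm = scaler.transform(test)   # reuses the training min and max
print(test_norm)
```

Because the scaler keeps the training statistics, a test value outside the training range of its feature will land outside [-1, 1]; that is expected with this approach.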
