Some ML algorithms require standardised data, and others simply work better with it. Neural nets (NN) are a case in point: standardisation often improves performance because NNs can struggle when input features are on very different scales.
What you can do is standardise each column of $x$ to have mean 0 and standard deviation (sd) 1, i.e. subtract the mean and divide by the sd. You compute the mean and sd from the training data only. To make predictions (e.g. on a test set or new data), you apply exactly the same transformation, using the mean and sd from the training data, to the test or new data.
The reason you use only the training data to get the mean/sd for standardisation is to avoid data leakage: no information about the test data should flow into the training process.
After standardisation, the scale of each variable changes. E.g. where you originally had "apples", after standardisation you have "standard deviations from the mean". This can be relevant, e.g. when you want to do statistical inference (instead of prediction), because effects are then interpreted per standard deviation rather than per original unit.
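To make the change of units concrete, here is a minimal sketch with simulated data (the variable names and the use of np.polyfit for a simple linear fit are my own illustration, not part of the answer above): the slope on the standardised predictor equals the raw slope times the predictor's sd, i.e. "effect per sd of apples" instead of "effect per apple".

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=200)          # e.g. number of apples
y = 2.0 * x + rng.normal(0, 5, size=200)  # outcome depending on apples

b_raw = np.polyfit(x, y, 1)[0]            # slope: change in y per apple
x_st = (x - x.mean()) / x.std()           # standardise x
b_std = np.polyfit(x_st, y, 1)[0]         # slope: change in y per sd of apples
# b_std == b_raw * x.std() (up to floating point error)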
Standardisation in Python:
# Get mean and SD from the training data (assumes NumPy arrays or pandas DataFrames)
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
# Standardise the training data in place
train_data -= mean
train_data /= std
# Apply the *same* training mean/SD to the test data
test_data -= mean
test_data /= std
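Equivalently, scikit-learn's StandardScaler can do this bookkeeping for you. A minimal sketch, assuming train_data and test_data are NumPy arrays (the array contents are just illustrative; note that StandardScaler uses the population sd, ddof=0, so results may differ very slightly from the pandas default):

import numpy as np
from sklearn.preprocessing import StandardScaler

train_data = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
test_data = np.array([[1.5, 300.0]])

scaler = StandardScaler()
# fit learns mean/sd from the training data only (no data leakage)
train_scaled = scaler.fit_transform(train_data)
# transform reuses the training mean/sd on new data
test_scaled = scaler.transform(test_data)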
There are also other ways to "rescale" your data, e.g. min-max scaling, which also often works well with NNs; see the sketch below. The different approaches and terms are well described in the Wikipedia article on feature scaling.
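A minimal min-max scaling sketch, assuming NumPy arrays (again with illustrative data): each column is mapped to [0, 1] using the minimum and maximum of the training data only.

import numpy as np

train_data = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
test_data = np.array([[1.5, 300.0]])

# Column-wise min and max from the training data only
col_min = train_data.min(axis=0)
col_max = train_data.max(axis=0)

# Rescale to [0, 1]; test values outside the training range land outside [0, 1]
train_scaled = (train_data - col_min) / (col_max - col_min)
test_scaled = (test_data - col_min) / (col_max - col_min)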
Brief example in R:
The vector apples has one extreme value. After standardisation, the new vector apples_st has a mean of (almost) zero and an sd equal to 1. Looking at apples_st, you see that the extreme value is now expressed in standard deviations (about 2 sd above the mean); it is still the largest value, since standardisation is a linear transformation, but it is no longer on a wildly different numeric scale.
apples = c(1,2,3,4,5,100)
apples_st = (apples - mean(apples)) / sd(apples)
mean(apples_st)
[1] -9.251859e-18
sd(apples_st)
[1] 1
apples_st
[1] -0.4584610 -0.4332246 -0.4079882 -0.3827518 -0.3575154 2.0399410