Preparing data for SVM: is it valid to normalise the data before and after PCA dimensionality reduction?
Is it valid to normalise a dataset, reduce its dimensionality with PCA, and then normalise the reduced-dimension data? Assuming this is performed on the training data, should the same PCA coefficients be used to reduce the dimension of the test data? Should the same max and min normalisation values be used for both the test and training data? I have included a simplified example of the code I am using, which may describe what I mean better. Thanks in advance.
%% Prepare Training Data
% Normalise training data
mindata=min(TRAINDATA); maxdata=max(TRAINDATA);
TRAINDATA = ((TRAINDATA-repmat(mindata,[size(TRAINDATA,1),1]))./(repmat(maxdata,[size(TRAINDATA,1),1])-repmat(mindata,[size(TRAINDATA,1),1])) - 0.5 ) *2;
% Perform PCA
mTRAINDATA = mean(mean(TRAINDATA));
TRAINDATA = TRAINDATA - mTRAINDATA;
Cpca = princomp(TRAINDATA,'econ'); % only the coefficients are needed; princomp returns at most four outputs
EigenRange = 1:2;
Cpca = Cpca(:,EigenRange);
TRAINDATA = TRAINDATA*Cpca;
TRAINDATA = TRAINDATA + mTRAINDATA;
% Normalise training data second time
mindata2=min(TRAINDATA); maxdata2=max(TRAINDATA);
TRAINDATA = ((TRAINDATA-repmat(mindata2,[size(TRAINDATA,1),1]))./(repmat(maxdata2,[size(TRAINDATA,1),1])-repmat(mindata2,[size(TRAINDATA,1),1])) - 0.5 ) *2;
%% Prepare Test Data
% Normalise using first normalisation values from training data
TESTDATA = ((TESTDATA-repmat(mindata,[size(TESTDATA,1),1]))./(repmat(maxdata,[size(TESTDATA,1),1])-repmat(mindata,[size(TESTDATA,1),1])) - 0.5 ) *2;
% Project test data onto the PCA coefficients obtained from the training data
mTESTDATA = mean(mean(TESTDATA));
TESTDATA = TESTDATA - mTESTDATA;
TESTDATA = TESTDATA*Cpca;
TESTDATA = TESTDATA + mTRAINDATA;
% Normalise using second normalisation values from training data
TESTDATA = ((TESTDATA-repmat(mindata2,[size(TESTDATA,1),1]))./(repmat(maxdata2,[size(TESTDATA,1),1])-repmat(mindata2,[size(TESTDATA,1),1])) - 0.5 ) *2;
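For comparison, here is a minimal sketch of the variant the question is asking about, in which every parameter (both sets of min/max values, the centring mean and the PCA coefficients) is fitted on the training data only and then reused unchanged on the test data. The toy matrices and the names Xtrain, Xtest, scale, mu and W are assumptions made for illustration and are not part of my real code; PCA is done here with svd rather than princomp so the snippet does not depend on a toolbox, and the mean is taken per column rather than over the whole matrix.

```matlab
% Toy matrices standing in for TRAINDATA and TESTDATA (rows = samples).
Xtrain = [1 2 3; 2 4 1; 3 1 5; 4 3 2; 5 5 4];
Xtest  = [2 3 3; 6 0 1];

% Min-max scaling to [-1, 1] using a supplied min (lo) and max (hi).
scale = @(X, lo, hi) 2 * bsxfun(@rdivide, bsxfun(@minus, X, lo), hi - lo) - 1;

% 1) Fit the first scaling on the training data only.
lo1 = min(Xtrain);  hi1 = max(Xtrain);
Xtrain1 = scale(Xtrain, lo1, hi1);

% 2) Fit PCA on the scaled training data: column means and principal axes.
mu = mean(Xtrain1);
[~, ~, V] = svd(bsxfun(@minus, Xtrain1, mu), 'econ');
W = V(:, 1:2);                          % keep the first two components
Ztrain = bsxfun(@minus, Xtrain1, mu) * W;

% 3) Fit the second scaling on the projected training data.
lo2 = min(Ztrain);  hi2 = max(Ztrain);
Ztrain = scale(Ztrain, lo2, hi2);

% 4) Apply the stored training parameters to the test data, in the same order.
Ztest = scale(bsxfun(@minus, scale(Xtest, lo1, hi1), mu) * W, lo2, hi2);
```

With this arrangement the training features lie exactly in [-1, 1] after the second scaling, while test features can fall slightly outside that range because their own min and max are never used.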
Topic svm dimensionality-reduction libsvm machine-learning
Category Data Science