Using PCA for Dimensionality Expansion
I was trying to use the t-SNE algorithm for dimensionality reduction, although I know this is not its primary use and is not recommended. I saw an implementation here, but I am not convinced by it.
The algorithm works like this:
- Given a training dataset and a test dataset, combine the 2 together into one full dataset
- Run t-SNE on the full dataset (excluding the target variable)
- Take the output of the t-SNE and add it as K new columns to the full dataset, K being the mapping dimensionality of t-SNE.
- Re-split the full dataset into training and test
- Split the training dataset into N folds
- Train your machine learning model on the N folds, doing N-fold cross-validation
- Evaluate the machine learning model on the test dataset
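The steps above can be sketched roughly as follows with scikit-learn. This is only an illustration of the procedure as listed, not the linked implementation: the synthetic data from `make_classification`, the `RandomForestClassifier`, the helper name `tsne_augment_and_evaluate`, and the values `k=2` and `n_folds=5` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score, train_test_split


def tsne_augment_and_evaluate(X_train, y_train, X_test, y_test,
                              k=2, n_folds=5, seed=0):
    """Hypothetical helper that follows the listed steps one by one."""
    n_train = X_train.shape[0]

    # Combine train and test features (the target is not included).
    X_full = np.vstack([X_train, X_test])

    # Run t-SNE on the full feature matrix; k is the mapping dimensionality.
    embedding = TSNE(n_components=k, random_state=seed).fit_transform(X_full)

    # Append the embedding as k new columns, then re-split into train/test.
    X_full_aug = np.hstack([X_full, embedding])
    X_train_aug, X_test_aug = X_full_aug[:n_train], X_full_aug[n_train:]

    # N-fold cross-validation on the training part only.
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    cv_scores = cross_val_score(model, X_train_aug, y_train, cv=n_folds)

    # Final fit on all training rows, then evaluate on the test rows.
    model.fit(X_train_aug, y_train)
    test_score = model.score(X_test_aug, y_test)
    return X_train_aug, X_test_aug, cv_scores, test_score


# Synthetic stand-in data (an assumption, not the dataset from the question).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

X_train_aug, X_test_aug, cv_scores, test_score = tsne_augment_and_evaluate(
    X_train, y_train, X_test, y_test
)
print(X_train_aug.shape, X_test_aug.shape)
print(cv_scores.mean(), test_score)
```

Note that in this sketch the test features take part in computing the embedding, exactly as the steps describe, even though the test labels are never used.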
My main questions are not about t-SNE itself:
- Can I use the algorithm above with other dimensionality reduction algorithms such as PCA, but splitting the dataset into train and test sets before transforming the data?
- Would this be effective?
Dimensionality is not a problem for my dataset because it is already small. Having highly correlated features is also not a concern.
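To make the first question concrete, here is a minimal sketch of the PCA variant, assuming scikit-learn. The split happens first, PCA is fit on the training rows only, and the same fitted projection is then applied to the test rows. The synthetic data, the classifier, and `k = 2` are placeholders chosen for the example, not part of the original procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (an assumption, not the dataset from the question).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Split BEFORE any transformation, so the test rows never influence the fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

k = 2  # number of principal components to append (assumed value)

# Fit PCA on the training rows only.
pca = PCA(n_components=k).fit(X_train)

# Append the k component scores as new columns to both splits,
# reusing the projection learned from the training rows.
X_train_aug = np.hstack([X_train, pca.transform(X_train)])
X_test_aug = np.hstack([X_test, pca.transform(X_test)])

# N-fold cross-validation on the training part, then a final test evaluation.
model = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(model, X_train_aug, y_train, cv=5)
model.fit(X_train_aug, y_train)
test_score = model.score(X_test_aug, y_test)

print(X_train_aug.shape, X_test_aug.shape)
print(cv_scores.mean(), test_score)
```

This is possible for PCA because scikit-learn's `PCA` exposes a `transform` method for unseen rows, whereas its `TSNE` only offers `fit_transform`, which is presumably why the t-SNE procedure combines the two datasets first.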
Topic pca dimensionality-reduction
Category Data Science