Right order for data preparation in machine learning
For the following data preparation steps:
- Outlier detection/treatment
- Data imputation
- Data scaling/standardisation
- Class balancing
I have two sub-questions:
- Should each of these steps be performed after the train/test split?
- Should each step also be applied to the test data?
I would appreciate an explanation for each step individually.
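
To make the two sub-questions concrete, here is a minimal sketch of what "after the split" and "applied to the test data" could mean for two of the steps (imputation and scaling). It uses plain Python on made-up toy data, all names are hypothetical, and it only illustrates one possible ordering; it is not a claim that this ordering is the right one, and it leaves outlier treatment and class balancing out.

```python
import random
import statistics

# Hypothetical toy data: one numeric feature with missing values (None).
values = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0, 9.0, 10.0]

# Split first, so nothing computed below ever sees the test rows.
rng = random.Random(0)
indices = list(range(len(values)))
rng.shuffle(indices)
train = [values[i] for i in indices[:7]]
test = [values[i] for i in indices[7:]]


def impute(rows, fill):
    """Replace missing entries with a previously computed fill value."""
    return [fill if v is None else v for v in rows]


def scale(rows, mean, std):
    """Standardise rows using a previously computed mean and std."""
    return [(v - mean) / std for v in rows]


# "Fit" the imputation and scaling statistics on the training rows only.
fill_value = statistics.median([v for v in train if v is not None])
train_imputed = impute(train, fill_value)
train_mean = statistics.mean(train_imputed)
train_std = statistics.pstdev(train_imputed)

# Apply the same training statistics to both sets.
train_scaled = scale(train_imputed, train_mean, train_std)
test_scaled = scale(impute(test, fill_value), train_mean, train_std)
```

In this sketch the test rows are transformed, but never used to compute the median, mean, or standard deviation.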
Topic data-imputation feature-scaling outlier class-imbalance machine-learning
Category Data Science