At which stage should correlation analysis be done?

I was thinking about it, but I couldn't find a logical explanation.

I mostly follow the steps below once the data is ready:

  • Correlation analysis and elimination
  • Create dummy variables if categorical variables exist
  • Balance the data if data is unbalanced
  • Scale data
  • Feature selection (Backward, Stepwise etc.)
  • Train model

Where in this pipeline would it make the most sense to apply the correlation analysis? After the data is balanced? After scaling? Or at the very start?

Topic correlation feature-selection

Category Data Science


Correlation is a bivariate feature analysis technique.

Typically, it is done after univariate feature analysis but before any feature engineering.

Most machine learning is iterative, so correlation can be revisited at any stage.
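As a rough illustration of an early correlation check, here is a minimal sketch using pandas and NumPy. The helper name `drop_highly_correlated`, the 0.9 threshold, and the synthetic columns `a`, `b`, `c` are all assumptions made for the example, not something from the question. It only looks at numeric columns, and it keeps the first column of each highly correlated pair, which is one possible convention among several.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.9):
    """Drop one column from each pair whose absolute Pearson
    correlation exceeds `threshold` (hypothetical helper)."""
    # Absolute pairwise correlations between numeric columns only.
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep the strict upper triangle so each pair is considered once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it is highly correlated with an earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Synthetic data (assumed for illustration): b is almost a multiple of a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + 0.01 * rng.normal(size=200),
    "c": rng.normal(size=200),
})

reduced, dropped = drop_highly_correlated(demo, threshold=0.9)
print(dropped)
print(list(reduced.columns))
```

Because the process is iterative, the same check can be rerun later, for example after dummy variables have been created, to see whether any new columns are redundant.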
