Which method should I use to remove correlation between independent variables that include both categorical and numerical variables?
The independent variables in my dataset include categorical variables such as:
- Gender (2 levels)
- Mode of Shipment (3 levels)
- Product Importance (4 levels)
and numerical variables such as:
- Customer care calls
- Discount Offered
- Package weight
How do I find the correlations between these variables?
- Should I convert the categorical variables into dummy variables and then use Pearson correlation? What if the dummy variables themselves are correlated, for example the dummies for the Mode of Shipment categories (Flight, Ship, Road)? Do I need to drop a dummy category that is highly correlated with another Mode of Shipment category? Or
- Should I compute correlations separately: Pearson correlation between the numerical variables, and the chi-squared statistic between the categorical variables?
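A minimal sketch of the first option, assuming pandas and a small randomly generated DataFrame whose column names are hypothetical and only mirror the variables listed above. It illustrates why the dummies of a single variable such as Mode of Shipment are always correlated: they sum to 1 on every row, so every pair is negatively correlated by construction. Keeping k-1 dummies per variable (`drop_first=True` in `pd.get_dummies`) removes that exact collinearity. Pearson correlation between two 0/1 columns is the phi coefficient.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the column names mirror the question's variables,
# but the values are random, not taken from the real dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "mode_of_shipment": rng.choice(["Flight", "Ship", "Road"], size=n),
    "product_importance": rng.choice(["low", "medium", "high", "critical"], size=n),
    "customer_care_calls": rng.integers(2, 8, size=n),
    "discount_offered": rng.uniform(0, 60, size=n),
    "package_weight": rng.uniform(1000, 6000, size=n),
})

# Full one-hot encoding: one 0/1 column per category of each categorical column.
full = pd.get_dummies(df, dtype=float)
ship_cols = [c for c in full.columns if c.startswith("mode_of_shipment_")]

# The Mode of Shipment dummies sum to 1 on every row, so they are perfectly
# collinear and every pair is negatively correlated by construction.
print(full[ship_cols].sum(axis=1).unique())
print(full[ship_cols].corr(method="pearson").round(2))

# drop_first=True keeps k-1 dummies per variable; the dropped level becomes
# the reference category, which removes the exact collinearity.
reduced = pd.get_dummies(df, drop_first=True, dtype=float)

# Pearson correlation over numeric columns and dummies together
# (between two 0/1 columns this is the phi coefficient).
corr = reduced.corr(method="pearson")
print(corr.round(2))
```

In this setup, the correlation among dummies of the same variable reflects the encoding rather than a relationship in the data, which suggests it is not by itself a reason to drop more than the usual reference category.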
How should I go about this?
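A minimal sketch of the second option (Pearson for the numerical pairs, chi-squared for the categorical pairs), assuming pandas and SciPy and hypothetical random toy data like the kind described above. It runs `scipy.stats.chi2_contingency` on a contingency table for each pair of categorical columns. Because the raw chi-squared statistic grows with the sample size, the sketch also converts it to Cramér's V, V = sqrt(chi2 / (n * (min(r, k) - 1))), which lies between 0 and 1. `cramers_v` is a helper defined here for illustration, not a library function.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x: pd.Series, y: pd.Series) -> tuple[float, float]:
    """Hypothetical helper: return (Cramér's V, chi-squared p-value)."""
    table = pd.crosstab(x, y)  # contingency table of observed counts
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, k = table.shape
    v = np.sqrt(chi2 / (n * (min(r, k) - 1)))
    return float(v), float(p)


# Hypothetical toy data (random values, illustrative column names only).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "mode_of_shipment": rng.choice(["Flight", "Ship", "Road"], size=n),
    "product_importance": rng.choice(["low", "medium", "high", "critical"], size=n),
    "customer_care_calls": rng.integers(2, 8, size=n),
    "discount_offered": rng.uniform(0, 60, size=n),
    "package_weight": rng.uniform(1000, 6000, size=n),
})

categorical = ["gender", "mode_of_shipment", "product_importance"]
numerical = ["customer_care_calls", "discount_offered", "package_weight"]

# 1) Pearson correlation among the numerical variables only.
num_corr = df[numerical].corr(method="pearson")
print(num_corr.round(2))

# 2) Chi-squared test of independence plus Cramér's V for each categorical pair.
results = {}
for a, b in combinations(categorical, 2):
    v, p = cramers_v(df[a], df[b])
    results[(a, b)] = (v, p)
    print(f"{a} vs {b}: Cramér's V = {v:.3f}, p = {p:.3f}")
```

A small p-value only indicates that two categorical variables are not independent; Cramér's V indicates how strong the association is, which is closer to what a correlation threshold is meant to capture.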
Thank you! It's a long question, but I really need clarity on this. I would also appreciate any additional links. Thanks again!