Is a Pearson correlation matrix a good indicator for label-encoded categorical and numeric independent variables?
I have a dataset with 22 independent variables, 15 of which are categorical and have already been label encoded, i.e. the dtype is int64 and the values range from 0 to n-1 (where n is the number of distinct classes). I received the data in this form and did not have to encode it myself.
Since the data is already encoded, I can directly compute Pearson's correlation in Python (pandas' df.corr())
to get the correlation matrix for all combinations (encoded-encoded, continuous-encoded, continuous-continuous).
I would like to know whether this is the correct way to handle this situation, or whether I should use a different correlation measure for each group (encoded-encoded, continuous-continuous, and continuous-encoded).
If so, which metrics should I use?
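For context, the per-group alternatives I have come across are Cramér's V for encoded-encoded pairs and the correlation ratio (eta) for continuous-encoded pairs, with Pearson kept for continuous-continuous pairs. I am not sure these are the right choices. Below is a minimal sketch of the two, assuming only numpy and pandas; the helper names `cramers_v` and `correlation_ratio` are my own and do not come from any library.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramér's V between two categorical series (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y).to_numpy().astype(float)
    n = table.sum()
    # Expected counts under independence: outer product of the margins / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k)) if k > 0 else np.nan

def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a continuous series."""
    values = pd.Series(values, dtype=float)
    # Replace each value by the mean of its category
    group_means = values.groupby(pd.Series(categories).to_numpy()).transform('mean')
    ss_between = ((group_means - values.mean()) ** 2).sum()
    ss_total = ((values - values.mean()) ** 2).sum()
    return np.sqrt(ss_between / ss_total) if ss_total > 0 else np.nan

# Same toy data as in the example further down
toy = pd.DataFrame({'Age': [50, 23, 45, 10],
                    'Gender': ['M', 'F', 'M', 'F'],
                    'Home': ['US', 'JPN', 'US', 'Ger']})

v = cramers_v(toy['Gender'], toy['Home'])            # categorical vs categorical
eta = correlation_ratio(toy['Gender'], toy['Age'])   # categorical vs continuous
print(v, eta)
```

Neither measure depends on which integer is assigned to which category, which is the property I am unsure Pearson has on label-encoded columns.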
Minimal Working Example
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy data: two continuous columns and two categorical columns
data = {'Age': [50, 23, 45, 10], 'Weight': [85, 50, 70, 35], 'Gender': ['M', 'F', 'M', 'F'], 'Home': ['US', 'JPN', 'US', 'Ger']}
df = pd.DataFrame(data)

# Label-encode the categorical columns (integer codes 0..n-1)
for col in ['Gender', 'Home']:
    df[col] = df[col].astype('category').cat.codes

# Pearson correlation (the pandas default) over all column pairs
corr = df.corr()
sns.heatmap(corr, square=True, annot=True, fmt='.2f')
plt.show()
This gives an annotated heatmap of the 4x4 correlation matrix.
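My concern with treating the encoded columns as numeric is that the integer codes are arbitrary. The following is a minimal sketch of that concern on the same toy data: it computes the Pearson correlation between Age and Home twice, once with the codes that cat.codes assigns and once with a different, equally arbitrary numbering (the mapping in `codes_b` is hypothetical, chosen only for illustration).

```python
import pandas as pd

df = pd.DataFrame({'Age': [50, 23, 45, 10],
                   'Home': ['US', 'JPN', 'US', 'Ger']})

# Default label encoding: categories are sorted, so Ger=0, JPN=1, US=2
codes_a = df['Home'].astype('category').cat.codes

# Same categories, different arbitrary numbering (hypothetical mapping)
codes_b = df['Home'].map({'US': 0, 'Ger': 1, 'JPN': 2})

# Pearson correlation of Age with each encoding of the same column
r_a = df['Age'].corr(codes_a)
r_b = df['Age'].corr(codes_b)
print(f'cat.codes encoding: {r_a:.2f}')
print(f'relabelled encoding: {r_b:.2f}')
```

On this toy data the two encodings give coefficients of opposite sign, even though the underlying categories are identical, which is why I doubt the values in the heatmap for the encoded columns.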
I am a beginner in this field.