Is a Pearson correlation matrix a good indicator for label-encoded categorical and numeric independent variables?

Question

Echo

February 25, 2022, 22:01

I have a dataset with 22 independent variables, 15 of which are categorical and have already been label encoded, i.e. the dtype is int64 and the values range from 0 to n - 1 (where n is the number of distinct classes). I received the data in this form and did not have to encode it myself.

Since the data is already encoded, I can directly compute Pearson's correlation in Python to get the correlation matrix for all combinations (encoded-encoded, continuous-encoded, continuous-continuous).

I would like to know whether this is the correct way to handle this situation, or whether I should use a different correlation measure for each group (encoded-encoded, continuous-continuous, and continuous-encoded).

If so, which metrics should I use?

Minimal Working Example

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy data: two continuous columns and two categorical columns
# (named `data` to avoid shadowing the built-in `dict`)
data = {'Age': [50, 23, 45, 10], 'Weight': [85, 50, 70, 35], 'Gender': ['M', 'F', 'M', 'F'], 'Home': ['US', 'JPN', 'US', 'Ger']}
df = pd.DataFrame(data)

# Label encode the categorical columns as integer codes
df['Gender'] = df['Gender'].astype('category').cat.codes
df['Home'] = df['Home'].astype('category').cat.codes

# Pearson correlation matrix over all (now numeric) columns
corr = df.corr()
sns.heatmap(corr, square=True, annot=True, fmt='.2f')
plt.show()

This gives a heatmap of the pairwise Pearson correlations between Age, Weight, Gender, and Home.

I am a beginner in this field.
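One way to probe the question is to check whether Pearson's r on a label-encoded column depends on which integer each label happens to receive. The sketch below reuses the Weight and Home values from the example. The `pearson` helper and the two mappings `enc_a` and `enc_b` are illustrative names, not part of any library; `enc_a` is assumed to match the sorted-label order that `.cat.codes` produces for this data, and `enc_b` is an arbitrary relabelling of the same three countries.

```python
import math

def pearson(x, y):
    """Plain Pearson r (hypothetical helper; the same quantity
    that DataFrame.corr() reports with its default method)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

weight = [85, 50, 70, 35]
home = ['US', 'JPN', 'US', 'Ger']

# Encoding A: codes follow sorted label order (assumed to match .cat.codes here)
enc_a = {'Ger': 0, 'JPN': 1, 'US': 2}
# Encoding B: an equally valid but arbitrary relabelling of the same categories
enc_b = {'JPN': 0, 'US': 1, 'Ger': 2}

r_a = pearson(weight, [enc_a[h] for h in home])
r_b = pearson(weight, [enc_b[h] for h in home])

# Same data, different integer labels: compare the two coefficients
print(f"encoding A: r = {r_a:.2f}")
print(f"encoding B: r = {r_b:.2f}")
```

On this toy data the two encodings give clearly different coefficients, which suggests that for a nominal variable with three or more levels the value reflects the arbitrary integer codes rather than the data alone. For a two-level variable such as Gender, swapping the codes only flips the sign of r, so its magnitude is unaffected by the labelling.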

Topic pearsons-correlation-coefficient correlation scikit-learn pandas

Category Data Science
