Does one-hot encoding affect the chi-square test?
I am doing feature selection for a data science project, and one of the features is a high-cardinality categorical variable (for context, it's nationality). I know the chi-square test can handle a multiclass feature like mine, but I need to one-hot encode it (split the multiclass variable into multiple binary variables, one per value) before I can feed it into my machine learning algorithm (Spark MLlib). My question is: does one-hot encoding affect the result of a chi-square test? I suspect it does, because the value of each one-hot column depends on the values of all the others, doesn't it? Sorry for my rough English, and thank you in advance.
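To make the question concrete, here is a minimal sketch in plain Python (not Spark MLlib) that compares the two setups. The data is a made-up toy sample, and `chi_square_statistic` is a hypothetical helper written for this illustration, not a library function. It computes the Pearson chi-square statistic once on the original multiclass column and once per one-hot indicator column.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Return (Pearson chi-square statistic, degrees of freedom)
    for two equally long sequences of categorical values."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


# Toy data, invented for illustration only: 3 nationalities, binary label.
nationality = ["FR"] * 10 + ["DE"] * 10 + ["JP"] * 10
label = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8

# One test on the original multiclass feature.
full_stat, full_dof = chi_square_statistic(nationality, label)

# One separate test per one-hot (indicator) column.
one_hot_results = {}
for value in sorted(set(nationality)):
    indicator = [1 if v == value else 0 for v in nationality]
    one_hot_results[value] = chi_square_statistic(indicator, label)

print("multiclass:", full_stat, full_dof)
for value, (stat, dof) in one_hot_results.items():
    print("one-hot", value, stat, dof)
```

On this toy sample the multiclass test yields a single statistic with 2 degrees of freedom, while each one-hot column yields its own statistic with 1 degree of freedom, and the per-column values do not add up to the multiclass value. In that sense the two setups are testing different things: "is nationality as a whole associated with the label" versus "is being this one nationality, as opposed to any other, associated with the label".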
Topic chi-square-test one-hot-encoding pyspark
Category Data Science