Chi-square test - how can I tell whether attributes are correlated?

I am trying out a course's theoretical content on this dataset. After data cleaning, I am trying to use the chi-square test. I wrote the following code:

chisq.test(chocolate$CompanyMaker, chocolate$Rating, simulate.p.value = TRUE)
chisq.test(chocolate$SpecificBeanOriginOrBarName, chocolate$Rating, simulate.p.value = TRUE)
chisq.test(chocolate$CompanyLocation, chocolate$Rating, simulate.p.value = TRUE)
chisq.test(chocolate$BeanType, chocolate$Rating, simulate.p.value = TRUE)
chisq.test(chocolate$BroadBeanOrigin, chocolate$Rating, simulate.p.value = TRUE)

chisq.test(chocolate$CompanyMaker, chocolate$CocoaPerc, simulate.p.value = TRUE)
chisq.test(chocolate$SpecificBeanOriginOrBarName, chocolate$CocoaPerc, simulate.p.value = TRUE)
chisq.test(chocolate$CompanyLocation, chocolate$CocoaPerc, simulate.p.value = TRUE)
chisq.test(chocolate$BeanType, chocolate$CocoaPerc, simulate.p.value = TRUE)
chisq.test(chocolate$BroadBeanOrigin, chocolate$CocoaPerc, simulate.p.value = TRUE)

And these are my results:

RATING

  • CompanyMaker = 0.29
  • Specific... = 0.6267
  • CompanyLocation = 0.1819
  • BeanType = 0.5372
  • BroadBeanOrigin = 0.1534

COCOA PERC

  • CompanyMaker = 0.0004998
  • Specific... = 0.902
  • CompanyLocation = 0.04748
  • BeanType = 0.8136
  • BroadBeanOrigin = 0.8356

Online, I read about significance levels, but I didn't quite understand them. In particular, is the level 0.5 or 0.05? Which values are "ok"?

From what I understood, I should say that CompanyMaker, CompanyLocation and BroadBeanOrigin are related to Rating, while CompanyMaker and CompanyLocation are related to CocoaPerc.

Is this right? If not, can you write or link me an example or a guide to do it right? Thanks.



The chi-square test is used to determine which attributes are informative about an output. It is commonly used in feature selection.

So, if you have attributes A, B and C and an output Y, the question is whether Y depends on A, B or C. Any of them might also be independent of Y, i.e. have no effect on it.

So the chi-square test is a statistical test for finding out which attributes are independent of the output and can be removed.

A contingency table is created for each attribute against the output, recording how often each combination of values occurs. The null hypothesis is that the two are independent: a p-value below the chosen threshold leads you to reject independence (the attribute is relevant), while a p-value above it gives no evidence of a relationship.
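As a minimal sketch of that procedure in R: the data below are toy values invented for illustration (they are not the chocolate dataset), and the names `maker`, `rating` and `alpha` are hypothetical. The 0.05 threshold is the conventional choice, assumed here.

```r
# Toy data, invented for illustration only (not the chocolate dataset).
set.seed(42)
n <- 200
maker <- sample(c("A", "B", "C"), n, replace = TRUE)
# 'rating' is drawn independently of 'maker', so no association is built in.
rating <- sample(c("low", "mid", "high"), n, replace = TRUE)

# Contingency table: counts for every (maker, rating) combination.
tab <- table(maker, rating)
print(tab)

# Chi-square test of independence; the null hypothesis is "no association".
res <- chisq.test(tab)
print(res$p.value)

# Conventional significance level (an assumption; pick the one your course uses).
alpha <- 0.05
if (res$p.value < alpha) {
  cat("Reject independence: the two variables appear to be associated\n")
} else {
  cat("No evidence against independence at this level\n")
}
```

Passing the two columns directly, as in `chisq.test(x, y)`, builds the same table internally.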

More about it here: https://machinelearningmastery.com/chi-squared-test-for-machine-learning/

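Online, I read about significance level, but I didn't quite understand it. In particular, is it at 0.5 or 0.05? Which values are "ok"?

Please read about the p-value. The significance level most commonly used is 0.05, not 0.5: a p-value below it is taken as evidence that the two attributes are not independent, while a p-value above it means the test found no such evidence.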
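To see what a significance level does with the numbers reported in the question, here is a rough sketch that compares them against 0.05. The p-values are copied from the question and labeled with the column names from its code; `alpha`, `p_rating`, `p_cocoa`, `sig_rating` and `sig_cocoa` are names made up for this example.

```r
# p-values copied from the question, labeled with the column names used in its code.
p_rating <- c(CompanyMaker = 0.29,
              SpecificBeanOriginOrBarName = 0.6267,
              CompanyLocation = 0.1819,
              BeanType = 0.5372,
              BroadBeanOrigin = 0.1534)

p_cocoa <- c(CompanyMaker = 0.0004998,
             SpecificBeanOriginOrBarName = 0.902,
             CompanyLocation = 0.04748,
             BeanType = 0.8136,
             BroadBeanOrigin = 0.8356)

# Conventional significance level (an assumption; pick the one your course uses).
alpha <- 0.05

# Keep only the attributes whose p-value falls below alpha.
sig_rating <- names(p_rating)[p_rating < alpha]
sig_cocoa <- names(p_cocoa)[p_cocoa < alpha]

print(sig_rating)  # character(0)
print(sig_cocoa)   # "CompanyMaker" "CompanyLocation"
```

At this level none of the attributes shows a significant association with Rating, and only CompanyMaker and CompanyLocation do with CocoaPerc. Because `simulate.p.value = TRUE` estimates the p-value by Monte Carlo simulation, the values change slightly between runs, so a borderline result such as 0.04748 should be read with caution.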
