p-value of chi squared test is exactly 0.0

I need to do a chi-square test on two of my dataset's categorical variables. These two variables have basically the same meaning but come from two different sources, so my idea is to use a chi-square test to see how similar, or correlated, these two variables really are. To do so, I've written code in Python, but the p-value I get from it is exactly 0, which seems a little strange to me.

The code is:

from scipy.stats import chi2_contingency
import pandas as pd

df = pd.read_csv('data/data_understanding_output.csv')

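# Contingency table of observed counts: rows = sentiment, columns = valence_cat
cont = pd.crosstab(df['sentiment'], df['valence_cat'])
# Returns the statistic, the p-value, the degrees of freedom and the expected counts
c, p, dof, ex = chi2_contingency(cont)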

My contingency table (rows: sentiment, columns: valence_cat) is:

             Class 0   Class 1   Class 2
Class 0          315        37         2
Class 1          665      2661       665
Class 2            3        49       285

When I print my results like this:

print(f"{c}\n{p}\n{dof}\n{ex}")

I get:

1954.0385481800377
0.0
[[  74.32336608  207.69713798   71.97949594]
 [ 837.92246903 2341.57988039  811.49765058]
 [  70.75416489  197.72298163   68.52285348]]

4
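As a cross-check on the output above, here is a minimal pure-Python sketch of what chi2_contingency computes for this table, assuming its default settings (Yates' continuity correction is only applied when dof is 1, so it does not apply here). The names observed, expected and stat are illustrative and not part of the original code.

```python
# Rebuild the posted contingency table and recompute, by hand, the values
# that scipy.stats.chi2_contingency reported (no continuity correction).
observed = [
    [315, 37, 2],
    [665, 2661, 665],
    [3, 49, 285],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(stat, 4), dof)
print([[round(e, 4) for e in row] for row in expected])
```

The statistic, the degrees of freedom and the expected counts agree with the printed output, which suggests the computation itself is fine and only the p-value looks surprising.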

So my question is: did I do anything wrong? Is it normal to have a p-value that equals absolute zero?

Tags: chi-square-test, pvalue, scipy, python

Category: Data Science


Your results are based on a cross-tabulation of three categories. You have a single variable with three categories, so your contingency table should be a one-way tabulation. Rewrite your contingency table and then compute the p-value; it is unlikely to be close to zero.


A p-value of exactly 0 is rare but theoretically possible. In practice, however, a p-value can very rarely be zero: any data collected for a study are certain to suffer from at least chance (random) error. Accordingly, for any real set of data you should not expect to obtain a p-value of exactly "0". The p-value can, however, be very small in some cases.
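One plausible reason scipy prints exactly 0.0 (our assumption, not something stated in the answers) is floating-point underflow: a double cannot hold a positive value smaller than about 4.9e-324, so any smaller result is stored as 0.0. This standard-library sketch shows those limits; the exponent -977.0 is chosen only for illustration.

```python
import math
import sys

# Smallest positive *normal* double (about 2.2e-308)
smallest_normal = sys.float_info.min
# Smallest positive *subnormal* double (about 4.9e-324); math.ulp needs Python 3.9+
smallest_subnormal = math.ulp(0.0)

# The true value of e^-977 is roughly 1e-424, far below the subnormal limit,
# so the floating-point result silently underflows to 0.0
tiny = math.exp(-977.0)

print(smallest_normal, smallest_subnormal, tiny)
```

A tail probability that small is not zero mathematically; it simply cannot be represented in a 64-bit float.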

Let's look at the interpretation: the p-value is the probability of getting an outcome as extreme as, or more extreme than, the observed outcome, ASSUMING THE NULL HYPOTHESIS IS TRUE. If the p-value is small, this weighs against the null hypothesis, because it says that the observed outcome would be quite rare, and therefore unlikely, under the null. A large p-value weighs in favor of the null hypothesis, because it says that the observed outcome is pretty much what the null hypothesis said you would see.
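Under that definition, the p-value of the chi-square test is the upper-tail probability P(X >= statistic) for a chi-square distribution with the test's degrees of freedom (4 here). For exactly 4 degrees of freedom the tail has the closed form exp(-x/2) * (1 + x/2), so it can be evaluated in log space without underflowing. This is a sketch for the 4-degree-of-freedom case only: chi2_sf_df4_log10 is a hypothetical helper, and in general one would call scipy.stats.chi2.sf, which underflows to 0.0 for this statistic.

```python
import math


def chi2_sf_df4_log10(x: float) -> float:
    """log10 of P(X >= x) for X ~ chi-square with 4 degrees of freedom.

    Uses the closed form P(X >= x) = exp(-x/2) * (1 + x/2), valid only for
    4 degrees of freedom, evaluated in log space so it cannot underflow.
    """
    log_p = -x / 2 + math.log1p(x / 2)  # natural log of the tail probability
    return log_p / math.log(10)          # convert to base 10


stat = 1954.0385481800377  # statistic reported by chi2_contingency in the question
log10_p = chi2_sf_df4_log10(stat)
print(f"p ~= 10^{log10_p:.1f}")
```

This gives a log10 p-value of about -421.3, i.e. a p-value around 10^-421: tiny, but not literally zero.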

So in your case, a very small p-value indicates strong reason to reject the null hypothesis (for this test, that sentiment and valence_cat are independent).


Is it normal to have a p-value equal to absolute zero?

I don't know about "normal", but it is completely possible, and in your case it makes sense: your frequencies are vastly different between the classes, so one would expect this result to be extremely unusual if the two variables were independent.
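To see which cells drive the result, here is a small sketch that splits the Pearson statistic into per-cell contributions (O - E)^2 / E, using the observed counts and the expected counts printed in the question. Cell (i, j) means sentiment class i and valence_cat class j; the variable names are illustrative.

```python
# Observed counts from the question and the expected counts printed by scipy
observed = [[315, 37, 2], [665, 2661, 665], [3, 49, 285]]
expected = [
    [74.32336608, 207.69713798, 71.97949594],
    [837.92246903, 2341.57988039, 811.49765058],
    [70.75416489, 197.72298163, 68.52285348],
]

# Each cell's share of the Pearson statistic: (O - E)^2 / E
contributions = {
    (i, j): (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(3)
    for j in range(3)
}

# Cells sorted from the largest contribution to the smallest
ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
for cell, value in ranked[:3]:
    print(cell, round(value, 1))
```

The two diagonal cells (Class 0, Class 0) and (Class 2, Class 2), where both sources agree far more often than independence would predict, account for most of the total of roughly 1954.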

I'll repeat this test in R:

ct=rbind(
  c(315,37,2),
  c(665,2661,665),
  c(3,49,285)
)

chisq.test(ct)

    Pearson's Chi-squared test

data:  ct
X-squared = 1954, df = 4, p-value < 2.2e-16

Same result: a p-value of practically 0 (R only reports that it is below 2.2e-16).
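The "< 2.2e-16" in R's output is a display threshold rather than the computed value: as far as we know, print.htest formats the p-value with format.pval, whose default eps is .Machine$double.eps, the double-precision machine epsilon. A quick way to see the same constant from Python:

```python
import sys

# Double-precision machine epsilon: the gap between 1.0 and the next larger double.
# Assumed here to be the bound R prints as "p-value < 2.2e-16".
eps = sys.float_info.epsilon
print(f"{eps:.1e}")
```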

Note: the chi-square test has some assumptions, one of them being (as a rule of thumb):

  • No more than 20% of the expected counts are less than 5, and all individual expected counts are 1 or greater.
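A minimal sketch of that rule of thumb applied to the expected counts printed in the question; meets_rule_of_thumb is a hypothetical helper name, and the 20% / 5 / 1 thresholds are the ones from the bullet above.

```python
# Expected counts as printed by chi2_contingency in the question
expected = [
    [74.32336608, 207.69713798, 71.97949594],
    [837.92246903, 2341.57988039, 811.49765058],
    [70.75416489, 197.72298163, 68.52285348],
]


def meets_rule_of_thumb(expected_counts):
    """Return True if no more than 20% of expected counts are below 5
    and every expected count is at least 1."""
    cells = [e for row in expected_counts for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1


print(meets_rule_of_thumb(expected))
```

Here the smallest expected count is about 68.5, so the assumption is comfortably satisfied and the test result is not an artifact of sparse cells.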

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.