Multiple Hypotheses in Python

I want to write a method to test multiple hypotheses for a pair of schools (say TAMU and UT Austin). I want to consider all possible pairs of words (Research, Thesis, Proposal, AI, Analytics) and test the hypothesis that the word counts differ significantly across the two schools, using the specified alpha threshold (0.05).

I only need to conduct tests on words that have non-zero values for both schools, i.e., every row and column in the contingency table should have a non-zero sum.

Finally, I want to return a tuple with:

  • The total number of tests conducted, and
  • The number of significant tests.

Sample data frame:

Names Research Thesis Proposal AI Analytics Data
TAMU 54 0 0 6 5
uiuc 33 43 5 0 76
USC 4 1 0 7 21
UT Austin 22 31 0 0 55
UCLA 55 6 7 9 11
import pandas as pd
from scipy.stats import chi2_contingency

def school_term_hypotheses(filename, college1, college2, alpha):
    df = pd.read_csv(filename)
    # Keep only the two schools being compared
    df = df[(df['Name'] == college1) | (df['Name'] == college2)]
    # Keep only the columns that are non-zero for both schools
    df = df.loc[:, df.ne(0).all()]
    df = df.set_index('Unnamed: 0')
    #chi,p=chi2_contingency(df)[:2]
    #return(p)

school_term_hypotheses('test.csv', 'TAMU', 'UT Austin', 0.05)

I am clueless about what to do after getting a DataFrame with only non-zero values. I need some help figuring out how to test multiple hypotheses.

Topic chi-square-test pvalue scipy machine-learning

Category Data Science


Try the following code

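import pandas as pd
from scipy import stats

def chi_squared_test(df, college1, college2, alpha):
    # df is indexed by school name, with one count column per word.
    # The two school rows are the contingency table; pd.crosstab on the
    # rows would tabulate the count values themselves, which is not intended.
    contingency_table = df.loc[[college1, college2]]
    # Keep only the words that are non-zero for both schools
    contingency_table = contingency_table.loc[:, contingency_table.ne(0).all()]
    try:
        stat, p, dof, expected = stats.chi2_contingency(contingency_table)
    except ValueError:
        return None
    # The caller compares p against alpha to decide significance
    return p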
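
To get the tuple the question asks for, one possible approach is to run one chi-squared test per pair of words and count the results. The sketch below is only an illustration under a few assumptions: the function name `count_pairwise_tests` is made up, the DataFrame is assumed to be indexed by school name, each pair of words is tested as a 2x2 table (two schools by two words), and the column labels attached to the sample rows are a guess, since the header in the question lists more names than there are values. No correction for multiple comparisons is applied, because the question asks for the plain alpha threshold.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency


def count_pairwise_tests(df, college1, college2, alpha=0.05):
    """Return (tests conducted, significant tests) for two schools.

    Assumes df is indexed by school name with one count column per word.
    """
    rows = df.loc[[college1, college2]]
    # Keep only the words that are non-zero for both schools.
    rows = rows.loc[:, rows.ne(0).all()]

    total = 0
    significant = 0
    for word_a, word_b in combinations(rows.columns, 2):
        # 2x2 contingency table: rows are schools, columns are the two words.
        table = rows[[word_a, word_b]].to_numpy()
        _, p, _, _ = chi2_contingency(table)
        total += 1
        if p < alpha:
            significant += 1
    return total, significant


# Sample rows from the question; the column labels are an assumption.
data = pd.DataFrame(
    {
        "Research": [54, 33, 4, 22, 55],
        "Thesis": [0, 43, 1, 31, 6],
        "Proposal": [0, 5, 0, 0, 7],
        "AI": [6, 0, 7, 0, 9],
        "Analytics": [5, 76, 21, 55, 11],
    },
    index=["TAMU", "uiuc", "USC", "UT Austin", "UCLA"],
)

print(count_pairwise_tests(data, "TAMU", "UT Austin", alpha=0.05))
```

With these assumed labels, only the first and last columns are non-zero for both TAMU and UT Austin, so a single pair is tested. If you read the data from a CSV instead, something like `pd.read_csv(filename, index_col=0)` would give the school-indexed frame the function expects, assuming the school name is the first column.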
