Multiple Hypotheses in Python

Question


stacky

July 29, 2021 17:04

I want to write a method that tests multiple hypotheses for a pair of schools (say, TAMU and UT Austin). For every possible pair of words (Research, Thesis, Proposal, AI, Analytics), I want to test the hypothesis that the word counts differ significantly between the two schools, using the specified alpha threshold (0.05).

I only need to run tests on words that have non-zero counts for both schools, so that no row or column in a contingency table sums to 0.
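The zero-count rule matters because SciPy refuses to run the test when a row or column of the table sums to zero. A small check, using the sample counts for TAMU and UT Austin on Research and Proposal (Proposal is 0 for both schools), shows the error that the filter avoids:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: TAMU, UT Austin; columns: Research, Proposal (both schools have 0 for Proposal)
table = np.array([[54, 0],
                  [22, 0]])

try:
    chi2_contingency(table)
    raised = False
except ValueError as err:
    # The all-zero column yields a zero expected frequency, which SciPy rejects
    print("ValueError:", err)
    raised = True
```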

Finally, I want to return a tuple containing:

1. the total number of tests conducted, and
2. the number of significant tests.
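Each "test" here is a chi-square test on a 2×2 table: rows are the two schools, columns are the two words, so k eligible words give k(k-1)/2 tests. A sketch of a single test, using the TAMU and UT Austin counts for Research and Analytics from the sample data (note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables unless `correction=False` is passed):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: TAMU, UT Austin; columns: Research, Analytics (counts from the sample data)
table = np.array([[54, 5],
                  [22, 55]])

chi2, p, dof, expected = chi2_contingency(table)

alpha = 0.05
# A test counts as significant when its p-value falls below alpha
significant = p < alpha
print(f"chi2={chi2:.2f}, p={p:.3g}, dof={dof}, significant={significant}")
```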

Sample data frame:

Names	Research	Thesis	Proposal	AI	Analytics
TAMU	54	0	0	6	5
uiuc	33	43	5	0	76
USC	4	1	0	7	21
UT Austin	22	31	0	0	55
UCLA	55	6	7	9	11
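To experiment without a CSV file, the sample can be rebuilt as a DataFrame indexed by school name. This assumes the five count columns are Research, Thesis, Proposal, AI and Analytics. Selecting the two schools and keeping only the columns that are non-zero in both rows gives the words eligible for testing:

```python
import pandas as pd

# Sample data from the question, indexed by school name
df = pd.DataFrame(
    {
        "Research": [54, 33, 4, 22, 55],
        "Thesis": [0, 43, 1, 31, 6],
        "Proposal": [0, 5, 0, 0, 7],
        "AI": [6, 0, 7, 0, 9],
        "Analytics": [5, 76, 21, 55, 11],
    },
    index=pd.Index(["TAMU", "uiuc", "USC", "UT Austin", "UCLA"], name="Names"),
)

# Keep the two schools, then the words with non-zero counts for both of them
pair = df.loc[["TAMU", "UT Austin"]]
eligible = pair.loc[:, pair.ne(0).all()]
print(list(eligible.columns))  # ['Research', 'Analytics']
```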

import pandas as pd
from scipy.stats import chi2_contingency

def school_term_hypotheses(filename, college1, college2, alpha):
    # Use the first column (the school names) as the index
    df = pd.read_csv(filename, index_col=0)
    # Keep only the two schools being compared
    df = df.loc[[college1, college2]]
    # Drop words that have a zero count for either school
    df = df.loc[:, df.ne(0).all()]
    # chi, p = chi2_contingency(df)[:2]
    # return p
    return df

school_term_hypotheses('test.csv', 'TAMU', 'UT Austin', 0.05)

I am clueless about what to do after getting a DataFrame that contains only the non-zero words. I need some help figuring out how to test multiple hypotheses.
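One way to finish the method, sketched under the assumption that the data is already in a DataFrame indexed by school name (for example via `pd.read_csv(filename, index_col=0)`). The function name `school_pair_tests` is hypothetical. It loops over every pair of eligible words with `itertools.combinations`, runs one 2×2 chi-square test per pair, and counts a test as significant when `p < alpha`; no multiple-comparison correction is applied, since the question only asks for the specified alpha.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency


def school_pair_tests(df, college1, college2, alpha=0.05):
    """Hypothetical helper: one chi-square test per pair of words.

    Assumes df is indexed by school name with one count column per word.
    Returns (number_of_tests, number_of_significant_tests).
    """
    # Keep only the two schools being compared
    pair = df.loc[[college1, college2]]
    # Keep only words with a non-zero count for both schools
    pair = pair.loc[:, pair.ne(0).all()]

    n_tests = 0
    n_significant = 0
    for w1, w2 in combinations(pair.columns, 2):
        # 2x2 contingency table: rows = schools, columns = the two words
        table = pair[[w1, w2]].to_numpy()
        _, p, _, _ = chi2_contingency(table)
        n_tests += 1
        if p < alpha:
            n_significant += 1
    return n_tests, n_significant


sample = pd.DataFrame(
    {
        "Research": [54, 33, 4, 22, 55],
        "Thesis": [0, 43, 1, 31, 6],
        "Proposal": [0, 5, 0, 0, 7],
        "AI": [6, 0, 7, 0, 9],
        "Analytics": [5, 76, 21, 55, 11],
    },
    index=["TAMU", "uiuc", "USC", "UT Austin", "UCLA"],
)

print(school_pair_tests(sample, "TAMU", "UT Austin", 0.05))
```

For TAMU and UT Austin only Research and Analytics survive the filter, so there is a single test; for uiuc and UCLA four words survive, giving six tests.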

Topic chi-square-test pvalue scipy machine-learning

Category Data Science

SrJ · Accepted Answer · July 26, 2021 05:47

Try the following code:

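import pandas as pd
from scipy import stats

def chi_squared_test(df, college1, college2, alpha):
    # Rows are the two schools, columns are the word counts
    # (df is assumed to be indexed by school name; alpha is left to the caller)
    contingency_table = df.loc[[college1, college2]]
    try:
        stat, p, dof, expected = stats.chi2_contingency(contingency_table)
    except ValueError:
        # Raised when a row or column sums to zero (zero expected frequency)
        return None
    return p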
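One pitfall worth checking when building the table: `pd.crosstab` applied to the two schools' rows does not produce a school-by-word table. It counts how often each (first-school value, second-school value) combination occurs, so its cells are numbers of words rather than word counts. The sketch below uses the TAMU and UT Austin rows from the sample data to compare it with simply selecting the two rows:

```python
import pandas as pd

# The TAMU and UT Austin rows from the sample data, one column per word
counts = pd.DataFrame(
    {"Research": [54, 22], "Thesis": [0, 31], "Proposal": [0, 0],
     "AI": [6, 0], "Analytics": [5, 55]},
    index=["TAMU", "UT Austin"],
)

# crosstab tabulates co-occurring values: one row/column per distinct count
cross = pd.crosstab(counts.loc["TAMU"], counts.loc["UT Austin"])
print(cross.shape)                   # (4, 4)
print(int(cross.to_numpy().sum()))   # 5, one entry per word

# The school-by-word contingency table is just the two rows themselves
direct = counts.loc[["TAMU", "UT Austin"]]
print(direct.shape)                  # (2, 5)
```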
