Pass a variable number of arguments to mstats.kruskalwallis
I am trying to run a Kruskal-Wallis test on multiple columns of my data. For that I wrote a function:
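import pandas as pd
from scipy.stats import mstats

# placeholder slots; each one is overwritten with a group's values below
var=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']

def kruskawallis_test(column):
    # train is my training DataFrame, defined elsewhere
    k_test = train.loc[:, [column, 'SalePrice']]
    # one column per level of `column`; NaN where a row belongs to another level
    x = pd.pivot_table(k_test, index=k_test.index, values='SalePrice', columns=column)
    for i in range(x.shape[1]):
        var[i] = x.iloc[:, i]
        var[i] = var[i][~var[i].isnull()].tolist()
    # hard-coded to exactly four groups, which is the problem described below
    H, pval = mstats.kruskalwallis(var[0], var[1], var[2], var[3])
    return pval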
The problem I am facing is that every column has a different number of groups, so var[0],var[1],var[2],var[3] will not be correct for every column. As far as I know, mstats.kruskalwallis() takes one input vector per group, each containing the values of that group from a particular column.
Is there a better way to do this?
Or how can I pass a different number of vectors for each column? For example:
if a column x has levels a, b, c, d, e, how can I pass 5 vectors?
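One possible approach, sketched here under some assumptions, is to collect the groups in a list and let Python's `*` operator unpack that list into separate positional arguments, so the call adapts to however many levels a column has. The helper name `kruskal_pvalue`, the `target` parameter, and the toy DataFrame are made up for illustration and are not from the original post; `groupby` is swapped in for the pivot table only because it yields one Series per level directly.

```python
import pandas as pd
from scipy.stats import mstats


def kruskal_pvalue(df, column, target="SalePrice"):
    """Kruskal-Wallis p-value for `target` across the levels of `column`."""
    # One list of target values per level; missing target values are dropped.
    groups = [
        values.dropna().tolist()
        for _, values in df.groupby(column)[target]
    ]
    # The * operator unpacks the list, so each group becomes its own
    # positional argument, however many levels the column has.
    H, pval = mstats.kruskalwallis(*groups)
    return pval


# Hypothetical toy data standing in for `train` (not from the original post).
toy = pd.DataFrame({
    "Quality": ["lo", "lo", "lo", "mid", "mid", "mid", "hi", "hi", "hi"],
    "Street": ["a", "b", "a", "b", "a", "b", "a", "b", "a"],
    "SalePrice": [100, 110, 105, 150, 160, 155, 210, 220, 215],
})

# The same function handles a 3-level column and a 2-level column.
pvalues = {col: kruskal_pvalue(toy, col) for col in ["Quality", "Street"]}
```

With the real data this would presumably be called as `kruskal_pvalue(train, column)` for each categorical column of interest. The same unpacking works with the list the original loop builds, as long as only the first `x.shape[1]` entries of `var` are passed.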
Topic anova non-parametric statistics machine-learning
Category Data Science