How to test a likelihood hypothesis on a dataset?

How to test the following hypothesis? The larger the fare, the more likely the customer is to be traveling alone.

Using the data below, how would one be able to test the hypothesis?

import seaborn as sns

# dataset
df= sns.load_dataset('titanic')
df[['fare','alone']].head()

    fare    alone
0   7.2500  False
1   71.2833 False
2   7.9250  True
3   53.1000 False
4   8.0500  True


UPDATE

#subset for alone = True
alone = df['fare'].loc[df['alone'] == True]

#subset for alone = False
not_alone = df['fare'].loc[df['alone'] == False]

#import Wilcoxon test
from scipy.stats import wilcoxon  

#run wilcoxon test
wilcoxon(alone, not_alone)

 WilcoxonResult(statistic=10173.0, pvalue=2.8669052202786427e-28)



A comment on this question suggests fitting a logistic regression of "alone/multiple" on the fare. That is a reasonable first thought, but it has a few issues.

  1. Unless you are careful (more careful than many would be), you allow your analysis to check for the opposite relationship: that larger fares are associated with a lower probability of the traveler being alone.

  2. It tests a strictly linear (in log-odds) relationship.

  3. Standard methods to allow GLMs to model nonlinearity, such as splines, need not require the relationship to be monotonically increasing, so your model could show a decreasing trend in a region.

(I might argue that it is worth exploring if there is a region where increasing fare results in a lower probability of traveling alone, but that isn't your question.)
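To make the first point concrete, here is a minimal sketch of a logistic regression whose slope is tested in one direction only. It fits the model with plain NumPy (Newton's method) to stay self-contained; in practice a statistics library would normally do the fitting. The function name `one_sided_logit_slope` and the synthetic data are assumptions made for illustration, not the Titanic data.

```python
import numpy as np
from scipy.stats import norm


def one_sided_logit_slope(x, y, max_iter=100, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x and test H1: b1 > 0 (one-sided Wald test)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + predictor
    beta = np.zeros(2)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))  # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard error of the slope from the inverse information matrix
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta[1] / np.sqrt(cov[1, 1])
    # norm.sf(z) is the upper tail: small only when the slope is positive
    return beta[1], z, norm.sf(z)


# Synthetic stand-in data (hypothetical, NOT the Titanic data)
rng = np.random.default_rng(1)
x_demo = rng.normal(size=500)
y_demo = rng.random(500) < 1.0 / (1.0 + np.exp(-(-0.2 + 0.8 * x_demo)))

slope, z_stat, p_one_sided = one_sided_logit_slope(x_demo, y_demo)
print(f"slope={slope:.3f}, z={z_stat:.2f}, one-sided p={p_one_sided:.3g}")
```

A two-sided p-value would be just as small if the slope were strongly negative, which is the opposite of the stated hypothesis. The one-sided version only rewards a positive slope, but it still tests a relationship that is linear in the log-odds.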

I would argue for a one-sided Wilcoxon rank-sum test (the two-sample version, for independent groups) to see if people traveling alone tend to have higher fares, which is logically equivalent to your question.

A Wilcoxon test removes the issues related to a strictly linear relationship (just a linear shift), and it is easy to do the one-sided test. If you would use Spearman correlation to explore a similar question but with a continuous or ordinal variable instead of your alone/multiple variable, then Wilcoxon is a perfect fit, as both the Wilcoxon test and the Spearman correlation are special cases of the proportional odds ordinal logistic regression model.
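A minimal sketch of the one-sided test, assuming SciPy's `mannwhitneyu` (the Mann-Whitney U test, which is equivalent to the Wilcoxon rank-sum test) is acceptable here. Note that `scipy.stats.wilcoxon` is the paired signed-rank test and expects two samples of equal length, so it does not fit two independent groups of travelers. The helper name `fares_higher_when_alone` and the synthetic data are made up for illustration; they are not the Titanic fares.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def fares_higher_when_alone(fare, alone, alpha=0.05):
    """One-sided rank-sum test.

    H0: fares of solo travelers are not stochastically larger.
    H1: fares of solo travelers tend to be larger than those of
        accompanied travelers.
    """
    fare = np.asarray(fare, dtype=float)
    alone = np.asarray(alone, dtype=bool)
    result = mannwhitneyu(fare[alone], fare[~alone], alternative="greater")
    return result.statistic, result.pvalue, result.pvalue < alpha


# Synthetic stand-in data (hypothetical, NOT the Titanic fares):
# solo travelers are given somewhat higher fares on purpose.
rng = np.random.default_rng(0)
alone_flag = rng.random(400) < 0.6
fare = rng.lognormal(mean=np.where(alone_flag, 3.0, 2.5), sigma=0.8)

stat, p_value, reject = fares_higher_when_alone(fare, alone_flag)
print(f"U={stat:.1f}, one-sided p={p_value:.3g}, reject H0: {reject}")
```

With the question's data, the call would be `fares_higher_when_alone(df['fare'], df['alone'])`. A small p-value supports the hypothesis; a p-value near 1 suggests the relationship runs the other way.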
