hypothesis-testing

How can I statistically measure/determine if A performs better than B?

newnewnoo11

May 16, 2022, 22:51

Hi Data Science Community! I am a new Data Intern and I have been stuck on this question for a while. Here is a sample dataset I am working with:

Customer | Manufacturer A Spending | Manufacturer B Spending | Manufacturer A Cost per Product (CPP) | Manufacturer B Cost per Product (CPP) | Product Cost Difference (B-A) | Product Cost Difference in %
1 | 400000 | 360000 | 44 | 45 | 1 | 1/45
2 | 300000 | 310000 | 23 | 21 | -2 | -2/21
3 | 100000 | 106000 | 1.4 | 1.6 | 0.2 | 0.2/1.6

I …

Topic: hypothesis-testing data

Category: Data Science

How do I conduct a t-test to compare the accuracy of two binary classifiers?

honolulu

May 14, 2022, 13:02

I am using two binary classifiers and measuring their accuracy over a dataset. Accuracy is defined as the ratio of correct to incorrect predictions. Do I need to take accuracies sampled over multiple experiments and use them as the data for the t-test? Can someone explain, please? Also, what will the result of the t-test convey? Thanks in advance.

Topic: hypothesis-testing descriptive-statistics data statistics

Category: Data Science
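For the question above, one common setup is to evaluate both classifiers on the same splits and run a paired t-test on the per-split accuracy differences. The sketch below uses only the standard library; the accuracies are hypothetical, and the critical value 2.365 (two-sided 5% level, 7 degrees of freedom) is hard-coded from a t-table rather than computed. SciPy users could get a p-value from `scipy.stats.ttest_rel` instead.

```python
import math
import statistics

# Hypothetical accuracies of two classifiers on the same 8 folds/experiments.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.79, 0.81]

# Paired design: work with the per-fold differences, not the two raw samples.
diffs = [a - b for a, b in zip(acc_a, acc_b)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)

# t statistic for H0: the mean accuracy difference is zero.
t_stat = mean_diff / (sd_diff / math.sqrt(n))

# Assumption: two-sided test at the 5% level with n - 1 = 7 degrees of freedom.
T_CRIT = 2.365
significant = abs(t_stat) > T_CRIT

print(f"mean difference = {mean_diff:.4f}, t = {t_stat:.2f}, significant = {significant}")
```

A large |t| says the observed mean difference is unlikely if the two classifiers were equally accurate on average. Accuracies from overlapping cross-validation folds are not fully independent, so the result is best read as indicative rather than exact.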

Can experiments that use a confidence interval be called a statistical test?

honolulu

May 13, 2022, 07:18

I am working on an algorithm and comparing its results with another model using a 90% confidence interval. Can this be called a statistical test? I read an article that mentioned a statistical test with some confidence level. Is the confidence level the same as the confidence interval in statistical tests?

Topic: confidence hypothesis-testing descriptive-statistics data statistics

Category: Data Science
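The relationship asked about above can be sketched numerically: the confidence level is a number (here 0.90), the confidence interval is a range computed from the data at that level, and a two-sided test at alpha = 1 - level rejects "no difference" exactly when the interval excludes zero. The score differences below are hypothetical, and the normal approximation is an assumption made to stay within the standard library (a t-based interval would be more accurate for a sample this small).

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical per-run score differences (our algorithm minus the other model).
diffs = [0.8, 1.1, -0.2, 0.9, 1.4, 0.3, 0.7, 1.0, 0.5, 1.2,
         0.6, 0.9, -0.1, 1.3, 0.4, 0.8, 1.1, 0.2, 0.9, 0.6]

confidence_level = 0.90          # the confidence level is a probability
alpha = 1 - confidence_level     # matching significance level of the test

m = mean(diffs)
se = stdev(diffs) / math.sqrt(len(diffs))

# The confidence interval is a range built from the data at that level.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low, ci_high = m - z_crit * se, m + z_crit * se

# Two-sided z-test of H0: mean difference = 0 (normal approximation).
z_stat = m / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

ci_excludes_zero = ci_low > 0 or ci_high < 0
rejects_null = p_value < alpha

print(f"90% CI = ({ci_low:.3f}, {ci_high:.3f}), p = {p_value:.2g}")
print(f"CI excludes 0: {ci_excludes_zero}, test rejects H0: {rejects_null}")
```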

What kind of statistical test can be performed on a recommender system dataset that predicts the ratings for movies?

honolulu

May 10, 2022, 15:22

The dataset consists of thousands of users, and each row consists of user_id, movie_id and the rating the user gave to the movie, e.g. 1,56,5. In my experiment I am calculating the MSE and precision using a collaborative filtering model. The error comes from the difference between predicted and actual ratings. I now want to conduct a statistical test. Which statistical test should be performed, and how? Thanks in advance.

Topic: hypothesis-testing descriptive-statistics data recommender-system statistics

Category: Data Science

Check if distribution per week is the same

Ismail

May 1, 2022, 16:07

I have sales by customer (b2b) and by date. I want to check if the distribution per day inside weeks remains the same from week to week.

Initial dataset:

Customer | Date | Sales
Alpha | 2019-02-23 | 527
Beta | 2019-02-23 | 642
Alpha | 2019-02-24 | 776
... | ... | ...
Beta | 2021-07-28 | 1236

I transformed it into:

Customer | Week | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
Alpha | 201906 | 0.2202 | 0.15799 | 0.178202 | 0.160449 | 0.1528 | 0.130214 | 0.000067
Beta | 201906 | 0.20573 | 0.183979 | 0.182207 | 0.179824 | 0.140596 | 0.107601 | 0.000061
... | ... | …

Topic: hypothesis-testing distribution statistics

Category: Data Science

Hypothesis test for classification model

william007

April 25, 2022, 09:33

I have a model that outputs 0 or 1 for interest/not-interest in a job. I'm doing an A/B/C test comparing two models (treatment groups) and none (control group). ANOVA for hypothesis testing and t-test with Bonferroni correction for posthoc testing is my plan. But both tests assume normality. Can we have normality for 0 and 1? If so, how? If not, what's the best test (including posthoc)?

Topic: anova hypothesis-testing ab-test classification

Category: Data Science
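For 0/1 outcomes like those in the question above, tests built for proportions avoid the normality question entirely. A minimal sketch, assuming hypothetical counts per group: a chi-square test of homogeneity plays the role of the omnibus ANOVA, and pairwise two-proportion z-tests with a Bonferroni-adjusted threshold serve as the post hoc step. The group names and counts are made up, and the closed-form p-value `exp(-chi2 / 2)` is valid only because this table has 2 degrees of freedom.

```python
import math
from itertools import combinations
from statistics import NormalDist

# Hypothetical counts: group -> (number interested, number of users).
groups = {"control": (120, 1000), "model_a": (150, 1000), "model_b": (180, 1000)}


def chi_square_homogeneity(groups):
    """Pearson chi-square statistic for a groups x {1, 0} contingency table."""
    total_pos = sum(pos for pos, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    p_pooled = total_pos / total_n
    chi2 = 0.0
    for pos, n in groups.values():
        cells = ((pos, n * p_pooled), (n - pos, n * (1 - p_pooled)))
        for observed, expected in cells:
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def two_proportion_p_value(pos1, n1, pos2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    pooled = (pos1 + pos2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (pos1 / n1 - pos2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


chi2 = chi_square_homogeneity(groups)
# 3 groups x 2 outcomes gives df = 2, where the chi-square tail is exp(-x / 2).
overall_p = math.exp(-chi2 / 2)

# Post hoc: all pairwise comparisons, each judged at alpha / number of pairs.
alpha = 0.05
pairs = list(combinations(groups, 2))
bonferroni_alpha = alpha / len(pairs)
pairwise = {
    (g1, g2): two_proportion_p_value(*groups[g1], *groups[g2])
    for g1, g2 in pairs
}
significant_pairs = [pair for pair, p in pairwise.items() if p < bonferroni_alpha]

print(f"chi2 = {chi2:.2f}, overall p = {overall_p:.5f}")
print("significant after Bonferroni:", significant_pairs)
```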

Does a t-test require the standard deviation or variance for its calculation?

Chris

April 20, 2022, 11:01

This might be a novice question, but as far as I was able to understand, the main difference between a t-test and a z-test, apart from large versus small sample size, is that the z-test calculation requires the SD value of the sample whereas in a t-test we do not have the SD. But when calculating the t-test value, the formula requires the SD value as well. So what is the difference between a t-test and a z-test? Can someone please clear this up?

Topic: hypothesis-testing mathematics pvalue statistics

Category: Data Science
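Both statistics in the question above use a standard deviation; the difference is where it comes from. The z-test assumes the population SD is known in advance, while the t-test estimates it from the sample and compensates with the heavier-tailed t distribution. A minimal sketch with hypothetical numbers, where `sigma_known` is an assumed population SD and 2.262 is the tabulated two-sided 5% critical value for 9 degrees of freedom:

```python
import math
from statistics import NormalDist, mean, stdev

sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4, 5.7, 5.3, 5.5, 5.2]  # hypothetical data
mu_0 = 5.0                     # mean under the null hypothesis
n = len(sample)
x_bar = mean(sample)

# z-test: the population SD is assumed to be known beforehand.
sigma_known = 0.3              # assumption for illustration only
z_stat = (x_bar - mu_0) / (sigma_known / math.sqrt(n))
z_p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# t-test: the SD is estimated from the sample itself (n - 1 denominator),
# and the statistic is compared against Student's t with n - 1 df.
s = stdev(sample)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
T_CRIT_DF9 = 2.262             # two-sided 5% critical value, 9 df (from a t-table)
t_rejects = abs(t_stat) > T_CRIT_DF9

print(f"z = {z_stat:.3f} (p = {z_p_value:.5f}), t = {t_stat:.3f}, reject H0: {t_rejects}")
```

As the sample grows, the sample SD becomes a reliable estimate and the t distribution approaches the normal, which is why the two tests agree for large n.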

How to test likelihood hypothesis on dataset?

IOIOIOIOIOIOI

April 18, 2022, 13:38

How can I test the following hypothesis? The larger the fare, the more likely the customer is to be travelling alone. Using the data below, how would one be able to test the hypothesis?

    import seaborn as sns
    # dataset
    df = sns.load_dataset('titanic')
    df[['fare', 'alone']].head()

          fare  alone
    0   7.2500  False
    1  71.2833  False
    2   7.9250   True
    3  53.1000  False
    4   8.0500   True

UPDATE

    # subset for alone == True
    alone = df['fare'].loc[df['alone'] == True]
    # import Wilcoxon test
    from scipy.stats import wilcoxon
    # run wilcoxon test …

Topic: hypothesis-testing data-analysis probability python

Category: Data Science
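One note on the question above: `scipy.stats.wilcoxon` is the signed-rank test for paired samples, whereas fares of passengers travelling alone and not alone form two independent groups, for which the Mann-Whitney U (rank-sum) test is the usual rank-based choice. The sketch below implements U by hand with a normal approximation and no tie correction. The fares are hypothetical values, not the Titanic data, and with only 8 observations per group the approximation is rough. SciPy offers `scipy.stats.mannwhitneyu` for real use.

```python
import math
from statistics import NormalDist

# Hypothetical fares (NOT the Titanic data), one list per group.
fare_alone = [7.25, 7.90, 8.05, 8.46, 13.00, 26.55, 7.75, 10.50]
fare_not_alone = [71.28, 53.10, 21.08, 11.13, 30.07, 16.70, 51.86, 29.13]


def mann_whitney_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


n1, n2 = len(fare_alone), len(fare_not_alone)
u_alone = mann_whitney_u(fare_alone, fare_not_alone)

# Normal approximation under H0 (both groups share one distribution).
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
z = (u_alone - mean_u) / sd_u

# One-sided alternative from the question: fares are larger for those alone.
p_greater = 1 - NormalDist().cdf(z)

print(f"U = {u_alone}, z = {z:.2f}, one-sided p = {p_greater:.4f}")
```

With these made-up numbers the one-sided p-value is close to 1, meaning the data point in the opposite direction to the stated hypothesis.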

What kind of hypothesis testing in Python can be used to validate that 4 job titles are significantly different based on their skillset?

Justin Schmidt

April 18, 2022, 12:41

I have 4 job titles, for each of which I scraped hundreds of job descriptions and classified them by whether they contain words related to a predefined list of skills. For each job description, I now have a True/False parameter indicating whether it mentions one of the skills. How can I validate that there is a significant difference between job descriptions that represent different job titles? I'm very new to this topic and all I could think of is using dummy …

Topic: web-scraping hypothesis-testing scipy python categorical-data

Category: Data Science

How to test if a curve is well described by an ellipse?

John H

April 11, 2022, 22:35

I have a set of data points in 2D, and I am trying to come up with some sort of statistical test to determine if the points fall along an ellipse. My idea so far is to fit an ellipse to the points, take the mean squared error, and use this as an indicator. However, this requires me to set some threshold for what is a good MSE (so MSE above this threshold indicate that the points do not fall along …

Topic: data-science-model hypothesis-testing bayesian overfitting statistics

Category: Data Science

Why is rejection of a true null hypothesis called a type I error?

belz

March 31, 2022, 18:01

I’m comparing two confusion matrices:

https://en.wikipedia.org/wiki/Confusion_matrix#Table_of_confusion
https://en.wikipedia.org/wiki/Type_I_and_type_II_errors

The 2nd is rotated, the Decision is on Y-axis. But I assume both reflect the same concept. I have two options to render the word “Reject”. (1) When we look at Null hypothesis matrix, the Reject of a “True Null hypothesis” means a decision which doesn’t reflect reality (convicting an innocent), and this is indeed FP (type I). (2) Following Confusion_matrix wiki, I interpret Reject as False. Therefore, making a False decision (H0 …

Topic: hypothesis-testing confusion-matrix

Category: Data Science

The correct Cohen's d formula for a paired t-test

ak1431

March 23, 2022, 15:53

I have read some articles and I am not sure which is the best way to calculate the effect size for paired t-tests. According to this Geoff Cumming paper, using the pretest SD is the best way to go about it:

d = mean(paired_differences) / pretest std

Another formula I have stumbled upon is:

d = mean(paired_differences) / std(paired_differences)

Can someone help me understand how they are different? For context, the data I have is paired samples from before and after an intervention.

Topic: hypothesis-testing statistics

Category: Data Science
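The two formulas quoted in the question above can be compared side by side on made-up numbers. In this sketch `pre` and `post` are hypothetical paired scores, and the names `d_pretest` and `d_z` are labels chosen here for illustration: the first expresses the mean change in units of the original score spread, the second in units of the spread of the change scores.

```python
from statistics import mean, stdev

# Hypothetical paired scores before and after an intervention.
pre = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0]
post = [14.0, 16.0, 14.0, 15.0, 16.0, 17.0, 13.0, 15.0]

diffs = [after - before for before, after in zip(pre, post)]
mean_diff = mean(diffs)

# Variant 1: standardize by the pretest SD (change in original-score SD units).
d_pretest = mean_diff / stdev(pre)

# Variant 2: standardize by the SD of the paired differences.
d_z = mean_diff / stdev(diffs)

print(f"mean difference = {mean_diff}, d (pretest SD) = {d_pretest:.3f}, d_z = {d_z:.3f}")
```

Here the same 2-point improvement yields d = 1.0 with the pretest SD but about 2.16 with the SD of the differences. The second variant grows as pre and post scores become more strongly correlated, because a high correlation shrinks the SD of the differences, so the two numbers answer different questions and are not interchangeable.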

How do you use KS-test in a data science report?

Carl

March 20, 2022, 20:07

I'm writing a data science report and I want to find an existing distribution that fits the sample. I got a good-looking result, but when I used the KS test to test the model, I got a low p-value, 1.2e-4, so I should definitely reject the model. I mean, whatever distribution/model you use to fit the sample, you cannot expect a perfect result, especially when working with a huge amount of data. So what does the KS test do in a data science report? …

Topic: non-parametric data-science-model hypothesis-testing statistics

Category: Data Science
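The effect described in the question above can be reproduced deterministically: for a fixed small mismatch between data and model, the KS statistic D stays roughly constant while the p-value collapses as n grows. The sketch below is an illustration under stated assumptions, not a full KS implementation. The "samples" are evenly spaced quantiles of N(0.05, 1) standing in for random draws, the model is N(0, 1), and the p-value uses the asymptotic Kolmogorov series. All helper names are hypothetical.

```python
import math
from statistics import NormalDist


def ks_statistic(sorted_sample, cdf):
    """One-sample KS statistic: largest gap between the ECDF and the model CDF."""
    n = len(sorted_sample)
    d = 0.0
    for i, x in enumerate(sorted_sample, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_p_value(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov series, clamped to [0, 1]."""
    lam = math.sqrt(n) * d
    total = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2 * total))


def shifted_normal_sample(n, shift):
    """Deterministic stand-in for a sample: sorted quantiles of N(shift, 1)."""
    source = NormalDist(mu=shift, sigma=1.0)
    return [source.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


model = NormalDist(0.0, 1.0)  # the fitted model being tested
results = {}
for n in (100, 100_000):
    sample = shifted_normal_sample(n, shift=0.05)  # data is off by a tiny shift
    d = ks_statistic(sample, model.cdf)
    results[n] = (d, kolmogorov_p_value(d, n))
    print(f"n = {n:>7}: D = {d:.4f}, p = {results[n][1]:.3g}")
```

This suggests reporting D itself (the maximum CDF gap) next to the p-value: with very large samples, rejection only says the model is not exact, while D indicates how far off it is.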

Why do we need statistical hypothesis testing for correlations between variables when we can check using scatter plots?

Khilesh Chauhan

March 9, 2022, 13:13

I need guidance on correlation tests. When we need to check the correlation between 2 variables, we generally start with a scatter plot. Sometimes it is suggested to perform hypothesis testing as well. I'm looking for a documentation guide which can help in deciding whether to use a scatter plot or hypothesis testing.

Topic: hypothesis-testing data-analysis correlation visualization

Category: Data Science

How do I know that model performance improvement is significant?

emilaz

March 3, 2022, 00:08

Say I am running a Machine Learning model that produces a certain result (say accuracy of 80%). I now change a minor detail in my model (say, in a Deep Learning model, increase the kernel size in one convolutional layer) and run the model again, leading to an accuracy of .8+x. My question is how I would determine which in-/decrease in performance allows me to say that the new network architecture is better than my old one? I assume that …

Topic: hypothesis-testing statistics machine-learning

Category: Data Science
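When both versions of a model are scored on the same test set, as in the question above, one option is McNemar's test, which looks only at the examples where the two models disagree. This is a minimal sketch of the exact (binomial) form using the standard library; the disagreement counts are hypothetical, and the function name is chosen here for illustration.

```python
import math

# Hypothetical disagreement counts on a shared test set.
only_old_correct = 15  # old model right, new model wrong
only_new_correct = 35  # new model right, old model wrong


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    Under H0 (equal error rates), each disagreement is equally likely to
    favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = mcnemar_exact_p(only_old_correct, only_new_correct)
print(f"exact McNemar p-value = {p_value:.4f}")
```

This addresses test-set sampling noise only. For deep learning models, run-to-run variation from random initialization is a separate source of noise, which repeated training runs would be needed to assess.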

When do I need Statistical Significance testing and when not?

kurmanjo

February 3, 2022, 11:48

Hi there, I have a handful of questions regarding statistical significance testing. As a newcomer, I sometimes come across topics that I do not entirely understand. One of them is checking for statistical significance. For example, when I do A/B testing I understand that I have to check whether my results are statistically significant (p-value test) before looking at effect sizes. Question 1: Do I only do statistical significance tests in the context of hypothesis testing? …

Topic: hypothesis-testing distribution descriptive-statistics

Category: Data Science

Evaluating if metric of one group is higher than the metric of another group when group sizes differ significantly

Erik M

January 28, 2022, 12:10

I am working with a dataset that contains data of applicant income, gender, and loan status (whether or not the person was approved for a loan). I've created the following plots from the data. The histogram plot is: [image not shown] The kernel density estimate (KDE) plot is: [image not shown] The KDE plots seem to indicate that the accepted-to-rejected ratio among men is higher for a given income than among women. I want to investigate this further. Note (!) there are more …

Topic: hypothesis-testing data-analysis variance

Category: Data Science

Why do I get this result with a chi-square test?

Polaster

January 26, 2022, 04:50

I have a question about the chi-squared independence test. I'm working on a dataset and I'm interested in finding the link between the product categories and gender. I plotted my contingency table: [contingency table image not shown] I found that the p-value is 1.54 × 10^-5, implying that my variables are correlated. I don't really understand how this is possible, because the proportions of men and women in each category are very similar.

Topic: chi-square-test interpretation hypothesis-testing

Category: Data Science
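A likely explanation for the situation described above is sample size: the chi-square statistic grows in proportion to n, so with enough rows even nearly identical proportions produce a tiny p-value. The sketch below shows this with hypothetical counts and pairs the test with Cramér's V, an effect size that does not grow with n. The closed-form p-value `exp(-chi2 / 2)` is valid only because these 2 x 3 tables have 2 degrees of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: association strength in [0, 1], independent of sample size."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical gender x product-category counts (2 rows x 3 columns, df = 2).
small = [[52, 48, 50], [48, 52, 50]]
# Same proportions, 100 times more observations.
large = [[cell * 100 for cell in row] for row in small]

chi2_small, _ = chi_square(small)
chi2_large, _ = chi_square(large)
p_small = math.exp(-chi2_small / 2)  # chi-square tail for df = 2
p_large = math.exp(-chi2_large / 2)
v_small, v_large = cramers_v(small), cramers_v(large)

print(f"n = 300:   chi2 = {chi2_small:.2f}, p = {p_small:.3f}, V = {v_small:.4f}")
print(f"n = 30000: chi2 = {chi2_large:.2f}, p = {p_large:.2g}, V = {v_large:.4f}")
```

A small p-value with a V near zero reads as "an association is detectable, but it is very weak", which matches proportions that look almost the same.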

T-test against normalised or standardised data gives different results

user305883

January 15, 2022, 17:03

I am studying the problem of predicting the popularity of a tweet, and want to test the null hypothesis that there is no relationship between favorite_counts and another set of variables, like the number of friends of users. I am not sure whether to normalise or standardise the variables, because I am still thinking about how to model popularity and don't know how likes and friends are distributed among users (please advise). So I tried both, and tried an independent t_test. I get very …

Topic: hypothesis-testing pvalue normalization twitter statistics

Category: Data Science

Statistical tests for the null hypothesis: what if the model is non-linear?

user305883

December 10, 2021, 14:16

I am reading "An Introduction to Statistical Learning" (Gareth James et al., Springer) as a primer on machine learning. I am reading the part on linear regression, and learnt there are different tests for measuring correlations and the significance of correlations between predictors (also called variables), under the assumption that the model may be linear. What if the relationship between variables is (or is assumed to be) non-linear? I also read that anyway many linear model concepts underpin a …

Topic: data-science-model hypothesis-testing linear-regression correlation

Category: Data Science
