descriptive-statistics

How do I conduct a t-test to compare the accuracy of two binary classifiers?

honolulu

May 14, 2022 13:02

I am evaluating two binary classifiers on a dataset and measuring the accuracy of their predictions. Accuracy is defined as the ratio of correct predictions to the total number of predictions. Do I need to collect accuracies over multiple experiments and use them as the data for the t-test? Can someone explain, please? Also, what will the result of the t-test convey? Thanks in advance.

Topic: hypothesis-testing descriptive-statistics data statistics

Category: Data Science
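A minimal sketch of one common answer to the t-test question above, assuming both classifiers are evaluated on the same k folds (or the same repeated splits) so that their accuracies can be paired run by run. The accuracy values are made up for illustration. A paired t-test works on the per-run differences, and its result says whether the mean accuracy difference is large relative to the run-to-run variation. Because training sets overlap across folds, the runs are not fully independent, so this plain version tends to be optimistic; corrected variants exist.

```python
import math
import statistics

def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic on per-run accuracy differences (df = n - 1)."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 runs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Hypothetical accuracies of classifiers A and B on the same 5 folds.
acc_a = [0.82, 0.85, 0.80, 0.84, 0.83]
acc_b = [0.80, 0.82, 0.79, 0.81, 0.80]

t = paired_t_statistic(acc_a, acc_b)
print(f"t = {t:.2f} with {len(acc_a) - 1} degrees of freedom")
```

Here t is about 6. With 4 degrees of freedom the two-sided 5% critical value is about 2.776, so this would reject the hypothesis of equal mean accuracy. scipy.stats.ttest_rel computes the same statistic and also returns a p-value.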

How to count words in a dataframe?

Jasmine

May 13, 2022 12:55

I would like to count how many male and female respondents gave a particular answer (e.g., Biking / Cycling). Below is the sample data:

Topic: data-science-model jupyter descriptive-statistics python machine-learning

Category: Data Science
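The sample data for the counting question is not included in this listing, so the sketch below invents a small pandas DataFrame; the column names Gender and Activity and all values are hypothetical. It keeps the rows whose answer contains the given text as a literal substring (regex=False, so the slash is not treated as a pattern) and counts genders with value_counts.

```python
import pandas as pd

# Hypothetical survey data: column names and values are assumptions,
# since the original sample data is not shown.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Activity": ["Biking / Cycling", "Running", "Biking / Cycling, Swimming",
                 "Swimming", "Biking / Cycling"],
})

def count_by_gender(frame, answer):
    """Count respondents of each gender whose answer contains `answer`."""
    mask = frame["Activity"].str.contains(answer, regex=False)
    return frame.loc[mask, "Gender"].value_counts()

counts = count_by_gender(df, "Biking / Cycling")
print(counts)
```

If each row holds exactly one answer, pd.crosstab(df["Activity"], df["Gender"]) gives the counts for every answer at once.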

Can an experiment that uses a confidence interval be called a statistical test?

honolulu

May 13, 2022 07:18

I am working on an algorithm and comparing its results with another model using a 90% confidence interval. Can this be called a statistical test? I read an article that described a statistical test with a certain confidence level. Is the confidence level the same as the confidence interval in statistical tests?

Topic: confidence hypothesis-testing descriptive-statistics data statistics

Category: Data Science
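To illustrate the link asked about above: the confidence level is 1 − α, and a 90% interval for the difference between two methods that excludes 0 corresponds to rejecting "no difference" at the 10% level in a two-sided test built from the same statistic. Two separate intervals that merely overlap do not carry the same meaning. This is a minimal sketch, assuming the comparison is summarized as hypothetical per-run differences and using a normal approximation; with few runs, a t quantile would be more appropriate.

```python
import math
import statistics

def mean_diff_ci(diffs, level=0.90):
    """Normal-approximation confidence interval for the mean of paired differences."""
    n = len(diffs)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    se = statistics.stdev(diffs) / math.sqrt(n)
    m = statistics.mean(diffs)
    return m - z * se, m + z * se

# Hypothetical per-run differences (your algorithm minus the other model).
diffs = [0.5, -0.2, 0.3, 0.1, -0.1, 0.4]
low, high = mean_diff_ci(diffs, level=0.90)
significant_at_10_percent = not (low <= 0 <= high)
print(low, high, significant_at_10_percent)
```

Here the interval contains 0, so at the 10% level these hypothetical results would not count as a significant difference.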

What kind of statistical test can be performed on a recommender-system dataset that predicts movie ratings?

honolulu

May 10, 2022 15:22

The dataset consists of thousands of users, and each row consists of a user_id, a movie_id, and the rating the user gave the movie, e.g. 1,56,5. In my experiment I am calculating the MSE and precision of a collaborative filtering model. The error comes from the difference between the predicted and actual ratings. I now want to conduct a statistical test. Which statistical test should I perform, and how? Thanks in advance.

Topic: hypothesis-testing descriptive-statistics data recommender-system statistics

Category: Data Science
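The recommender question does not say what the collaborative filtering model is compared against. Assuming a baseline (here a hypothetical constant prediction), one option is a paired bootstrap over the held-out ratings: resample rating indices, recompute the MSE difference each time, and read off a percentile interval. An interval that excludes 0 suggests the MSE gap is not just sampling noise. A paired t-test or Wilcoxon signed-rank test on the per-rating squared-error differences are alternatives. All ratings and predictions below are made up.

```python
import random
import statistics

def mse(actual, predicted):
    return statistics.fmean([(y - p) ** 2 for y, p in zip(actual, predicted)])

def paired_bootstrap_mse_diff(actual, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for MSE(pred_a) - MSE(pred_b) on the same ratings."""
    rng = random.Random(seed)
    n = len(actual)
    # Per-rating difference in squared error; resampling these keeps the pairing.
    d = [(y - a) ** 2 - (y - b) ** 2 for y, a, b in zip(actual, pred_a, pred_b)]
    boot = sorted(
        statistics.fmean([d[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    return boot[int(alpha / 2 * n_boot)], boot[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical held-out ratings and two sets of predictions.
actual = [5, 3, 4, 1, 2, 4, 5, 3]
cf_pred = [4.6, 3.2, 3.9, 1.5, 2.4, 3.8, 4.7, 3.1]   # collaborative filtering
baseline = [3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5]  # hypothetical constant baseline
print(mse(actual, cf_pred), mse(actual, baseline))
print(paired_bootstrap_mse_diff(actual, cf_pred, baseline))
```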

A/B Testing (Binomial Distribution vs Random Distribution)

DD.

April 28, 2022 17:04

When performing an A/B test on the number of clicks from users viewing two variants of an ad (each view is an impression), a binomial distribution can be assumed, where each variant has a constant click-through rate (CTR). Example with two ads: Ad one has 1000 impressions and 20 clicks, so its CTR is 2%; Ad two has 900 impressions and 30 clicks, so its CTR is 3.3%. Test whether there is a difference in CTR between Ad one and Ad two. …

Topic: distribution descriptive-statistics ab-test statistics

Category: Data Science
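Using the numbers from the A/B question, and assuming each impression is an independent Bernoulli trial with a constant CTR per ad (the binomial assumption stated above), a pooled two-proportion z-test is one standard way to test the difference. This minimal sketch uses only the standard library; the normal approximation is reasonable here because each ad has at least 20 clicks and many more non-clicks.

```python
import math

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two binomial proportions (pooled SE)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

z, p = two_proportion_z_test(20, 1000, 30, 900)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For these numbers it gives roughly z ≈ 1.81 and a two-sided p ≈ 0.07, so at the conventional 5% level the difference would not be called significant. A chi-square test on the 2×2 table of clicks and non-clicks (without continuity correction) is equivalent, since its statistic equals z².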

PSI: where not to use it

tndr

April 14, 2022 13:01

From what I understand, PSI (Population Stability Index) is used for continuous data. Generally, equal-sized bins are created to compare two datasets, and the number of buckets is usually 10. Is there a reason for using 10 buckets? I was also wondering whether PSI can be used for categorical data with fewer than 10 values. For categorical variables, what approach would be best for estimating the shift in the population?

Topic: descriptive-statistics

Category: Data Science
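For the PSI question: PSI = Σ (A_i − E_i) · ln(A_i / E_i), where E_i and A_i are the expected (baseline) and actual proportions in bin i. Ten equal-frequency bins (deciles) is a convention for continuous variables rather than a requirement. For a categorical variable, a natural choice is one bin per category, as in this minimal sketch. The eps floor on proportions is an assumption to avoid log(0) for a category that is empty in one sample, and the samples are hypothetical.

```python
import math
from collections import Counter

def psi_categorical(expected, actual, eps=1e-4):
    """PSI between two samples of a categorical variable, one bin per category.

    eps is an assumed floor on proportions so empty categories do not give log(0).
    """
    e_counts, a_counts = Counter(expected), Counter(actual)
    n_e, n_a = len(expected), len(actual)
    total = 0.0
    for category in set(e_counts) | set(a_counts):
        e = max(e_counts[category] / n_e, eps)
        a = max(a_counts[category] / n_a, eps)
        total += (a - e) * math.log(a / e)
    return total

baseline = ["x"] * 5 + ["y"] * 5   # hypothetical reference sample
current = ["x"] * 6 + ["y"] * 4    # hypothetical new sample
print(round(psi_categorical(baseline, current), 4))
```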

Drastic drop in Somers' D: why?

CoolJohnTo

April 6, 2022 20:03

I wanted to measure the correlation between the ratings assigned by two coaches to the same group of 40 players. I have tabulated the results as below: The Somers' D is 50%. However, for the case below, the Somers' D is 94.7%. My question is: both scenarios have 2 deviations, so why does the first scenario have a much lower Somers' D than the second?

Topic: interpretation descriptive-statistics ranking cross-validation

Category: Data Science
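The two tables from the Somers' D question are not preserved here, so the 50% and 94.7% figures cannot be reproduced. This minimal sketch shows what the statistic counts: Somers' D of Y with respect to X is (concordant pairs − discordant pairs) divided by the number of pairs not tied on X. Because it counts pairs rather than players, two "deviations" can change very different numbers of pairs depending on how many players share the affected rating levels and whether a deviation creates a discordant pair or only a tie; the measure is also asymmetric in X and Y. SciPy also provides scipy.stats.somersd.

```python
from itertools import combinations

def somers_d(x, y):
    """Somers' D of y with respect to x: (concordant - discordant) / pairs untied on x."""
    concordant = discordant = untied_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x are excluded from the denominator
        untied_x += 1
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 here means a tie on y only: counted in the denominator, not the numerator
    if untied_x == 0:
        raise ValueError("all pairs are tied on x")
    return (concordant - discordant) / untied_x

print(somers_d([1, 2, 3], [1, 2, 3]))        # perfect agreement
print(somers_d([1, 2, 3], [3, 2, 1]))        # perfect reversal
print(somers_d([1, 2, 3, 4], [1, 1, 2, 2]))  # ties on y alone lower D
```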

Analyses for a basic weight-training project?

stevemn

April 4, 2022 08:02

TL;DR: I'm doing a fairly basic project which involves exercise. It seems that descriptive statistics and basic data vis (ex: line graph) would be most appropriate for this project, but I wonder if you have any recommendations for analyses. For this project, I am performing the same set of 15 single-joint exercises each week (we'll call these "Exercises"). Every 4 weeks, I'm performing 3 different multi-joint exercises (we'll call these "Lifts"). My goals are to: Track my progress (strength gains) …

Topic: data-analysis descriptive-statistics self-study

Category: Data Science

Insights between two columns/variables in a DataFrame

marco

April 3, 2022 20:07

I have data in two columns: one is the range of the old credit score (Input score range) and the other is the new credit score (cvsc100). How do I find insights from the two of them, given that the old score is a range of values and the other column (cvsc100) is not? I know how to calculate the Pearson correlation of two DataFrame columns in Python, but I believe that is not sufficient. How should I proceed? Can you please advise?

Topic: data-analysis descriptive-statistics python data-mining machine-learning

Category: Data Science
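For the credit-score question, one starting point is to summarize the new score within each old-score range and then measure monotonic association. This minimal sketch assumes the range column holds strings like "300-579" (a hypothetical format) and uses made-up values. Each range is represented by its midpoint so it can be ordered, and Spearman's rank correlation is computed as the Pearson correlation of the ranks, which suits an ordinal, binned variable better than Pearson on raw values.

```python
import pandas as pd

# Hypothetical data: the "low-high" string format of the range column is an assumption.
df = pd.DataFrame({
    "Input score range": ["300-579", "300-579", "580-669", "580-669", "670-739"],
    "cvsc100": [35, 41, 55, 60, 72],
})

# Distribution of the new score within each old-score range.
summary = df.groupby("Input score range")["cvsc100"].agg(["count", "mean", "min", "max"])
print(summary)

# Represent each range by its midpoint so the old score can be ordered.
bounds = df["Input score range"].str.split("-", expand=True).astype(float)
df["range_mid"] = (bounds[0] + bounds[1]) / 2

# Spearman correlation = Pearson correlation of the ranks (ties get average ranks).
spearman = df["range_mid"].rank().corr(df["cvsc100"].rank())
print(f"Spearman rho = {spearman:.3f}")
```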

A dataset has skewness = 1 with missing data. Standard deviation around median is 1.5. How much data will be unaffected?

Alwin Aind

April 2, 2022 22:00

No other description of the data is given: neither whether it is univariate, bivariate, etc., nor the type of distribution. I recently came across this question and would like to know how skewness affects the percentage of unaffected data.

Topic: descriptive-statistics statistics data-mining

Category: Data Science

Making my dataset stationary increases SampleEntropy score

canP

March 8, 2022 13:22

When I make my dataset stationary by differencing, my SampleEntropy score increases, which suggests my data is becoming "less forecastable". But my results are definitely better after making the dataset stationary. Should I ignore the SampleEntropy result?

Topic: forecasting descriptive-statistics time-series statistics machine-learning

Category: Data Science
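Regarding the SampleEntropy question: SampEn measures regularity relative to a tolerance r that is usually set as a fraction of the series' own standard deviation. Differencing a trending or random-walk-like series removes the slowly varying structure and leaves step-to-step changes that look more like noise, so a higher SampEn after differencing is expected and is not, by itself, evidence that forecasts got worse. This minimal sketch uses a simulated random walk (hypothetical data) with the common choices m = 2 and r = 0.2 × SD.

```python
import math
import random
import statistics

def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn = -ln(A/B): B counts template pairs of length m within tolerance r
    (Chebyshev distance), A counts pairs that still match at length m + 1.
    Self-matches are excluded."""
    n = len(x)
    r = r_frac * statistics.pstdev(x)

    def matching_pairs(length):
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    return math.inf if a == 0 or b == 0 else -math.log(a / b)

# Simulated random walk and its first difference (white noise).
rng = random.Random(42)
walk = [0.0]
for _ in range(299):
    walk.append(walk[-1] + rng.gauss(0, 1))
diffed = [b - a for a, b in zip(walk, walk[1:])]

print(f"SampEn(walk)   = {sample_entropy(walk):.2f}")
print(f"SampEn(diffed) = {sample_entropy(diffed):.2f}")
```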

When do I need statistical significance testing, and when not?

kurmanjo

February 3, 2022 11:48

Hi there, I have a handful of questions regarding statistical significance testing. As a newcomer, there are some topics I do not fully understand, and one of them is checking for statistical significance. For example, when I do A/B testing, I understand that I have to check whether my results are statistically significant (p-value test) before looking at effect sizes. 1. Question: Do I only do statistical significance tests in the context of hypothesis testing? …

Topic: hypothesis-testing distribution descriptive-statistics

Category: Data Science

p-value and effect size

HelpNeederStudent

January 23, 2022 06:02

Is it correct to say that the lower the p-value, the larger the difference between the means of the two groups in a t-test? For example, if I apply the t-test to two groups of measurements A and B, and then to two groups of measurements B and C, and I find that the p-value is lower in the first case than in the second, could one possible interpretation be that the difference between the …

Topic: pvalue inference descriptive-statistics statistics

Category: Data Science
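On the p-value question: the p-value depends on the difference, the spread, and the sample size together, so a lower p-value in one comparison does not by itself mean a larger difference. This minimal sketch uses hypothetical summary statistics (the labels only mirror the question) and compares the Welch t statistic, which drives the p-value, with Cohen's d, a standardized effect size. The second comparison has a five times smaller effect but a larger t, and therefore a lower p-value, because its groups are much larger.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic from summary statistics (mean, sd, n of each group)."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical groups: (mean, sd, n) for each side of the comparison.
small_n_big_diff = (11.0, 2.0, 10, 10.0, 2.0, 10)
big_n_small_diff = (10.2, 2.0, 1000, 10.0, 2.0, 1000)

for label, g in [("A vs B", small_n_big_diff), ("B vs C", big_n_small_diff)]:
    print(label, f"t = {welch_t(*g):.2f}", f"d = {cohens_d(*g):.2f}")
```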

How do you determine if a value is statistically significant?

batu

January 12, 2022 19:34

I have collected some data that I need to analyze. The data is the result of a survey in which I asked approx. 180 sellers at a bazaar how important a certain buyer characteristic is for their price setting, on a scale from '1 = absolutely unimportant' to '10 = extremely important' (for instance, how important a buyer's nationality is to the price a merchant offers for his goods). I have now analyzed my results and clustered …

Topic: descriptive-statistics r

Category: Data Science

How do I handle string features when building a model?

nlper

January 8, 2022 12:28

I have data which looks like this:

shift_id | user_id | status | organization_id | location_id | department_id | open_positions | city | zip | role_id | specialty_id | latitude | longitude | years_of_experience
2 | 9 | S | 1 | 1 | 19 | 1 | brooklyn | 48001 | 2 | 9 | 42.643 | -82.583 |
6 | 60 | S | 12 | 19 | 20 | 1 | test | 68410 | 3 | 7 | 40.608 | -95.856 |
9 | 61 | S | 12 | 19 | 20 | 1 | new york | 48001 | 1 | 7 | 42.643 | -82.583 |
10 | 60 | S | 12 | 19 | 20 | 1 | test | 68410 | 3 | 7 | 40.608 | -95.856 |
21 | 3 | S | 1 | 1 | 19 …

Topic: descriptive-statistics scikit-learn pandas python

Category: Data Science
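For the string-feature question, a common first step is one-hot encoding. This minimal sketch uses a few rows and columns taken from the question's sample data and pandas get_dummies. Columns such as zip (and the various *_id columns) are numeric in form but categorical in meaning, so they usually need the same treatment. Which columns to encode is an assumption here; for high-cardinality columns like city, frequency or target encoding are alternatives, and scikit-learn's OneHotEncoder with handle_unknown="ignore" keeps training and test columns aligned inside a pipeline.

```python
import pandas as pd

# A few rows and columns taken from the question's sample data.
df = pd.DataFrame({
    "status": ["S", "S", "S", "S"],
    "city": ["brooklyn", "test", "new york", "test"],
    "zip": [48001, 68410, 48001, 68410],
    "latitude": [42.643, 40.608, 42.643, 40.608],
    "longitude": [-82.583, -95.856, -82.583, -95.856],
})

# Treat the string columns and the numeric-but-categorical zip code as categories;
# get_dummies replaces each listed column with one indicator column per value.
encoded = pd.get_dummies(df, columns=["status", "city", "zip"])
print(encoded.columns.tolist())
```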

Calculate rate from related datasets

museshad

December 20, 2021 17:30

I have the monthly sales rate for various products. The products are sold in different countries. I'm looking for a meaningful way to calculate the sales rate in each country. The sales rate indicated below is across all countries.

Product | Global Sales Rate
Pen | 9
Pencil | 4

Product | Country Sold
Pen | India
Pen | Australia
Pencil | Italy
Pencil | Japan

When there is a new product launch, the business team creates an opportunity including products similar to the one being launched. I know …

Topic: descriptive-statistics statistics

Category: Data Science

Evaluation metrics for multiple values per session

sbr

December 8, 2021 11:04

I have an application that executes my foo() function several times per user session. There are two alternative algorithms that I can implement as the foo() function, and my goal is to evaluate them based on execution delay. The number of times foo() is called per user session varies but will not exceed 10000. Say the delay values are: Algo1: [ [12, 30, 20, 40, 24, 280] , [13, 14, 15, 100], [20, 40] ] Algo2: [ [1, 10, …

Topic: distribution descriptive-statistics evaluation accuracy statistics

Category: Data Science
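For the per-session delay question, a key choice is whether each session counts once or each call counts once. Since Algo2's values are truncated in the question, this minimal sketch uses only the Algo1 delays as listed. Reducing each session to one summary (median, mean, or a high percentile) keeps sessions with many calls from dominating, whereas pooling all calls weights sessions by their call count, and the two can differ noticeably. With one value per session for each algorithm, the algorithms could then be compared across sessions, for example with a rank-based test, since delays like 280 make the data skewed; that comparison step is an assumption not shown here.

```python
import statistics

# Delays per session for Algo1, as listed in the question.
algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]

def per_session(sessions, stat=statistics.median):
    """Reduce each session to one summary value."""
    return [stat(s) for s in sessions]

def pooled(sessions, stat=statistics.median):
    """Summarize all calls together, which weights sessions by their call count."""
    return stat([d for s in sessions for d in s])

session_means = per_session(algo1, statistics.fmean)
print("per-session medians:", per_session(algo1))
print("mean of session means:", statistics.fmean(session_means))
print("pooled mean of all calls:", pooled(algo1, statistics.fmean))
```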

Comparing data sets with different measurements

O B

December 6, 2021 10:02

I'm currently writing a thesis on cyber crime, but I'm unsure of the proper way to compare/analyse my datasets in order to discuss them in my thesis. One source (https://www.pandasecurity.com/mediacenter/src/uploads/2014/07/Pandalabs-2015-anual-EN.pdf, page 9) states that the 'infection rate' of Sweden is 20.88% (bottom-3 ranking), the USA's is 29.48% (middle ranking), and China's (first rank) is 57.24%. Another (http://www.virusradar.com/en/home/world) uses a different measurement to define 'threat rates', which differs from the one above and has …

Topic: descriptive-statistics visualization dataset statistics data-cleaning

Category: Data Science

Time Series Analysis for Categorical Data Output

Srini

November 22, 2021 17:00

Suppose I have a dataset with a date as one column and fruits as a second, categorical column containing 4 different fruits, and my output column holds 0s and 1s indicating whether that fruit was sold at that time. Based on this, can I predict patterns such as what the sales status of a particular fruit will be after some years? How do I do time series analysis for such categorical data? Any …

Topic: forecasting descriptive-statistics time-series python

Category: Data Science

What is a collective term for things such as screening failures, non-responses, and drop-outs?

freshman

October 7, 2021 22:42

Hello, I am wondering what the collective term (in statistics) is for things such as screening failures, non-responses, and drop-outs.

Topic: descriptive-statistics

Category: Data Science
