What is the impact of data size on the confidence (p-values) of model coefficients? Does increasing the size of the data always improve the confidence in the model coefficients? Suppose I have 100 data points. I created another dataset from the same data by duplicating the original data 100 times, i.e. I now have 10,000 data points. If I run the model on the two data sets, what would happen to the model coefficients and their p-values, and why? I appreciate any help you can provide.
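A quick way to see what duplication does is to fit the same regression on the original and the duplicated data. The sketch below uses entirely synthetic data (the true slope of 2 and the noise level are arbitrary choices, not from the question): the point estimates are unchanged, but the standard error shrinks and the p-value collapses, because duplication pretends you have independent evidence you do not have.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # hypothetical: true slope 2 plus noise

# Original 100 points vs the same rows duplicated 100 times (10,000 rows)
orig = stats.linregress(x, y)
dup = stats.linregress(np.tile(x, 100), np.tile(y, 100))

print(orig.slope, dup.slope)    # identical point estimates
print(orig.stderr, dup.stderr)  # stderr shrinks by roughly sqrt(100) = 10
print(orig.pvalue, dup.pvalue)  # p-value collapses toward 0
```

The coefficient is a function of the data pattern, which duplication does not change; the standard error is a function of the (inflated) sample size, which is why the confidence looks spuriously better.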
I need to calculate a p-value for a stacked LSTM model. Does that mean I need to run the model multiple times and take as the null hypothesis that the model's accuracy is roughly the same across the runs? If not, can you please help me with how to do it correctly?
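One common alternative to multiple training runs is to test a single model's held-out accuracy against a fixed baseline with an exact binomial test. This is only a sketch of that idea, not *the* procedure for LSTMs; the test-set size, number of correct predictions, and the 0.85 baseline below are all invented for illustration.

```python
from scipy import stats

# Hypothetical numbers: 1,000 held-out examples, 870 classified correctly,
# tested against a null accuracy of 0.85 (e.g. a majority-class baseline)
n_test, n_correct, null_acc = 1000, 870, 0.85

# Exact binomial test: is the observed accuracy significantly above the baseline?
result = stats.binomtest(n_correct, n_test, null_acc, alternative="greater")
print(result.pvalue)
```

If you do want to compare accuracies *across* runs or across two models, a paired test on per-example correctness (e.g. McNemar's test) is the usual choice rather than re-running and eyeballing.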
I am trying to figure out whether our customer support has an impact on the tickets opened by customers. Our employees proactively contact customers to prevent them from opening tickets. The data is quite accurate. I am plotting the (pro-active) contacts per day and the opened tickets per day, and I am using a linear fit for both. Both r² values are around 15% and the p-values are poor as well (well above 5%). I wonder if I …
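For reference, this is roughly what that kind of fit looks like in code. The daily counts below are synthetic (Poisson contacts, a weak noisy effect on tickets, all numbers invented), chosen so the relationship is real but weak, which is exactly the regime where r² is small and the p-value can still be unconvincing.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
days = 120
contacts = rng.poisson(30, size=days)                        # hypothetical pro-active contacts/day
tickets = 50 - 0.2 * contacts + rng.normal(0, 8, size=days)  # weak, noisy negative relation

fit = stats.linregress(contacts, tickets)
print(f"slope={fit.slope:.3f}, r^2={fit.rvalue**2:.3f}, p={fit.pvalue:.3f}")
```

A low r² with a high p-value usually means the day-to-day noise swamps the effect; aggregating to weeks, or regressing tickets on contacts directly (rather than fitting each series against time), may be more informative.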
Might be a novice question, but the main difference between a t-test and a z-test, as I understood it, is that the z-test calculation requires the SD of the sample, whereas in a t-test we do not have the SD, apart from the distinction between high and low sample sizes. But when calculating the t-test statistic, the formula requires the SD as well. So what is the difference between a t-test and a z-test? Can someone please clear this up?
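The distinction is which SD goes into the formula: the z-test assumes the *population* SD is known, while the t-test plugs in the *sample* SD and compensates by using the heavier-tailed t distribution. A small sketch with invented numbers (true sigma = 2, n = 15, testing the mean against 0):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=0.5, scale=2.0, size=15)  # hypothetical small sample, true sigma = 2
mu0 = 0.0

# t-test: population SD unknown, estimate it from the sample, use the t distribution
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))
p_t = 2 * stats.t.sf(abs(t_stat), df=len(sample) - 1)

# z-test: population SD assumed KNOWN (sigma = 2), use the normal distribution
z_stat = (sample.mean() - mu0) / (2.0 / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z_stat))

print(p_t, p_z)
```

As n grows, the t distribution converges to the normal, which is why the two tests agree for large samples and why the "high vs low sample size" rule of thumb exists.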
I have a dataset that measures the time each student spent working on a mathematics question. My dataframe looks a little something like this:

Participant ID  Question 1  Question 2  Question 3
1107            54.2        48.9        45.0
4208            53.1        45.6        40.6

I have times for 20 questions for about 200 participants. Now I have observed an overall decrease in time spent per question, as is shown in the figure below. I would like to accompany this graph with a statistical measure …
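One simple measure that respects the repeated-measures structure: fit a slope per participant (time vs question number), then test whether the slopes are negative on average with a one-sample t-test. The sketch below uses synthetic data shaped like the description (200 participants, 20 questions, an invented downward trend of 0.5 s per question), not the real times.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_participants, n_questions = 200, 20
q = np.arange(1, n_questions + 1)

# Hypothetical times: a downward trend plus per-answer noise
times = 55 - 0.5 * q + rng.normal(0, 5, size=(n_participants, n_questions))

# One slope per participant, then test mean slope against zero
slopes = np.array([stats.linregress(q, row).slope for row in times])
t_res = stats.ttest_1samp(slopes, 0.0)
print(f"mean slope = {slopes.mean():.3f} s/question, p = {t_res.pvalue:.2e}")
```

This avoids treating 4,000 correlated observations as independent, which a single pooled regression would do.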
I need to calculate the statistical significance of the difference between two time series, each with 4500 terms. My null hypothesis is $H_0: \mu=\mu_0$. How can I calculate the p-value? Is the Z-statistic useful for the p-value calculation? How do I get the p-value after computing the Z-statistic? I have $\alpha = 0.05$.
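Going from a Z-statistic to a two-sided p-value is just the normal survival function: p = 2·P(Z > |z|). A sketch with an invented series (the 0.05 mean shift is made up; with n = 4500 the sample SD is a fine plug-in for sigma, which is why a Z-test is reasonable here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
series = rng.normal(loc=0.05, scale=1.0, size=4500)  # hypothetical series
mu0, alpha = 0.0, 0.05

# Z-statistic for H0: mu = mu0
z = (series.mean() - mu0) / (series.std(ddof=1) / np.sqrt(len(series)))
# Two-sided p-value from the standard normal survival function
p = 2 * stats.norm.sf(abs(z))
print(z, p, p < alpha)
```

Note this treats the 4500 terms as independent; if the series is autocorrelated (time series usually are), the effective sample size is smaller and this p-value will be too optimistic.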
I need to do a chi-square test on two of my dataset's categorical variables. These two variables have essentially the same meaning but come from two different sources, so my idea is to use a chi-square test to see how "similar", or correlated, these two variables really are. To do so, I've written code in Python, but the p-value I get from it is exactly 0, which seems a little strange to me. The code is: from scipy.stats …
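An exact 0 is usually floating-point underflow, not an error: when the two variables agree almost perfectly and the sample is large, the chi-square statistic is enormous and its tail probability falls below the smallest representable double (~1e-308), so scipy returns 0.0. A minimal reproduction with an invented 2x2 table of two near-identical variables:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: two variables that agree almost perfectly
table = np.array([[900,   5],
                  [  3, 892]])

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)  # huge statistic, p underflows to exactly 0.0
```

So p = 0 here just means "smaller than machine precision can express"; reporting it as p < 1e-300 (or reporting the statistic and dof) is fine.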
Is it correct to say that the lower the p-value, the larger the difference between the means of the two groups in a t-test? For example, if I apply the t-test between two groups of measurements A and B, and then to two groups of measurements B and C, and I find that in the first case the p-value is lower than in the second, could one possible interpretation be that the difference between the …
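In general no: the p-value mixes the mean difference with the sample size and the variance. The sketch below constructs two comparisons with *identical* true mean differences (all numbers invented) and gets wildly different p-values, purely because of n and spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
diff = 1.0  # the SAME true mean difference in both comparisons

# Small, noisy samples -> large p despite the same mean difference
a_small = rng.normal(0, 3, 10)
b_small = rng.normal(diff, 3, 10)

# Large, tight samples -> tiny p for the same mean difference
a_big = rng.normal(0, 1, 500)
b_big = rng.normal(diff, 1, 500)

p_small = stats.ttest_ind(a_small, b_small).pvalue
p_big = stats.ttest_ind(a_big, b_big).pvalue
print(p_small, p_big)
```

If you want to compare the *sizes* of the differences, report the mean differences themselves (or an effect size like Cohen's d) alongside the p-values.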
I am studying the problem of predicting the popularity of a tweet, and want to test the null hypothesis that there is no relationship between favorite_counts and another set of variables, such as the number of friends of users. I am not sure whether to normalise or standardise the variables, because I am still thinking about how to model popularity and don't know how the distributions of likes and friends among users look (please advise). So I tried both, and ran an independent t-test. I get very …
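For "is there a relationship between two numeric variables?", a correlation test is usually a better fit than an independent t-test (which compares group means). A Spearman rank correlation has the extra property that it is unchanged by normalising or standardising either variable, since ranks are invariant to monotone transforms. Sketch with invented, heavy-tailed counts (the lognormal shapes and the 0.01 effect are assumptions, not from the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
friends = rng.lognormal(5, 1, size=300)                       # hypothetical friend counts
favorites = 0.01 * friends + rng.lognormal(1, 1, size=300)    # weak positive relation

rho_raw, p_raw = stats.spearmanr(friends, favorites)
# Standardising a variable does not change its ranks, so the test is identical
rho_std, p_std = stats.spearmanr((friends - friends.mean()) / friends.std(), favorites)
print(rho_raw, p_raw)
```

So for this particular question, the normalise-vs-standardise choice can be sidestepped entirely by using a rank-based test.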
I ran a chi-squared test on multiple features and also used these features to build a binary classifier using logistic regression. The feature with the lowest p-value (~0.1) had a low coefficient (≈0), whereas the feature with a higher p-value (~0.3) had a high coefficient (~2.9). How do I interpret this? Is it possible for a feature to have a low p-value but a zero coefficient?
I want to write a method to test multiple hypotheses for a pair of schools (say TAMU and UT Austin). I want to consider all possible pairs of words (Research, Thesis, Proposal, AI, Analytics) and test the hypothesis that the word counts differ significantly across the two schools, using the specified alpha threshold (0.05). I only need to conduct tests on words that have non-zero values for both schools, i.e., every row and column in the contingency table should sum to …
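One way to structure this is a 2x2 chi-square test per word (this word vs all other words, by school), skipping words with a zero count in either school. All counts below are invented for illustration, not real school data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical word counts as (TAMU, UT Austin) pairs
counts = {
    "Research":  (120, 150),
    "Thesis":    (40, 35),
    "Proposal":  (0, 12),    # zero for one school -> skipped below
    "AI":        (60, 95),
    "Analytics": (25, 30),
}
total_tamu = sum(a for a, _ in counts.values())
total_ut = sum(b for _, b in counts.values())

alpha = 0.05
results = {}
for word, (a, b) in counts.items():
    if a == 0 or b == 0:  # only test words with non-zero counts in both schools
        continue
    # 2x2 table: this word vs all other words, per school
    table = np.array([[a, b], [total_tamu - a, total_ut - b]])
    chi2, p, dof, _ = chi2_contingency(table)
    results[word] = p
    print(f"{word}: p={p:.4f}" + (" <- significant" if p < alpha else ""))
```

Since this runs one test per word, consider a multiple-comparisons correction (e.g. Bonferroni: compare each p against alpha divided by the number of tests actually run).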
I would like to run a one-way ANOVA on my data. I saw that one of the several assumptions for one-way ANOVA is homogeneity of variances. I have run the test for different datasets; I find that my p-values are sometimes larger than 0.05, and for some datasets smaller. As I understand it, if the p-value is smaller than 0.05, then I can reject the null hypothesis and say that the variances are not equal (and …
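That reading is right: for Levene's test the null is equal variances, so a small p-value means the homogeneity assumption is in doubt and classic one-way ANOVA is risky for that dataset. A sketch of the typical workflow with invented groups (the third group deliberately has a larger spread); the Kruskal-Wallis fallback is one common choice, not the only one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.normal(10, 1, 30)
g2 = rng.normal(11, 1, 30)
g3 = rng.normal(10, 4, 30)  # hypothetical group with much larger variance

# Levene's test: H0 = equal variances; small p => variances differ
lev_stat, lev_p = stats.levene(g1, g2, g3)
print(f"Levene p = {lev_p:.4f}")

if lev_p < 0.05:
    # Variances unequal: fall back to a test that does not assume homogeneity,
    # e.g. Kruskal-Wallis (rank-based; strictly it compares distributions, not means)
    stat, p = stats.kruskal(g1, g2, g3)
else:
    stat, p = stats.f_oneway(g1, g2, g3)
print(f"group-difference p = {p:.4f}")
```

Welch's ANOVA (available in statsmodels/pingouin) is another standard option when variances differ but you still want to compare means.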