I am analysing a set of data files which represent the responsiveness of cells to the addition of a drug. If the drug is not added, the cell responds normally; if it is added, it shows abnormal patterns. We decided to analyse this using an amplitude histogram, in order to distinguish between a change in amplitude and a change in the probability of eliciting the binary response. What we get with file 1 is shown in the attached histogram. So we fit a pdf on …
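Since the question is about separating a change in amplitude from a change in response probability, a minimal sketch of one way to model the amplitude histogram is a two-component Gaussian mixture; the synthetic amplitudes below are a hypothetical stand-in for the file 1 data, and this is only one possible reading of "fit a pdf". The component means track the typical response amplitude, while the mixing weights track the probability of eliciting each type of response.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for the amplitudes extracted from file 1:
# a mix of "low/no response" and "full response" events
rng = np.random.default_rng(0)
amplitudes = np.concatenate([rng.normal(0.2, 0.05, 300),   # low-amplitude events
                             rng.normal(1.0, 0.15, 150)])  # full-amplitude events

# Fit a two-component Gaussian mixture to the amplitude distribution
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(amplitudes.reshape(-1, 1))

# Component means ~ typical response amplitude of each population;
# mixing weights ~ probability of eliciting each type of response
print("means:  ", gmm.means_.ravel())
print("weights:", gmm.weights_)
```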
I have two variables as time series, one a consequence of the other. I would like to find the average time delay with which the dependent variable responds to the independent variable. Additionally, I would like to find the variance associated with the lag time and a corresponding confidence interval. I am unsure how to go about this in a statistically valid way, but I am using Python. Currently I have used np.diff(np.sign(np.diff(df))) to isolate …
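One statistically grounded way to approach this (a sketch, not the definitive method) is to estimate the lag as the argmax of the cross-correlation and then use a moving-block bootstrap to obtain a spread and a percentile confidence interval for that lag. The arrays x (driver), y (response) and the sampling step dt below are hypothetical stand-ins for the two series; correlation_lags requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_lag(x, y, dt):
    """Lag (in time units) at which y best aligns with x, via cross-correlation."""
    xc = correlate(y - y.mean(), x - x.mean(), mode="full")
    lags = correlation_lags(len(y), len(x), mode="full")
    return lags[np.argmax(xc)] * dt

rng = np.random.default_rng(0)
dt = 1.0                                               # hypothetical sampling interval
x = rng.normal(size=500).cumsum()                      # hypothetical driver series
y = np.roll(x, 7) + rng.normal(scale=0.5, size=500)    # hypothetical response, lags x by 7 samples

point_lag = estimate_lag(x, y, dt)

# Moving-block bootstrap: resample contiguous blocks so the within-block
# x-y alignment and autocorrelation are preserved (block joins add some noise)
block, n_boot = 50, 500
boot_lags = []
for _ in range(n_boot):
    starts = rng.integers(0, len(x) - block, size=len(x) // block)
    idx = np.concatenate([np.arange(s, s + block) for s in starts])
    boot_lags.append(estimate_lag(x[idx], y[idx], dt))

lo, hi = np.percentile(boot_lags, [2.5, 97.5])
print(f"lag = {point_lag}, 95% CI = [{lo:.1f}, {hi:.1f}], variance = {np.var(boot_lags):.3f}")
```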
I'm trying to determine whether it's best to use linear or quadratic discriminant analysis for an analysis that I'm working on. It's my understanding that one of the motivations for using QDA over LDA is that it deals better with circumstances in which the variance of the predictors is not constant across the classes being predicted. This is true for my data, however I intend to carry out principal components analysis beforehand. Because this PCA will involve scaling/normalising the variables, …
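If it helps to frame the comparison concretely, here is a hedged scikit-learn sketch (hypothetical X and y) that puts scaling, PCA, and either LDA or QDA in a pipeline and compares them by cross-validation; whether QDA's class-specific covariances still pay off after the PCA step is exactly what such a comparison would show on your data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X, y: hypothetical data; replace with your own predictors and classes
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    # Scale, reduce with PCA, then fit the discriminant model
    pipe = make_pipeline(StandardScaler(), PCA(n_components=10), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```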
It is given that $$\text{MSE} = \text{bias}^2 + \text{variance}.$$ I can see the mathematical relationship between MSE, bias, and variance. However, how do we understand bias and variance mathematically for classification problems (where we can't use MSE)? I would like some help with the intuition and with the mathematical basis for bias and variance in classification. Any formula or derivation would be helpful.
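One common way to make bias and variance precise for classification under 0-1 loss (following Domingos, 2000; stated here only for the binary, noise-free case) is the following, with $D$ ranging over training sets:
$$ y_m(x) = \arg\max_{c}\; P_D\big(\hat{y}_D(x) = c\big), \qquad B(x) = \mathbb{1}\big[y_m(x) \neq y^*(x)\big], \qquad V(x) = P_D\big(\hat{y}_D(x) \neq y_m(x)\big), $$
where $y_m$ is the "main" (most frequent) prediction across training sets, $y^*$ is the Bayes-optimal label, $B$ is the bias and $V$ the variance. The expected 0-1 error at $x$ then decomposes as
$$ P_D\big(\hat{y}_D(x) \neq y^*(x)\big) = B(x) + \big(1 - 2B(x)\big)\,V(x), $$
so variance increases the error where the classifier is unbiased ($B = 0$) and, counter-intuitively, decreases it where it is biased ($B = 1$). This plays the role that MSE $=$ bias$^2$ $+$ variance plays for regression, but the decomposition is no longer purely additive.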
I was searching for the best ways to do feature selection in a regression problem and came across a post suggesting mutual information for regression, so I tried the same on the Boston dataset. The code was as follows:

from sklearn.feature_selection import SelectKBest, mutual_info_regression

# feature selection
f_selector = SelectKBest(score_func=mutual_info_regression, k='all')
# learn the relationship from the training data
f_selector.fit(X_train, y_train)
# transform the train input data
X_train_fs = f_selector.transform(X_train)
# transform the test input data
X_test_fs = f_selector.transform(X_test)

The scores were as follows:

    Features    Scores
12  LSTAT       0.651934
5   RM          …
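For reference, a small sketch of how a score table like the one above can be produced, continuing from the snippet and assuming X_train is a pandas DataFrame with named columns (an assumption, since the snippet does not show how the data were loaded):

```python
import pandas as pd

# Pair each feature name with its mutual-information score and sort descending
scores = pd.DataFrame({"Features": X_train.columns,
                       "Scores": f_selector.scores_})
print(scores.sort_values("Scores", ascending=False))
```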
I referred to the Quora link here as well, but could not understand clearly. Can anyone please help me understand why and how variational inference underestimates the variance of the true posterior distribution with some theory or mathematical calculations? [EDIT]: Adding my understanding of the Quora answer based on a visualization. The red line is p(x). The green line is q(x), the approximating distribution. The blue line is the KL divergence. When q(x) is less than p(x), the KL divergence …
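The standard argument (independent of the Quora answer) is about the direction of the KL divergence that mean-field VI minimises:
$$ \mathrm{KL}(q \,\|\, p) = \int q(x) \, \log \frac{q(x)}{p(x)} \, dx . $$
Wherever $p(x)$ is close to zero but $q(x)$ is not, the ratio $q(x)/p(x)$ blows up, so minimising this "reverse" KL forces $q$ to be (nearly) zero wherever $p$ is (nearly) zero; this is the zero-forcing behaviour. By contrast, setting $q(x) \approx 0$ in regions where $p(x) > 0$ costs almost nothing, because the integrand is weighted by $q(x)$. A $q$ restricted to a simple family (e.g. a factorised Gaussian) therefore tends to lock onto one mode and keep its tails inside those of $p$, which is why the fitted variance typically underestimates the variance of the true posterior; minimising the forward divergence $\mathrm{KL}(p \,\|\, q)$ would instead tend to overestimate it.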
I have a dataset that contains n features scaled to [0, 1]. I would like to use an unsupervised feature selection algorithm (variance thresholding). How can I compute the threshold value?
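A minimal sketch with scikit-learn's VarianceThreshold, assuming the features sit in a hypothetical array X already scaled to [0, 1]; rather than computing a single "correct" value, one common approach (shown in the comments) is to look at the distribution of per-feature variances and pick a cut-off from it.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.random((200, 10))          # hypothetical data already scaled to [0, 1]
X[:, 3] *= 0.01                    # one very low-variance feature for illustration

# Inspect per-feature variances first; the threshold is a judgment call,
# e.g. a small quantile of the observed variances or a domain-driven value.
variances = X.var(axis=0)
print("per-feature variances:", np.round(variances, 4))

threshold = np.quantile(variances, 0.10)   # assumption: drop the lowest ~10%
selector = VarianceThreshold(threshold=threshold)
X_selected = selector.fit_transform(X)
print("kept features:", selector.get_support(indices=True))
```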
I'm trying to understand some weight initialisation methods by reading the article http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf . But I don't understand their notation for the variance. In equation (5) they refer to a variable $z^i$, and I don't know what they mean by it: is $i$ a collective index over all examples, or something else?
When comparing several regression models in terms of quality, it seems that most papers have settled on the MSE. There are also papers comparing "variance" and "variance accounted for (VAF)". However, opinions about explained variance (R^2) seem to be controversial. Should it nevertheless be reported in a scientific paper? $$ VAF_i = \bigg[ 1-\frac{\text{var}\big(y_i - \hat y_i\big)}{\text{var}\big(y_i\big)} \bigg] \times 100\% $$ And what does VAF tell us? Is the VAF still a good measure for comparing regression models?
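For concreteness, a small sketch (hypothetical y_true and y_pred) that computes MSE, R², and the VAF as defined above; note that VAF uses the variance of the residuals where R² uses their mean square, so the two coincide whenever the residuals have zero mean.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)                       # hypothetical targets
y_pred = y_true + rng.normal(scale=0.3, size=200)   # hypothetical predictions

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
# VAF as defined above: 1 - var(residuals) / var(targets), in percent
vaf = (1 - np.var(y_true - y_pred) / np.var(y_true)) * 100

print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}, VAF = {vaf:.1f}%")
```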
Hi, I'm taking a course about probability distributions in data science, and below is the derivation of the variance as an expected value. Variance = the expected value of the squared difference from the mean. But I thought variance was just the difference between a value and its mean. Why are we squaring and adding the expected value symbol? $$\sigma^2 = E\big((Y - \mu)^2\big) = E(Y^2) - \mu^2$$ For the first step in the derivation, why do we multiply by the summation of $p(x)$ …
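For reference, the second equality follows from linearity of expectation and $E(Y) = \mu$:
$$ E\big((Y - \mu)^2\big) = E\big(Y^2 - 2\mu Y + \mu^2\big) = E(Y^2) - 2\mu\,E(Y) + \mu^2 = E(Y^2) - 2\mu^2 + \mu^2 = E(Y^2) - \mu^2 . $$
For a discrete variable the expectation is a probability-weighted sum, $E\big((Y - \mu)^2\big) = \sum_y (y - \mu)^2\, p(y)$, which is presumably where the multiplication by $p(x)$ in the course's first step comes from.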
I have this code:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from mlxtend.evaluate import bias_variance_decomp

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
model = LinearRegression().fit(X_train, y_train)

print(y_train.min(), y_train.max(), y_test.min(), y_test.max())
# for your understanding of the data: 7283 517924 11510 450000

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    model, X_train, y_train.ravel(), X_test, y_test.ravel(),
    loss='mse', random_seed=1)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)

The result is:

Average expected loss: 542162695.679
Average bias: 529311955.129
Average variance: 12850740.550

To me, these values …
It is well known that SGD updates have high variance. Given the iteration update: $$ w^{k+1} := w^k - \underbrace{\alpha \ g_i(w^k)}_{p^k}, $$ where $w$ are the model weights and $g_i(w^k)$ is the gradient of the loss function evaluated at sample $i$. How do I compute the variance of each update $p^k$? I would like to plot it for each iteration and study its behavior during the minimization process.
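A minimal NumPy sketch of one way to do this, under the assumption that "variance of the update" means the empirical variance of the per-sample updates $\alpha\, g_i(w^k)$ around the full-batch mean at the current iterate; the least-squares model and the data X, y are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                    # hypothetical features
y = X @ np.array([1., -2., 0.5, 0., 3.]) + rng.normal(scale=0.1, size=200)

w = np.zeros(5)
alpha = 0.01
update_var = []                       # variance of alpha * g_i(w^k) at each iteration

for k in range(100):
    # Per-sample gradients of the squared loss: g_i = (x_i^T w - y_i) x_i
    residuals = X @ w - y                       # shape (n,)
    per_sample_grads = residuals[:, None] * X   # shape (n, d)

    # Empirical (total) variance of the candidate updates p^k = alpha * g_i(w^k),
    # measured as mean squared deviation from the full-batch update
    mean_grad = per_sample_grads.mean(axis=0)
    deviations = alpha * (per_sample_grads - mean_grad)
    update_var.append(np.mean(np.sum(deviations**2, axis=1)))

    # One SGD step with a single randomly chosen sample
    i = rng.integers(len(X))
    w -= alpha * per_sample_grads[i]

print(update_var[:5], "...", update_var[-1])
```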
I was wondering whether anyone has considered a sampling technique that aims to keep as much of the variance as possible (e.g. as many unique values as possible, or very widely distributed continuous variables). The benefit might be that it would allow developing code around the sample while really exercising the edge cases in the data; you can always take a representative sample later. So I am wondering if people have tried to sample for maximum variance before …
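One way to make the idea concrete is greedy farthest-point sampling, which repeatedly picks the row farthest from everything selected so far and therefore preserves spread and edge cases; this is only an illustrative sketch on a hypothetical numeric array X, not a claim that it is the technique the post has in mind.

```python
import numpy as np

def farthest_point_sample(X, k, seed=0):
    """Greedily pick k rows of X that are maximally spread out."""
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(len(X))]                    # start from a random row
    # Distance of every row to its nearest already-chosen row
    dists = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))                    # farthest remaining row
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                         # hypothetical dataset
idx = farthest_point_sample(X, k=50)
print("variance of full data:    ", X.var(axis=0).round(2))
print("variance of spread sample:", X[idx].var(axis=0).round(2))
```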
I have a dataset with 10 columns, which are my features, and 1732 rows, which are my registrations. These registrations are divided into 15 classes, so I have several registrations for each class in my dataset. My goal is to identify the most important feature, the one that contributes the most variance between classes. I'm trying to use PCA, but because of the several registrations per class it's difficult to find the right way of using this …
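In case it is useful, here is one hedged way to use PCA for this goal (the arrays X and y below are hypothetical stand-ins): fit PCA on the per-class means, so that the repeated registrations per class do not dominate, and then read off the loadings of the first component to see which original feature contributes most to the between-class variance. This is one possible reading of the goal, not the definitive method.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 1732 rows, 10 features, 15 classes
X = pd.DataFrame(rng.normal(size=(1732, 10)),
                 columns=[f"feat_{i}" for i in range(10)])
y = rng.integers(0, 15, size=1732)

# Average the rows of each class so PCA sees only between-class structure
class_means = X.groupby(y).mean()

pca = PCA(n_components=2)
pca.fit(class_means)

# Loadings of the first component: larger |value| = bigger contribution
# of that original feature to the variance between class means
loadings = pd.Series(pca.components_[0], index=X.columns)
print(loadings.reindex(loadings.abs().sort_values(ascending=False).index))
```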
I trained a GRU model on some data and then generated a set of predictions on a test set. The predictions are really bad, as indicated by a near-zero R2 score. I notice that the variance of the model predictions is much smaller than that of the actual training data, i.e. it seems like the model is overfit to the mean. But why is this? I made sure to stop training/use hyperparameters where there was no model overfitting, so why are …
Based on the Deep Learning book: $$\mathrm{MSE} = E\big[(\hat\theta_m - \theta)^2\big] = \mathrm{Bias}(\hat\theta_m)^2 + \mathrm{Var}(\hat\theta_m)$$ where $m$ is the number of samples in the training set, $\theta$ is the true parameter and $\hat\theta_m$ is the estimated parameter. I can't get to the second equation. Further, I don't understand how the first expression is obtained. Note: $\mathrm{Bias}(\hat\theta_m)^2 = \big(E(\hat\theta_m) - \theta\big)^2$. Also, how are bias and variance evaluated in classification?
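For reference, the standard derivation of the second equation: add and subtract $E[\hat\theta_m]$ inside the square and note that the cross term vanishes.
$$
\begin{aligned}
E\big[(\hat\theta_m - \theta)^2\big]
&= E\Big[\big((\hat\theta_m - E[\hat\theta_m]) + (E[\hat\theta_m] - \theta)\big)^2\Big] \\
&= E\big[(\hat\theta_m - E[\hat\theta_m])^2\big]
 + 2\,\big(E[\hat\theta_m] - \theta\big)\,E\big[\hat\theta_m - E[\hat\theta_m]\big]
 + \big(E[\hat\theta_m] - \theta\big)^2 \\
&= \mathrm{Var}(\hat\theta_m) + 0 + \mathrm{Bias}(\hat\theta_m)^2 ,
\end{aligned}
$$
since $E[\hat\theta_m] - \theta$ is a constant and $E\big[\hat\theta_m - E[\hat\theta_m]\big] = 0$.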
I have a regression model with a train MAPE of 6% and a test MAPE of 15%. This appears to me to be a clear case of overfitting. But can I still use this model, assuming a 15% error is not a bad number after all? Is there a flaw in this thinking?
This topic confuses me. In the literature or articles, when talking about bias and variance in machine learning, specifically in cross-validation, do they refer to the high bias (underfitting) and high variance (overfitting) of the model? Or do they refer to the bias and variance of the predictions obtained in the iterations of the cross-validation? How should each case be handled?
I have a problem with PCA. I read that PCA needs clean numeric values. I started my analysis with a dataset called trainDf with shape (1460, 79). I did my data cleaning and processing by removing empty values, imputing, and dropping columns, and I got a dataframe transformedData with shape (1458, 69). The data cleaning steps are:

- LotFrontage: imputing with the mean value
- MasVnrArea: imputing with 0s (less than 10 cols)
- Ordinal encoding for categorical columns
- Electrical: imputing with the most frequent value
- …
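A hedged sketch of how cleaning steps like these can be chained in front of PCA with scikit-learn; the column names come from the post, but the tiny stand-in dataframe and the choice of which columns are categorical are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Tiny stand-in for trainDf; the real dataframe has shape (1460, 79)
rng = np.random.default_rng(0)
trainDf = pd.DataFrame({
    "LotFrontage": np.where(rng.random(100) < 0.1, np.nan, rng.normal(70, 10, 100)),
    "MasVnrArea":  np.where(rng.random(100) < 0.1, np.nan, rng.normal(100, 30, 100)),
    "Electrical":  rng.choice(np.array(["SBrkr", "FuseA", np.nan], dtype=object), size=100),
})

categorical_cols = trainDf.select_dtypes(include="object").columns
numeric_cols = trainDf.select_dtypes(exclude="object").columns

preprocess = ColumnTransformer([
    # categorical columns: most-frequent imputation, then ordinal encoding
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OrdinalEncoder()), categorical_cols),
    # numeric columns: mean imputation (as done for LotFrontage)
    ("num", SimpleImputer(strategy="mean"), numeric_cols),
])

# PCA runs on the fully numeric, imputed output of the preprocessing step
pca_pipeline = Pipeline([("prep", preprocess),
                         ("pca", PCA(n_components=2))])  # use more components on the real data
components = pca_pipeline.fit_transform(trainDf)
print(components.shape)
```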