Loss function to prevent estimator bias

I have a regression problem I'm trying to build a model for: predicting sales per person (>= 0) depending on some variables. I'm running different model types and gave deep neural networks a try. The loss functions I'm using are mean squared error and mean absolute error (or sometimes a mix). I often run into the issue that, despite MSE and MAE being optimized, I end up with a very strong bias in the prediction, e.g. sum(training_all_predictions) / …
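One option sometimes suggested for this kind of systematic offset is to add an explicit penalty on the mean residual to the loss. The sketch below assumes a Keras/TensorFlow model and a hypothetical weight alpha; it is not the setup from the question.

# Minimal sketch (not the asker's code): MSE plus a penalty on the batch-mean residual.
import tensorflow as tf

def mse_with_bias_penalty(alpha=0.1):
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        mean_residual = tf.reduce_mean(y_true - y_pred)   # systematic offset on this batch
        return mse + alpha * tf.square(mean_residual)
    return loss

# model.compile(optimizer="adam", loss=mse_with_bias_penalty(alpha=0.1))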
Category: Data Science

Difference between ethics and bias in Machine Learning

I'm confused about the difference between "ethics" and "bias" when those concepts are discussed in the context of Machine Learning (ML). In my understanding, an ethical issue in ML is pretty much the same thing as "bias": say, the model discriminates against people of color, which is the same as saying that the model is biased. In short, "an ethical issue is always a bias, but it is not necessarily true that a bias is always an ethical issue". Is this …
Category: Data Science

Learning high bias in neural net

I have a simple model which tries to predict the constant vector $[1, 1, \dots, 1, 0, \dots, 0]$ regardless of the input. I found that the model predicts it successfully if trained on inputs in the $[0, 10]$ range; however, its predictions are always all-zero vectors if it is trained on inputs in the $[750, 770]$ range. I was expecting the model to converge to weights dominated by the bias terms and still be able to predict the constant vector even for larger training inputs. Maybe anyone can advise what …
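A common first check for this behaviour (a sketch, not the questioner's code) is whether standardizing the inputs restores learning, since large raw input values can saturate activations and stall gradient descent.

# Sketch: standardize the inputs before feeding a small network; data and shapes are made up.
import numpy as np

X = np.random.uniform(750, 770, size=(1000, 8))   # inputs in the problematic range
y = np.tile([1, 1, 1, 1, 0, 0, 0, 0], (1000, 1))  # constant target vector

X_std = (X - X.mean(axis=0)) / X.std(axis=0)      # zero mean, unit variance per feature
# model.fit(X_std, y, ...)  # train on the standardized inputs instead of raw X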
Category: Data Science

bias variance decomposition for classification problem

It is given that MSE = bias$^2$ + variance, and I can see the mathematical relationship between MSE, bias, and variance. However, how do we understand the intuition behind bias and variance for classification problems (we can't use MSE for classification tasks)? I would like some help with the intuition and the mathematical basis of bias and variance for classification. Any formula or derivation would be helpful.
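As one practical angle (an illustrative sketch, not an answer from the question), mlxtend's bias_variance_decomp also supports an empirical decomposition under 0-1 loss for classifiers; the dataset and classifier below are placeholders.

# Sketch: empirical bias-variance decomposition under 0-1 loss.
from mlxtend.evaluate import bias_variance_decomp
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)   # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

loss, bias, var = bias_variance_decomp(
    DecisionTreeClassifier(random_state=1),
    X_tr, y_tr, X_te, y_te,
    loss='0-1_loss', random_seed=1)
print(loss, bias, var)   # average 0-1 loss, bias, and variance estimates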
Category: Data Science

How to provide Intentional Bias towards recent examples in Text Classification?

I have trained an XGBClassifier to route text issues to the right assignee (a simple 50-way classification). The source from which I fetch the data also provides a datetime object giving the timestamp at which each issue was created. Logically, a person who worked on an issue recently (say 2 weeks ago) should be a better suggestion than another person who worked on a similar issue 2 years ago. That is, if there are two examples from …
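One common way to express this kind of preference (a hedged sketch, not the asker's setup) is to pass recency-based sample weights to XGBClassifier.fit; the half-life below is an arbitrary illustrative choice.

# Sketch: exponentially decay sample weights with the age of each training example.
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

# `created_at` is assumed to be a pandas Series of timestamps aligned with X_train / y_train.
age_days = (pd.Timestamp.now() - created_at).dt.days.to_numpy()
half_life = 90.0                                    # hypothetical: weight halves every 90 days
weights = 0.5 ** (age_days / half_life)

model = XGBClassifier()
model.fit(X_train, y_train, sample_weight=weights)  # recent issues count for more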
Category: Data Science

how to test if the target variables is correlated with protected variables?

I wonder how to check whether protected variables (in the fairness sense) are encoded in the other, non-protected features, or whether they are not sufficiently correlated with the target variable, so that adding them does not improve prediction (classification) performance. If there is a Python tutorial showing this, it would be useful. Regards,
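One simple diagnostic sometimes used here (an illustrative sketch, not from the question) is to train a classifier to predict the protected attribute from the non-protected features: high accuracy suggests the attribute is encoded as a proxy. All variable names below are placeholders.

# Sketch: can the non-protected features predict the protected attribute?
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X_nonprotected: features without the protected column; s: binary protected attribute; y: target
proxy_auc = cross_val_score(LogisticRegression(max_iter=1000),
                            X_nonprotected, s, cv=5, scoring='roc_auc').mean()
print("protected attribute predictable from other features, AUC:", proxy_auc)

# Rough check of how well the target can be predicted without the protected attribute:
base = cross_val_score(LogisticRegression(max_iter=1000), X_nonprotected, y, cv=5).mean()
print("accuracy without protected attribute:", base)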
Topic: bias
Category: Data Science

Derivative of Loss wrt bias term

I read this and have an ambiguity. I am trying to understand how to calculate the derivative of the loss w.r.t. the bias. In that question, we have this definition: np.sum(dz2, axis=0, keepdims=True). Then, in Casper's comment, he said that the derivative of L (loss) w.r.t. b is the sum of the rows: $$ \frac{\partial L}{\partial Z}\,\mathbf{1} = \begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} $$ But actually, using axis=0, is it not …
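To make the two operations concrete (a small sketch with a made-up matrix, not the referenced post's code): the product with a column of ones sums each row, while np.sum(..., axis=0) sums down each column.

# Sketch: compare "sum of the rows" (dZ @ 1) with np.sum(dZ, axis=0).
import numpy as np

dZ = np.array([[1., 2., 3.],
               [4., 5., 6.]])                    # shape (n_samples, n_units), made-up values

row_sums = dZ @ np.ones((3, 1))                  # one value per sample (sum along axis=1)
col_sums = np.sum(dZ, axis=0, keepdims=True)     # one value per bias unit (sum along axis=0)

print(row_sums.ravel())   # [ 6. 15.]
print(col_sums.ravel())   # [5. 7. 9.]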
Category: Data Science

Do these values of bias and variance make sense?

I have this code:

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
model = LinearRegression().fit(X_train, y_train)
from mlxtend.evaluate import bias_variance_decomp
print(y_train.min(), y_train.max(), y_test.min(), y_test.max())  # for your understanding of the data: 7283 517924 11510 450000
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    model, X_train, y_train.ravel(), X_test, y_test.ravel(),
    loss='mse', random_seed=1)
print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)

The result is:

Average expected loss: 542162695.679
Average bias: 529311955.129
Average variance: 12850740.550

To me, these values …
Category: Data Science

Handling bias inputs during normalization

Suppose I have an input matrix $\mathbf X \in \mathbb R^{(D+1)\times N}$, where $N$ is the number of samples, $D$ is the dimension of an input vector $x$, and the extra dimension is for the bias, with all bias entries equal to $1$. If I want to normalize all inputs by subtracting the mean and dividing by the standard deviation, how should I handle the bias entries? Should they stay the same, as $1$?
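A minimal sketch of the usual approach (assuming the bias row is the last row, which the question does not specify): standardize only the $D$ feature rows and leave the row of ones untouched, since its standard deviation is zero anyway.

# Sketch: standardize the D feature rows of a (D+1) x N design matrix, skip the bias row.
import numpy as np

D, N = 3, 5
X = np.vstack([np.random.randn(D, N) * 10 + 50,   # D feature rows (made-up data)
               np.ones((1, N))])                  # bias row of ones (assumed last)

features = X[:D, :]
mu = features.mean(axis=1, keepdims=True)
sigma = features.std(axis=1, keepdims=True)

X_norm = X.copy()
X_norm[:D, :] = (features - mu) / sigma           # bias row X_norm[D, :] stays all ones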
Category: Data Science

Visualizing the equation for separating hyperplane

I was wondering if I can visualize, with an example, the fact that for all points $x$ on the separating hyperplane, the following equation holds: $$w^T x + w_0 = 0 \quad\quad \text{... equation (1)}$$ Here, $w$ is a weight vector and $w_0$ is a bias term (related to the perpendicular distance of the separating hyperplane from the origin) defining the separating hyperplane. I was trying to visualize this in 2D space. In 2D, the separating hyperplane is nothing but the decision boundary. So I took the following example: $w=[1\quad 2], …
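For a quick 2D picture (a sketch; the value of $w_0$ below is an assumption, since the excerpt is cut off), one can plot the line $w^T x + w_0 = 0$ for $w = [1, 2]$:

# Sketch: plot the decision boundary x1 + 2*x2 + w0 = 0 in 2D.
import numpy as np
import matplotlib.pyplot as plt

w = np.array([1.0, 2.0])
w0 = -4.0                      # hypothetical bias, chosen just for the plot

x1 = np.linspace(-5, 5, 100)
x2 = -(w[0] * x1 + w0) / w[1]  # solve w1*x1 + w2*x2 + w0 = 0 for x2

plt.plot(x1, x2, label="w^T x + w0 = 0")
plt.quiver(0, 2, w[0], w[1], angles='xy', scale_units='xy', scale=1)  # w is normal to the line
plt.axhline(0, color='gray', lw=0.5); plt.axvline(0, color='gray', lw=0.5)
plt.legend(); plt.gca().set_aspect('equal'); plt.show()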
Category: Data Science

What is the defining Set in NLP

I am reading the paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" (here is the pdf). On page 6, we read: "Step 1: Identify gender subspace. Inputs: word sets W, defining sets D_1, ..., D_m." However, the paper does not explain, either before or after this statement, what these defining sets are. Can anyone give me a definition or description of these sets? Thank you.
Category: Data Science

How to manage sampling bias between training data and real-world data?

I'm currently working on a binary classification problem. My training dataset is rather small, with only 1000 elements. (I don't know if it is relevant: my problem is similar to the "spam filtering" problem, where an item can also be "likely" to be spam, but I simplified it to a black-or-white issue and use the probability given by the models as a likelihood score.) Among those 1000 elements, 70% are from class 1 …
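One textbook way to correct for a known mismatch between training and deployment class proportions (an illustrative sketch; the real-world prior below is a placeholder, since the excerpt is cut off) is to reweight the training samples by the ratio of real-world to training priors:

# Sketch: reweight training samples so class proportions match an assumed real-world prior.
import numpy as np
from sklearn.linear_model import LogisticRegression

train_prior = {0: 0.30, 1: 0.70}   # proportions in the 1000-element training set
real_prior  = {0: 0.90, 1: 0.10}   # hypothetical real-world proportions

# y_train is assumed to be an array of 0/1 labels aligned with X_train
weights = np.array([real_prior[c] / train_prior[c] for c in y_train])

clf = LogisticRegression()
clf.fit(X_train, y_train, sample_weight=weights)   # training now reflects the assumed real-world prior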
Category: Data Science

Math behind MSE = bias^2 + variance

Based on the Deep Learning book: $$\mathrm{MSE} = E[(\hat\theta_m - \theta)^2] = \mathrm{Bias}(\hat\theta_m)^2 + \mathrm{Var}(\hat\theta_m)$$ where $m$ is the number of samples in the training set, $\theta$ is the true parameter and $\hat\theta_m$ is the estimated parameter. I can't get to the second expression, and I also don't understand how the first expression is obtained. Note: $\mathrm{Bias}(\hat\theta_m) = E[\hat\theta_m] - \theta$. Also, how are bias and variance evaluated in classification?
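For reference, the standard derivation (a worked sketch using the definitions above, adding and subtracting $E[\hat\theta_m]$ inside the square) goes:

$$
\begin{aligned}
E\big[(\hat\theta_m - \theta)^2\big]
&= E\Big[\big((\hat\theta_m - E[\hat\theta_m]) + (E[\hat\theta_m] - \theta)\big)^2\Big] \\
&= E\big[(\hat\theta_m - E[\hat\theta_m])^2\big] + 2\,(E[\hat\theta_m]-\theta)\,E\big[\hat\theta_m - E[\hat\theta_m]\big] + (E[\hat\theta_m]-\theta)^2 \\
&= \mathrm{Var}(\hat\theta_m) + 0 + \mathrm{Bias}(\hat\theta_m)^2 ,
\end{aligned}
$$

where the cross term vanishes because $E\big[\hat\theta_m - E[\hat\theta_m]\big] = 0$ and $E[\hat\theta_m] - \theta$ is a constant.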
Category: Data Science

Amount of data needed for deep learning vs support vector machine

I often read that the amount of data needed to train a well-generalizing deep learning model is much higher than for, e.g., a support vector machine. That makes sense, because of the huge number of parameters in a deep learning approach, which potentially leads to overfitting. However: are there any systematic studies on this? Do deep learning approaches really need more data? Best regards, Gesetzt
Category: Data Science

Bias and variance in the model or in the predictions?

This topic confuses me. In the literature, when talking about bias and variance in machine learning, specifically in cross-validation, do they refer to high bias (underfitting) and high variance (overfitting) of the model? Or do they refer to the bias and variance of the predictions obtained in the iterations of the cross-validation? How should each case be handled?
Category: Data Science

Look-ahead bias when predicting a time series using features

I am building some ML models (RF, RNN, MLP) to predict a time series value 'y' based on features 'X', and not on the time series 'y' itself. My question is about the bias I might be introducing, since I am doing a simple random train-test split for fitting and evaluation, so I am using data from different days (past and future) and not splitting by time. Is this valid for this prediction process, or even that I am not …
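If a time-ordered split turns out to be needed, scikit-learn's TimeSeriesSplit gives a quick way to evaluate without ever training on the future (a generic sketch, not the asker's pipeline):

# Sketch: expanding-window cross-validation that never trains on future data.
from sklearn.model_selection import TimeSeriesSplit
from sklearn.ensemble import RandomForestRegressor

# X, y are assumed to be arrays ordered by time (oldest first)
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    model = RandomForestRegressor()
    model.fit(X[train_idx], y[train_idx])          # fit only on the past
    score = model.score(X[test_idx], y[test_idx])  # evaluate on the subsequent block
    print(score)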
Category: Data Science

Backpropagation of Bias in Neural networks

My goal is to calculate backpropagation (especially the backpropagation of the bias). For example, X, W and B are Python numpy arrays, such as [[0,0],[0,1]], [[5,5,5],[10,10,10]] and [1,2,3] respectively. And suppose dL/dY is [[1,2,3],[4,5,6]]. How do I calculate dL/dB? The answer should be [5, 7, 9]. Why is it calculated that way?
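With the arrays given in the question, the column-wise sum of dL/dY reproduces the stated answer, since the same bias B is broadcast onto every row of Y = X @ W + B; a small sketch:

# Sketch: the bias gradient is the sum of dL/dY over the batch dimension.
import numpy as np

X  = np.array([[0, 0], [0, 1]])
W  = np.array([[5, 5, 5], [10, 10, 10]])
B  = np.array([1, 2, 3])
dY = np.array([[1, 2, 3], [4, 5, 6]])   # given dL/dY

Y  = X @ W + B                          # B is added to every row of X @ W
dB = dY.sum(axis=0)                     # each row receives the same B, so its gradients add up
print(dB)                               # [5 7 9]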
Category: Data Science

learning curves of a classification algorithm

I am trying to understand this learning curve of a classification problem, but I am not sure what to infer. I believe that I have overfitting, but I cannot be sure. The training loss is very low and increases very slightly as training examples are added, while the validation loss gradually decreases (without flattening) as training examples are added. However, I do not see any gap at the end of the curves, something that can usually be found in an overfitting model. On the other hand, …
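For context (a generic sketch of how such curves are usually produced, not the asker's code), scikit-learn's learning_curve computes training and validation scores over increasing training-set sizes:

# Sketch: compute and plot a learning curve for a classifier.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression

# X, y are assumed to be the classification dataset
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10), cv=5, scoring='neg_log_loss')

plt.plot(sizes, -train_scores.mean(axis=1), label='training loss')
plt.plot(sizes, -val_scores.mean(axis=1), label='validation loss')
plt.xlabel('training examples'); plt.ylabel('log loss'); plt.legend(); plt.show()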
Category: Data Science

Keras model prediction always has unwanted offset

I am trying to predict the next 10 days by looking at the last 60 days, so I tried to implement an LSTM layer. Before jumping into the question, I want to clarify a few points. Firstly, this is a Multiple Parallel Input and Multi-Step Output problem, as described in the link. I collected the data of the last 5 years of all funds available in my country from this address. I refined my data as much as possible. Of …
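For reference, a minimal Keras sketch of a multiple-parallel-input, multi-step-output LSTM with a 60-step lookback and a 10-step horizon (the feature count and layer sizes are assumptions, not the asker's model):

# Sketch: LSTM that maps 60 past time steps of n_features to 10 future steps of n_features.
import tensorflow as tf

n_past, n_future, n_features = 60, 10, 5      # n_features is a placeholder

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_past, n_features)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(n_future * n_features),
    tf.keras.layers.Reshape((n_future, n_features)),
])
model.compile(optimizer='adam', loss='mse')
# model.fit(X_windows, y_windows, ...)  # X: (samples, 60, n_features), y: (samples, 10, n_features)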
Category: Data Science
