I am working on a data science project on an industrial machine. This machine has two heating infrastructures (fuel and electricity). It uses both heating systems at the same time, and I am trying to estimate the temperature reading at the thermocouple that results from this heating. However, the heating takes effect with some delay/lag. In other words, a one-unit change in fuel or electrical heating is reflected in the thermocouple only hours later. …
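One common way to capture such a delay, assuming hourly data in a pandas DataFrame with hypothetical column names, is to add lagged copies of the heating inputs as extra regressors. A minimal sketch:

```python
import pandas as pd

# Hypothetical hourly data: fuel and electric heating inputs plus the
# thermocouple reading we want to predict.
df = pd.DataFrame({
    "fuel": [10, 12, 11, 13, 14, 15],
    "electric": [5, 5, 6, 7, 7, 8],
    "temp": [200, 202, 205, 207, 210, 213],
})

# Add lagged versions of the inputs so a model can learn the delayed effect,
# e.g. the value of each input 1, 2 and 3 hours ago.
for lag in (1, 2, 3):
    df[f"fuel_lag{lag}"] = df["fuel"].shift(lag)
    df[f"electric_lag{lag}"] = df["electric"].shift(lag)

# The first rows have no history, so drop them before fitting a model.
df = df.dropna()
print(df)
```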
What is the easiest and most easily explained feature-importance calculation for linear regression? I know I can use SHAP to compute feature importance, but I find it difficult to explain to stakeholders, and the raw coefficient is not a good measure of feature importance since it depends on the scale of the feature. Some suggest (standard deviation of feature) × (feature coefficient) as a good measure of feature importance.
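A minimal sketch of that "coefficient × standard deviation" idea with scikit-learn, on synthetic data (purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * [1.0, 10.0, 0.1]   # features on very different scales
y = X @ np.array([2.0, 0.3, 20.0]) + rng.normal(size=200)

model = LinearRegression().fit(X, y)

# Raw coefficients depend on the feature scale ...
print("coefficients:", model.coef_)

# ... while coef * std(feature) measures the change in y for a "typical"
# (one standard deviation) change in the feature, which is easier to compare.
importance = model.coef_ * X.std(axis=0)
print("coef * std  :", importance)
```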
(How) can absolute or relative contributions be calculated for a multiplicative (log-log) model? Relative contributions from a linear (additive) model E.g., there are 3 contributors to $y$ (given by the three additive terms): $$y = \beta_1 x_{1} + \beta_2 x_{2} + \alpha$$ In this case, I would interpret the absolute contribution of $x_1$ to $y$ to be $\beta_1 x_{1}$, and the relative contribution of $x_1$ to $y$ to be: $$\frac{\beta_1 x_{1}}{y}$$ (assuming everything is positive) Relative contributions from a log-log …
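As a made-up numerical illustration of the additive case (numbers chosen only for the arithmetic): with $\beta_1 = 2$, $x_1 = 3$, $\beta_2 = 1$, $x_2 = 4$, $\alpha = 0$,

$$y = 2\cdot 3 + 1\cdot 4 = 10, \qquad \frac{\beta_1 x_1}{y} = \frac{6}{10} = 0.6, \qquad \frac{\beta_2 x_2}{y} = \frac{4}{10} = 0.4,$$

so the relative contributions sum to 1 when the intercept is zero and everything is positive.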
What are the SHAP values for a linear model? It is given as below in the documentation: Assuming features are independent leads to interventional SHAP values which for a linear model are coef[i] * (x[i] - X.mean(0)[i]) for the ith feature. Can someone explain to me how this is derived, or direct me to a resource explaining it?
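A small numeric check of that formula, assuming a plain scikit-learn linear regression (the shap library itself is not needed to reproduce the values): the per-feature contributions coef[i] * (x[i] - X.mean(0)[i]) sum to the difference between the prediction for x and the average prediction over X.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
x = X[0]

# Interventional SHAP values for a linear model (per the shap documentation).
phi = model.coef_ * (x - X.mean(axis=0))

# Local accuracy: contributions sum to prediction minus the mean prediction.
print(phi.sum())
print(model.predict(x.reshape(1, -1))[0] - model.predict(X).mean())
```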
I have 2 variables that represent 2 different tests. I would like to multiply by 0.2 with conditions. If test1 is available and test2 is not (the dataset would show NA), use test1. Same condition for test2: if test2 is available and test1 is not, use test2. If both test1 and test2 are available, then use the minimum of the two values. Below is my formula, is there a more accurate one? $$ \text{total score}_i = 0.2 \times \begin{cases} \text{test1} & \text{test1} > 0 \\ \text{test2} & \text{test2} > 0 \\ \min(\text{test1}, \text{test2}) & \text{test1 and test2} > 0 \end{cases} $$
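A minimal pandas/numpy sketch of that rule (hypothetical column names, using the "both available → minimum" logic described above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "test1": [80, np.nan, 60, np.nan],
    "test2": [70, 90, np.nan, np.nan],
})

# If both tests are present take the minimum, otherwise take whichever one
# exists (row-wise min with skipna=True does exactly this); rows with
# neither test stay NaN.
chosen = df[["test1", "test2"]].min(axis=1, skipna=True)

df["total_score"] = 0.2 * chosen
print(df)
```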
Is it possible to understand why a Lasso model eliminated specific coefficients? During modelling, many of the highly correlated features in the data are being eliminated by the Lasso regression. Is it possible to say precisely why these features are being eliminated from the model (is it the presence of other features, multicollinearity, etc.)? I want to explain the Lasso model's behaviour. Your help is highly appreciated.
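One way to see how the elimination happens, sketched with scikit-learn's lasso_path on made-up correlated features: as the penalty grows, one of two nearly identical columns typically keeps a nonzero coefficient while the other is driven to exactly zero.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # almost a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

# Coefficients along a grid of regularization strengths.
alphas, coefs, _ = lasso_path(X, y, n_alphas=20)

for alpha, coef in zip(alphas, coefs.T):
    print(f"alpha={alpha:.3f}  coefs={np.round(coef, 2)}")
```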
I have a model to predict energy consumption in a food processing plant. Different food products are produced in the plant. My model is given as Energy consumption (kWh) = alpha0 + alpha1(Food Item A produced in kg) + alpha2(Food Item B produced in kg) + alpha3(Food Item C produced in kg) + ... + other variables. Since different product categories have different energy intensities, I would like to add that detail to the model. Can I derive the feature importance of the total production on the …
I want to create a model for a food processing plant where my dependent variable is electricity consumption (kWh) per kg. The plant produces different food items with varying electricity consumption. I'm interested in knowing the impact of the proportion of each food item on consumption per kg, so my model is: consumption per kg produced (kWh/kg) = alpha0 + alpha1(Food Item A / Total Production) + alpha2(Food Item B / Total Production) + ... + other variables. Is it correct to frame the question like this? I have Total …
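A minimal sketch of constructing those proportion features, assuming hypothetical columns for per-item production and total energy (two items only, so the numbers are purely illustrative):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "item_A_kg": [100, 200, 150],
    "item_B_kg": [300, 100, 250],
    "energy_kwh": [500, 400, 550],
})

# Dependent variable: energy per kg produced.
df["total_kg"] = df["item_A_kg"] + df["item_B_kg"]
df["kwh_per_kg"] = df["energy_kwh"] / df["total_kg"]

# Regressors: proportion of each item in total production.
df["share_A"] = df["item_A_kg"] / df["total_kg"]
df["share_B"] = df["item_B_kg"] / df["total_kg"]

# The shares sum to 1, so one of them must be dropped to avoid
# perfect collinearity with the intercept.
model = LinearRegression().fit(df[["share_A"]], df["kwh_per_kg"])
print(model.intercept_, model.coef_)
```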
This is the first time I have posted here. I am looking for some feedback or perspective on this question. To keep it simple, let's just talk about linear models. We know that the maximum-likelihood solution with an $\ell_1$ penalty on the parameters is the same as the Bayesian MAP estimate with a Laplace prior on each parameter. I'll show it here for convenience. For a vector $Y$ with $n$ observations, matrix $X$, parameters $\beta$, and noise $\epsilon$, $$Y = X\beta + \epsilon,$$ the …
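For reference, a sketch of the standard argument, under the usual assumptions of Gaussian noise $\epsilon \sim N(0, \sigma^2 I)$ and independent Laplace priors $p(\beta_j) \propto e^{-|\beta_j|/b}$:

$$\hat\beta_{\text{MAP}} = \arg\max_\beta \,\big[\log p(Y \mid X, \beta) + \log p(\beta)\big] = \arg\min_\beta \,\frac{1}{2\sigma^2}\|Y - X\beta\|_2^2 + \frac{1}{b}\sum_j |\beta_j|,$$

which, after rescaling, is the $\ell_1$-penalized least-squares objective with $\lambda = \sigma^2 / b$.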
I am looking for ideas not only to solve the least-squares problem, but also to force the errors to be roughly similar in size. One idea I had is to add the variance of the errors to the classical ordinary least squares objective. My criterion with respect to the matrix $A$, with $x$ and $y$ being vectors, would be as follows: $$ J(A) = \mu_e + \lambda\sigma_e $$ where $$ \mu_e = \|Ax - y\|^2 = \sum_i e_i = \sum_i \|Ax_i - y_i\|^2 $$ and $$ \sigma_e = \sum_i (e_i - \mu_e)^2. $$ A …
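A rough numerical sketch of minimizing such a criterion, under one reading of the notation above ($e_i$ as the squared residual of observation $i$, $\mu_e$ as their mean, $\sigma_e$ as their variance), using scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

lam = 5.0  # weight on the spread-of-errors term

def criterion(w):
    e = (X @ w - y) ** 2             # squared residual of each observation
    return e.mean() + lam * e.var()  # mean error plus penalty on its variance

w0 = np.zeros(X.shape[1])
res = minimize(criterion, w0)

print("plain least squares:", np.linalg.lstsq(X, y, rcond=None)[0])
print("variance-penalized :", res.x)
```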
I have a dataset with multiple classes (< 20) which I want to classify with reference to one of the classes. The final goal is to extract the variables of importance that are useful to distinguish each of the classes from the reference. If it helps to frame the question, an example would be to classify different cancer types against a single healthy tissue and determine which features are important for the classification of each tumour. My first naive approach would …
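One simple baseline matching that goal, sketched with scikit-learn on synthetic data: fit a separate penalized logistic regression of each class versus the reference class and inspect its coefficients as rough importances.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per_class, n_features = 50, 10
classes = ["healthy", "tumour_A", "tumour_B"]

# Synthetic data: each class gets its own overall mean shift (purely illustrative).
X = np.vstack([rng.normal(loc=i, size=(n_per_class, n_features)) for i in range(3)])
labels = np.repeat(classes, n_per_class)

reference = "healthy"
for cls in classes:
    if cls == reference:
        continue
    # Keep only the reference class and this class, then fit a binary model.
    mask = np.isin(labels, [reference, cls])
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    model.fit(X[mask], labels[mask] == cls)
    print(cls, "top features:", np.argsort(-np.abs(model.coef_[0]))[:3])
```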
Is there any tool to visualize the feasible region of a given set of linear constraints (equalities and inequalities)? If not, can anyone suggest a way to visualize it? If I am going to do it myself in Python, which libraries should I use? I have found sympy, but I couldn't get it to draw inequalities or to draw only the intersections. I have also found Wolfram, but I could only see pre-built visualizations and not visualize my own system. Can …
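In case it helps, a minimal 2D sketch using only numpy and matplotlib: evaluate the inequalities on a grid and shade the points where all of them hold (the constraints here are made up).

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid over the plotting window.
x, y = np.meshgrid(np.linspace(-1, 6, 400), np.linspace(-1, 6, 400))

# Example system: x >= 0, y >= 0, x + y <= 5, 2x + y <= 8.
feasible = (x >= 0) & (y >= 0) & (x + y <= 5) & (2 * x + y <= 8)

# Shade the feasible region and draw the boundary lines of the constraints.
plt.imshow(feasible, extent=(-1, 6, -1, 6), origin="lower",
           cmap="Greys", alpha=0.3, aspect="auto")
plt.plot([0, 5], [5, 0], label="x + y = 5")
plt.plot([0, 4], [8, 0], label="2x + y = 8")
plt.xlim(-1, 6); plt.ylim(-1, 6)
plt.legend()
plt.show()
```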
Does the applicability of R-squared to non-linear models depend on how we calculate it? $R^2 = \frac{SS_{exp}}{SS_{tot}}$ is going to be an inadequate measure for non-linear models, since an increase in $SS_{exp}$ doesn't necessarily mean that the unexplained variance is decreasing, but if we calculate it as $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, then it is as meaningful for non-linear models as it is for linear ones. I asked a similar question here where I showed that R-squared is no worse for …
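A quick numeric illustration of the two formulas on a non-linear fit (an exponential curve fitted with scipy's curve_fit; everything synthetic): for such a model the two versions need not agree, because $SS_{exp} + SS_{res} \ne SS_{tot}$ in general.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 50)
y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=0.5, size=x.size)

def f(x, a, b):
    return a * np.exp(b * x)

params, _ = curve_fit(f, x, y, p0=(1.0, 1.0))
y_hat = f(x, *params)

ss_tot = np.sum((y - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
ss_exp = np.sum((y_hat - y.mean()) ** 2)

print("SS_exp / SS_tot     :", ss_exp / ss_tot)
print("1 - SS_res / SS_tot :", 1 - ss_res / ss_tot)
```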
For example, we have training data $X$, labels $y$, and weights $w$. Our margin is $M_i = y_i \langle w, x_i \rangle$. If $M_i > 0$ the classifier's prediction is correct, and otherwise, if $M_i < 0$, the prediction is wrong. How does it work? If $y_i$ and $\langle w, x_i \rangle$ have the same sign, then their product is always positive, because plus * plus = plus and minus * minus = plus. Otherwise …
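A tiny numpy illustration of that sign argument (labels in {-1, +1}, arbitrary weights chosen for the example):

```python
import numpy as np

w = np.array([2.0, -1.0])
X = np.array([[1.0, 0.5], [0.2, 1.0], [-1.0, 0.3]])
y = np.array([1, -1, 1])            # true labels in {-1, +1}

scores = X @ w                      # <w, x_i>; its sign is the predicted label
margins = y * scores                # positive exactly when the signs agree

print("scores :", scores)
print("margins:", margins)
print("correct:", margins > 0)      # last example is misclassified
```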
I have developed several SVR models for my case study using the linear kernel, and those models were optimized using the RMSE as the criterion. Now I'm searching for additional evaluation metrics, and it turns out most publications use R-squared to compare model performance during the training and validation phases. It is generally suggested to avoid using R-squared to assess a model that uses a non-linear kernel such as polynomial or radial basis function. And this refers to the fact that …
This question is a continuation of a similar question for linear models instead of tree-based models. Given that linear models (e.g. lasso, ridge, linear regression, elastic net, etc.) can't handle missing (NaN) values and are sensitive to feature scale, what are appropriate approaches to encode or impute missing-not-at-random values in independent features? For example, suppose I have the following two independent features in my model: CAR_OWNER: binary feature (TRUE/FALSE or 0/1) without missing values; CAR_COLOR: BLUE, GREEN, …
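One standard approach, sketched with scikit-learn using the column names from the example above: fill the missing colours with an explicit "MISSING" category before one-hot encoding, so the missingness itself becomes a column a linear model can use.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "CAR_OWNER": [1, 0, 1, 1],
    "CAR_COLOR": ["BLUE", "GREEN", np.nan, "BLUE"],
    "TARGET": [1, 0, 0, 1],          # hypothetical dependent variable
})

# Replace missing colours with an explicit category, then one-hot encode;
# the resulting "MISSING" indicator column carries the missingness signal.
color_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="MISSING")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

pre = ColumnTransformer([
    ("color", color_pipe, ["CAR_COLOR"]),
    ("owner", "passthrough", ["CAR_OWNER"]),
])

model = Pipeline([("pre", pre), ("clf", LogisticRegression())])
model.fit(df[["CAR_OWNER", "CAR_COLOR"]], df["TARGET"])
print(model.named_steps["pre"].get_feature_names_out())
```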
I'm working on an unassessed homework problem from unpublished course notes of a statistics module from a second year university mathematics course. I'm trying to plot a 2-parameter full linear model and a 1-parameter reduced linear model for the same data set. I can't figure out how to plot the 1-parameter model; all attempts so far have either given errors or a non-constant slope. xs <- c(0,1,2) ys <- c(0,1,1) Data <- data.frame(x = xs, y = ys) mf <- …
In an interview, I was asked for the intuition behind the weight vector. I answered that the weight vector is the vector that we try to minimize towards a local minimum with the help of a regularizer so that we don't overfit, and that the weights tell us the influence of each feature on the model. Although I am not sure if my intuition is correct. Is the weight vector $W$ always normal to the plane? Say we have 5 features and after training, say, logistic regression, we …
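On the "normal to the plane" part, a one-line argument (standard, not specific to logistic regression): for any two points $x_1, x_2$ on the decision boundary $\{x : \langle w, x\rangle + b = 0\}$,

$$\langle w, x_1\rangle + b = \langle w, x_2\rangle + b = 0 \;\Rightarrow\; \langle w, x_1 - x_2\rangle = 0,$$

so $w$ is orthogonal to every direction lying inside the boundary, i.e. normal to the separating hyperplane.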
I have a small dataset (1,500 rows), and to predict the imbalanced target I am running two linear models (linear regression and lasso) and one nonlinear model (a neural network) on it. I am using the Area Under the Precision-Recall Curve (AUPRC) to compare the three models. The baseline in the curve is 10%, the AUPRC for linear regression is 11%, for lasso 11.2%, and for the NN 11.35%. Can I say that the learning models have improved the random …
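For context, a short sketch of how those numbers are typically computed with scikit-learn (average_precision_score as the AUPRC and the positive-class prevalence as the baseline); the data here is random, so the printed values are meaningless:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true = (rng.random(1500) < 0.10).astype(int)   # ~10% positives, like the baseline
y_score = rng.random(1500)                        # stand-in for model scores

print("baseline (prevalence):", y_true.mean())
print("AUPRC                :", average_precision_score(y_true, y_score))
```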