Temperature lag forecasting

I am working on a data science project involving an industrial machine. The machine has two heating systems (fuel and electric). It uses both at the same time, and I am trying to estimate the temperature that results at the thermocouple. However, this heating takes effect with some delay/lag: a one-unit change in the fuel or electric heating is reflected at the thermocouple hours later. …
Category: Data Science
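A common starting point for this kind of delayed response is to give the model lagged copies of the inputs. A minimal sketch, assuming hourly pandas data (the column names and lag values are made up):

```python
import pandas as pd

# Hypothetical hourly data: heating inputs and thermocouple reading.
df = pd.DataFrame({
    "fuel": [1.0, 1.2, 1.1, 1.3, 1.4, 1.5],
    "electric": [0.5, 0.6, 0.55, 0.7, 0.65, 0.8],
    "temp": [100, 101, 103, 104, 106, 108],
})

# Create lagged copies of the inputs so a regression can learn the delay.
for lag in (1, 2):
    df[f"fuel_lag{lag}"] = df["fuel"].shift(lag)
    df[f"electric_lag{lag}"] = df["electric"].shift(lag)

df = df.dropna()  # drop rows without a full lag history
print(df.columns.tolist())
```

In practice the lag orders would be chosen from cross-correlation between inputs and the thermocouple reading, or by cross-validation.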

Feature importance of a linear regression

What is the easiest-to-explain feature importance calculation for linear regression? I know I can use SHAP to compute feature importance, but I find it difficult to explain to stakeholders, and the raw coefficient is not a good measure of importance since it depends on the scale of the feature. Some have suggested (standard deviation of the feature) × (feature coefficient) as a good measure of feature importance.
Category: Data Science
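The suggested measure is easy to compute. A small sketch with synthetic data (coefficients and scales are made up) showing how coefficient × standard deviation corrects for feature scale:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 0.1])  # very different scales
y = X @ np.array([2.0, 0.3, 50.0]) + rng.normal(size=200)

model = LinearRegression().fit(X, y)
# Scale-free importance: |coef| * std of the feature, i.e. the change in y
# for a one-standard-deviation move in that feature.
importance = np.abs(model.coef_) * X.std(axis=0)
print(importance.round(2))
```

This is easy to present to stakeholders as "how much y moves when the feature moves by a typical amount".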

Can absolute or relative contributions from X be calculated for a multiplicative model? $\log y \sim \log x_1 + \log x_2$

(How) can absolute or relative contributions be calculated for a multiplicative (log-log) model? Relative contributions from a linear (additive) model: e.g., there are 3 contributors to $y$ (given by the three additive terms): $$y = \beta_1 x_{1} + \beta_2 x_{2} + \alpha$$ In this case, I would interpret the absolute contribution of $x_1$ to $y$ to be $\beta_1 x_{1}$, and the relative contribution of $x_1$ to $y$ to be $$\frac{\beta_1 x_{1}}{y}$$ (assuming everything is positive). Relative contributions from a log-log …
Category: Data Science
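For reference, exponentiating the log-log model shows why contributions become multiplicative rather than additive (a sketch of the algebra, not a full answer):

```latex
\log y = \alpha + \beta_1 \log x_1 + \beta_2 \log x_2
\quad\Longleftrightarrow\quad
y = e^{\alpha}\, x_1^{\beta_1}\, x_2^{\beta_2}
```

So the natural analogue of a contribution of $x_1$ is the multiplicative factor $x_1^{\beta_1}$, or, on the log scale, the additive share $\beta_1 \log x_1 / \log y$.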

What are the SHAP values for a linear model? How are they derived?

What are the SHAP values for a linear model? The documentation states: "Assuming features are independent leads to interventional SHAP values which for a linear model are coef[i] * (x[i] - X.mean(0)[i]) for the ith feature." Can someone explain to me how this is derived, or direct me to a resource explaining it?
Category: Data Science
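The quoted formula can be checked numerically. A self-contained sketch (synthetic data; variable names are made up) that computes coef[i] * (x[i] - X.mean(0)[i]) directly and verifies the SHAP additivity property:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)
model = LinearRegression().fit(X, y)

# Interventional SHAP values for a linear model, computed directly:
# phi[i] = coef[i] * (x[i] - E[x[i]]).
phi = model.coef_ * (X - X.mean(axis=0))

# Sanity check: SHAP values sum to prediction minus the mean prediction.
pred = model.predict(X)
assert np.allclose(phi.sum(axis=1), pred - pred.mean())
print(phi[0].round(3))
```

Intuitively, the derivation comes from averaging marginal contributions over coalitions: with independent features and a linear model, every coalition gives feature $i$ the same marginal contribution $\beta_i(x_i - \mathbb{E}[x_i])$.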

How to write this TeX equation appropriately for publication?

I have 2 variables that represent 2 different tests. I would like to multiply by 0.2, with conditions: if test1 is available and test2 is not (the dataset would show NA), use test1; likewise, if test2 is available and test1 is not, use test2. If both test1 and test2 are available, use the minimum of the two values. Below is my formula; is there a more accurate one? $$ \text{total score}_i = 0.2 \times \begin{cases} \text{test1} & \text{test1} > 0 \\ \text{test2} & \text{test2} > 0 \\ \min(\text{test1}, \text{test2}) & \text{test1 and test2} …
Category: Data Science
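One way the intended piecewise definition could be typeset (the exact condition wording is an assumption about the question's intent):

```latex
\text{total score}_i = 0.2 \times
\begin{cases}
  \text{test1}_i & \text{if } \text{test2}_i \text{ is NA} \\
  \text{test2}_i & \text{if } \text{test1}_i \text{ is NA} \\
  \min(\text{test1}_i, \text{test2}_i) & \text{if both are available}
\end{cases}
```

Stating the conditions in terms of availability (NA) rather than positivity makes the formula match the verbal description.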

Is it possible to explain why Lasso models eliminated certain coefficients?

Is it possible to understand why Lasso models eliminated specific coefficients? During modelling, many of the highly correlated features in the data are eliminated by Lasso regression. Is it possible to see precisely why these features are eliminated from the model (the presence of other features, multicollinearity, etc.)? I want to explain the Lasso model's behaviour. Your help is highly appreciated.
Category: Data Science
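The behaviour with correlated features is easy to reproduce. A sketch with two nearly identical synthetic features (all data made up): the L1 penalty has no reason to split weight between near-duplicates, so coordinate descent typically keeps one and zeroes the other.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + 0.01 * rng.normal(size=300)  # nearly a duplicate of x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=300)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_.round(3))  # one of the correlated twins is driven to (near) zero
```

Which twin survives can depend on tiny differences in correlation with the target, which is why eliminated features should not be read as "unimportant", only as redundant given the kept ones.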

Is it possible to find the feature importance of an aggregate feature from the corresponding independent features in a linear model?

I have a model to predict energy consumption in a food processing plant. Different food products are produced in the plant. My model is given as: Energy consumption (kWh) = alpha0 + alpha1·(Food Item A produced in kg) + alpha2·(Food Item B produced in kg) + alpha3·(Food Item C produced in kg) + ... + other variables. Since different product categories have different energy intensities, I would like to add that detail to the model. Can I derive the feature importance of the Total production on the …
Category: Data Science
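One way to approximate each item's share of predicted energy use from a fitted linear model is to look at the per-item contribution terms. A sketch (the coefficients and production figures below are entirely hypothetical):

```python
import numpy as np

# Hypothetical fitted coefficients (kWh per kg) and production data (kg).
alpha = np.array([0.8, 1.5, 0.4])           # items A, B, C
prod = np.array([[100.0, 50.0, 200.0],
                 [120.0, 60.0, 180.0],
                 [ 90.0, 70.0, 210.0]])

contrib = prod * alpha                       # per-item contribution to energy
share = contrib.mean(axis=0) / contrib.mean(axis=0).sum()
print(share.round(2))                        # relative share of each item
```

Summing the per-item contributions recovers the production-driven part of the prediction, which is one candidate for the "importance" of total production.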

Can I include a quotient as dependent variable and independent variables with same denominator in a linear model? How do we interpret such models?

I want to create a model for a food processing plant where my dependent variable is electricity consumption per kg (kWh/kg). The plant produces different food items with varying electricity consumption. I'm interested in knowing the impact of the proportion of each food item on consumption per kg, so my model is: consumption per kg produced (kWh/kg) = alpha0 + alpha1·(Food Item A/Total Production) + alpha2·(Food Item B/Total Production) + ... + other variables. Is it correct to frame the question like this? I have Total …
Category: Data Science
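One caveat with such a model is that the proportions sum to one, so including all of them alongside an intercept makes the design matrix perfectly collinear. A sketch with simulated data (all numbers made up) showing the usual fix of dropping one proportion as a reference:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
prod = pd.DataFrame(rng.uniform(10, 100, size=(50, 3)), columns=["A", "B", "C"])

# Convert raw production (kg) into proportions of total production.
X = prod.div(prod.sum(axis=1), axis=0)
# Proportions sum to 1, so drop one item to avoid perfect collinearity
# with the intercept; the dropped item becomes the reference level.
X = X[["A", "B"]]

# Simulated target: consumption per kg depends on the product mix.
y = 0.5 + 0.3 * X["A"] + 0.1 * X["B"] + rng.normal(scale=0.01, size=50)

model = LinearRegression().fit(X, y)
print(model.coef_.round(2))
```

Each coefficient is then interpreted relative to the dropped item: the change in kWh/kg when that item's share grows at the expense of the reference item.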

Lasso (or Ridge) vs Bayesian MAP

This is the first time I have posted here. I am looking for some feedback or perspective on this question. To keep it simple, let's just talk about linear models. We know the solution of the $\ell_1$-penalized least-squares objective is the same as the Bayesian MAP estimate with a Laplace prior on each parameter. I'll show it here for convenience. For a vector $Y$ with $n$ observations, matrix $X$, parameters $\beta$, and noise $\epsilon$, $$Y = X\beta + \epsilon,$$ the …
Category: Data Science
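The correspondence referred to above follows in a few lines. With Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ and independent Laplace priors $p(\beta_j) \propto \exp(-|\beta_j|/b)$:

```latex
\hat{\beta}_{\text{MAP}}
  = \arg\max_{\beta}\ \bigl[\log p(Y \mid X, \beta) + \log p(\beta)\bigr]
  = \arg\min_{\beta}\ \frac{1}{2\sigma^2}\|Y - X\beta\|_2^2 + \frac{1}{b}\|\beta\|_1
```

which is the lasso objective with penalty weight $\lambda = 2\sigma^2/b$ (up to an overall scaling). A Gaussian prior in place of the Laplace prior gives ridge regression by the same argument.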

Ideas to enforce uniformity of error in linear models

I am looking for ideas to not only solve the least-squares problem, but also to force the errors to be roughly similar in size. One idea I had is to add the variance of the errors to the classical ordinary least squares objective. My criterion, with respect to matrix $A$ ($x$ and $y$ being vectors), would be: $$ J(A) = \mu_e + \lambda\sigma_e $$ where $$ \mu_e = \|Ax-y\|^2 = \sum_i e_i, \qquad e_i = ((Ax)_i - y_i)^2, $$ and $$ \sigma_e = \sum_i (e_i - \bar{e})^2, \qquad \bar{e} = \tfrac{1}{n}\sum_i e_i. $$ A …
Category: Data Science
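One way to prototype such a criterion is to hand it to a generic optimizer. A sketch with scipy.optimize.minimize on synthetic data (here the free argument is taken to be $x$ for concreteness; the idea is the same whichever argument is free):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
y = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def objective(x, lam=1.0):
    e = (A @ x - y) ** 2             # per-row squared errors
    return e.sum() + lam * e.var()   # least squares + error-uniformity penalty

res = minimize(objective, np.zeros(3))
print(res.x.round(2))
```

Because both terms are smooth, gradient-based methods work directly; increasing lam trades average fit for more uniform residuals, moving the solution toward a Chebyshev (min-max) style fit.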

Approaches for multiclass classification with a reference level to extract variables of importance?

I have a dataset with multiple classes (< 20) which I want to classify relative to one reference class. The final goal is to extract the variables of importance that are useful to distinguish each of the classes from the reference. To frame the question with an example: classify different cancer types vs a single healthy tissue and determine which features are important for the classification of each tumour. My first naive approach would …
Category: Data Science
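A simple baseline for this setup is one binary model per class against the reference, reading variable importance off each model's coefficients. A sketch on synthetic data (class names and feature count are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
classes = rng.choice(["healthy", "tumourA", "tumourB"], size=300)

# One binary model per class vs the reference, keeping only those two groups.
for cls in ["tumourA", "tumourB"]:
    mask = np.isin(classes, [cls, "healthy"])
    clf = LogisticRegression().fit(X[mask], classes[mask] == cls)
    top = np.argsort(-np.abs(clf.coef_[0]))[:2]
    print(cls, "top features:", top)
```

On standardized features, |coefficient| gives a rough importance ranking per class-vs-reference contrast; penalized variants (L1) can additionally select features.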

How to visualize optimization problems' feasible region?

Is there any tool to visualize the feasible region of a set of linear equations (equalities and inequalities)? If not, can anyone suggest a way to visualize it? If I am going to do it myself in Python, which libraries should I use? I have found sympy, but I couldn't get it to draw inequalities or to draw only the intersections. I have also found Wolfram, but I could only see pre-built visualizations, not visualize my own system. Can …
Category: Data Science
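Absent a dedicated tool, a brute-force grid mask with NumPy and Matplotlib is often enough in 2D. A sketch with made-up example constraints:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Example (assumed) constraints: x + y <= 4, x >= 0, y >= 0, y <= 2x + 1.
xs = np.linspace(-1, 5, 400)
ys = np.linspace(-1, 5, 400)
X, Y = np.meshgrid(xs, ys)
feasible = (X + Y <= 4) & (X >= 0) & (Y >= 0) & (Y <= 2 * X + 1)

# Shade the region where all constraints hold simultaneously.
plt.imshow(feasible, origin="lower", extent=(-1, 5, -1, 5), alpha=0.4)
plt.xlabel("x")
plt.ylabel("y")
plt.savefig("feasible_region.png")
```

Evaluating every constraint on a grid and intersecting the boolean masks sidesteps the symbolic-plotting limitations entirely; it only works for 2D (or 3D with voxels), but that covers most teaching examples.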

Does the applicability of R-squared to non-linear models depend on how we calculate it?

Does the applicability of R-squared to non-linear models depend on how we calculate it? $R^2 = \frac{SS_{exp}}{SS_{tot}}$ is going to be an inadequate measure for non-linear models, since an increase of $SS_{exp}$ doesn't necessarily mean that the variance is decreasing; but if we calculate it as $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, then it's as meaningful for non-linear models as it is for linear ones. I asked a similar question here, where I showed that R-squared is no worse for …
Category: Data Science
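The key background fact is that the two definitions agree only for OLS with an intercept, where the residuals are orthogonal to the fitted values:

```latex
SS_{tot} = SS_{exp} + SS_{res}
\quad\Longrightarrow\quad
\frac{SS_{exp}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}
```

For non-linear (or intercept-free) fits the orthogonality condition $\sum_i (\hat y_i - \bar y)(y_i - \hat y_i) = 0$ generally fails, the decomposition breaks, and $1 - SS_{res}/SS_{tot}$ can even be negative, while $SS_{exp}/SS_{tot}$ can exceed 1.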

Understanding the math behind linear classification

For example, we have train data $X$, labels $y$, and weights $w$. Our margin is $M_i = y_i \langle w, x_i \rangle$. If $M_i > 0$ the classifier returns a correct prediction, and otherwise, if $M_i < 0$, we get an incorrect prediction. How does it work? If the label $y_i$ and the score $\langle w, x_i \rangle$ have the same sign, their product is always positive, because plus × plus = plus and minus × minus = plus. Otherwise …
Category: Data Science
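The sign argument above is easy to verify numerically. A tiny sketch (weights, points, and labels are made up):

```python
import numpy as np

w = np.array([1.0, -1.0])
X = np.array([[2.0, 0.5],
              [0.2, 1.0]])
y = np.array([1, -1])       # true labels in {-1, +1}

margins = y * (X @ w)       # M_i = y_i * <w, x_i>
correct = margins > 0       # positive margin = label and score share a sign
print(margins, correct)
```

Here both scores (1.5 and -0.8) share their label's sign, so both margins come out positive.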

Does linear kernel make SVM a linear model?

I have developed several SVR models for my case study using the linear kernel, and those models were optimized using RMSE as the criterion. Now I'm searching for additional evaluation metrics, and it turns out most publications use R-squared to compare model performance during the training and validation phases. It's generally suggested to avoid using R-squared to assess a model with a non-linear kernel, such as polynomial or radial basis function. And this refers to the fact that …
Category: Data Science
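Computing both metrics for a linear-kernel SVR is straightforward with scikit-learn. A sketch on synthetic data (coefficients and sizes are made up):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=200)

svr = SVR(kernel="linear").fit(X, y)
pred = svr.predict(X)

# Report both the optimization criterion (RMSE) and R-squared.
rmse = mean_squared_error(y, pred) ** 0.5
print(round(rmse, 3), round(r2_score(y, pred), 3))
```

Note that SVR minimizes epsilon-insensitive loss rather than squared error, so even with a linear kernel its R-squared lacks the "explained variance" decomposition that OLS guarantees; it is still usable as a goodness-of-fit summary.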

Linear models: Imputing missing not at random

This question is a continuation of a similar question for linear models instead of tree-based models. Given that linear models (e.g. lasso, ridge, linear regression, elastic net, etc.) can't handle missing (NaN) values and are sensitive to feature scale, what are appropriate approaches to encode or impute missing-not-at-random values in independent features? For example, if I have the following two independent features in my model: CAR_OWNER: binary feature (TRUE/FALSE or 0/1) without missing values; CAR_COLOR: BLUE, GREEN, …
Category: Data Science
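When missingness is informative (e.g. non-owners have no car colour), a common approach is to encode the missingness itself as a category rather than impute a "real" value. A sketch with the two features from the question (the rows are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "CAR_OWNER": [1, 1, 0, 1, 0],
    "CAR_COLOR": ["BLUE", "GREEN", None, None, None],
})

# Missingness is informative (non-owners have no colour), so encode it as
# its own category instead of imputing a plausible colour.
df["CAR_COLOR"] = df["CAR_COLOR"].fillna("MISSING")
dummies = pd.get_dummies(df["CAR_COLOR"], prefix="COLOR")
out = pd.concat([df[["CAR_OWNER"]], dummies], axis=1)
print(out.columns.tolist())
```

The resulting 0/1 indicator columns are already on a common scale, which sidesteps the scaling sensitivity of the linear model for these features.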

How to make a linear model with a constant value in R?

I'm working on an unassessed homework problem from unpublished course notes of a statistics module in a second-year university mathematics course. I'm trying to plot a 2-parameter full linear model and a 1-parameter reduced linear model for the same data set. I can't figure out how to plot the 1-parameter (constant) model; all attempts so far have either given errors or a non-constant slope. xs <- c(0,1,2) ys <- c(0,1,1) Data <- data.frame(x = xs, y = ys) mf <- …
Category: Data Science

What is the intuition behind the weight vector W being normal to the plane? Is the weight vector W the same as the W that is normal to the plane π?

In an interview, I was asked for the intuition behind the weight vector. I said the weight vector is a vector we try to drive towards a local minimum with the help of a regulariser so we don't overfit, and that the weights tell us the influence of each feature on the model. Although I am not sure if my intuition is correct. Is the weight vector W always normal to the plane? Say we have 5 features, and after training, say, logistic reg we …
Category: Data Science
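The normality claim can be checked directly: any vector lying within the decision plane $\langle w, x\rangle + b = 0$ is orthogonal to $w$. A tiny sketch (the plane and points are made up):

```python
import numpy as np

w = np.array([2.0, 1.0])
b = -4.0

# Two points chosen to lie on the decision plane <w, x> + b = 0.
x1 = np.array([2.0, 0.0])   # 2*2 + 1*0 - 4 = 0
x2 = np.array([0.0, 4.0])   # 2*0 + 1*4 - 4 = 0

# The difference vector lies within the plane, and is orthogonal to w.
print(np.dot(w, x2 - x1))
```

Since $\langle w, x_1\rangle = \langle w, x_2\rangle = -b$, subtracting gives $\langle w, x_2 - x_1\rangle = 0$ for any two points on the plane, which is exactly what "w is normal to the plane" means.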

How can I compare a NN model and a linear regression?

I have a small dataset (1500 rows) and, to predict the imbalanced target, I am running two linear models (linear regression and lasso) and one nonlinear model (a neural network) on it. I am using the Area Under the Precision-Recall Curve (AUPRC) to compare the three models. The baseline in the curve is 10%; the AUPRC for linear regression is 11%, for lasso 11.2%, and for the NN 11.35%. Can I say that the learning models have improved on the random …
Category: Data Science
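For context, the 10% baseline is simply the positive-class prevalence, and AUPRC is computed against it. A sketch with simulated scores (the prevalence and score model are made up):

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y = (rng.random(1500) < 0.1).astype(int)   # ~10% positives
scores = 0.3 * y + rng.random(1500)        # weakly informative scores

auprc = average_precision_score(y, scores)  # AUPRC estimate
baseline = y.mean()                         # a random ranker's AUPRC
print(round(baseline, 3), round(auprc, 3))
```

Whether an 11% vs 10% gap is meaningful on 1500 rows is a separate question; a bootstrap over rows would give a confidence interval on each model's AUPRC before declaring an improvement.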

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.