I have a dataset that has the following columns: the variable I'm trying to predict is "rent". My dataset looks a lot like the one used in this notebook. I tried to normalize the rent column and the area column using a log transformation, since both columns had positive skewness. Here are the rent and area column distributions before and after the log transformation. I thought that after these changes my regression models would improve, and in fact they …
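For reference, a minimal sketch of the kind of log transformation described here, assuming a pandas DataFrame named df with rent and area columns (the data below is made up):

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the skewed rent/area columns in the question.
df = pd.DataFrame({"rent": [500, 750, 1200, 9000], "area": [30, 45, 80, 400]})

# log1p (log(1 + x)) behaves like a plain log for large values but also handles zeros.
df["rent_log"] = np.log1p(df["rent"])
df["area_log"] = np.log1p(df["area"])

print(df[["rent", "area"]].skew())          # skewness of the raw columns
print(df[["rent_log", "area_log"]].skew())  # should be noticeably smaller
```

One thing to keep in mind: if a model is trained on the log of rent, its predictions are on the log scale and need np.expm1 (or np.exp for a plain log) before being compared with the original rent values.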
What happens if a certain dataset contains different "groups" that follow different linear models? For example, let's imagine that, examining the scatterplot of a certain feature $x_i$ against $y$, we can see that some points follow a linear relationship with a coefficient $\beta_A<0$ while other points clearly have $\beta_B>0$. We can infer that these points belong to two different populations: population $A$ responds negatively to high values of feature $x_i$, while population $B$ responds positively. We then create a categorical …
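A minimal sketch of what a group indicator with an interaction term might look like (the column names and data below are made up for illustration; the statsmodels formula x * group expands to x + group + x:group, so each group gets its own slope):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200

# Hypothetical two-population data: group A has a negative slope, group B a positive one.
group = rng.choice(["A", "B"], size=n)
x = rng.uniform(0, 10, size=n)
slope = np.where(group == "A", -2.0, 3.0)
y = slope * x + rng.normal(scale=1.0, size=n)

df = pd.DataFrame({"x": x, "y": y, "group": group})

# The interaction term x:group lets each group have its own slope (and intercept shift).
model = smf.ols("y ~ x * group", data=df).fit()
print(model.params)
```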
In transfer learning, often only the last layer of the network is retrained using gradient descent. However, the last layer of a common neural network performs only a linear transformation, so why do we use gradient descent and not linear (or logistic) regression to finetune the last layer?
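As a point of comparison, here is a rough sketch of the alternative the question describes: freeze a pretrained backbone, extract its features, and fit the final classifier with an off-the-shelf logistic regression solver instead of gradient descent. The backbone choice, input shape, and data below are assumptions made purely for illustration.

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

# Frozen feature extractor (weights=None only to keep the sketch self-contained;
# in practice this would be a pretrained network, e.g. weights="imagenet").
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(96, 96, 3), weights=None
)
backbone.trainable = False

# Fake images and binary labels standing in for a real dataset.
x = np.random.rand(32, 96, 96, 3).astype("float32")
y = np.random.randint(0, 2, size=32)

features = backbone.predict(x, verbose=0)                  # fixed features from the frozen layers
clf = LogisticRegression(max_iter=1000).fit(features, y)   # the "last layer", fitted without SGD
print(clf.score(features, y))
```

The hypothesis class is essentially the same either way (a linear map plus sigmoid/softmax on frozen features); what differs is mainly how the weights are found and which regularisation the solver applies.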
I'm very new to data science (this is my hello world project), and I have a data set made up of a combination of review text and numerical data such as the number of tables. There is also a rating column, which is a float (the average of all user reviews for that restaurant). So a row of data could look like: { rating: 3.765, review: `Food was great, staff was friendly`, tables: 30, staff: 15, parking: 20 ... } So …
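One common scikit-learn pattern for mixing a free-text column with numeric columns is a ColumnTransformer that vectorises the text and passes the numbers through. The sketch below mirrors the column names from the example row, with made-up data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "review": ["Food was great, staff was friendly", "Slow service and cold food"],
    "tables": [30, 12],
    "staff": [15, 6],
    "parking": [20, 0],
    "rating": [3.765, 2.1],
})

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "review"),                      # text -> TF-IDF features
    ("numeric", "passthrough", ["tables", "staff", "parking"]), # numeric columns used as-is
])

model = Pipeline([("prep", preprocess), ("reg", Ridge())])
model.fit(df, df["rating"])
print(model.predict(df))
```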
Hope you are enjoying good health. I am trying to build a simple neural network that has to predict shear wave well log values from other well logs, but my model is stuck at a mean absolute error of 2.45 and it is not improving further. I have changed the number of neurons, the learning rate, and the loss function, but to no avail. Here is my model:

tf.random.set_seed(42)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(22, activation='relu'),
    tf.keras.layers.Dense(1)
])

# compiling
model.compile(
    loss=tf.losses.mae,
    optimizer=tf.optimizers.Adam(learning_rate=0.006),
    metrics=['mae']
)

# fitting
history = model.fit(x_train, y_train, epochs=1000, verbose=0)

# evaluation
model.evaluate(x_test, y_test)

Here is the boxplot of …
I have several groups of features that I'd like to test against independent variables. The idea is to find which groups tend to be associated with a specific value of an independent variable. Let's take the following example, where the s are samples, the f are features, and the i are independent variables associated with each s.

      s1    s2    s3    s4   ....
f1    0.3   0.9   0.7   0.8
f2    ...
f3    ...
f4    ...
f5    ...
i1    low   low   med   high
i2    0.9   1.6   2.3   …
Fairly simple question, but something I've been unable to understand firmly by scouring the interwebs. After running an LR model using sklearn, one of the key outputs is coef_, along with intercept_. I understand that coef_ is a transformation matrix that fully describes the relationships of the model, and that taking the dot product of the input data with coef_ and adding intercept_ will produce the predicted values for your inputs. My question is: What is the equation that defines …
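A quick way to see the relationship described here is to check the manual computation against predict() on some made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# y_hat = X · coef_ + intercept_
manual = X @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True
```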
I have computed a simple linear regression model as below, but I am confused about whether the confint() function is sufficient to provide 95% confidence intervals around the standardised regression coefficient (beta) in the linear model. Has anyone else run into this issue, or is confint() sufficient to extract the 95% confidence interval (i.e., +/- 1.96 standard errors of the standardised regression coefficient)?

h1a <- lm(formula = var1 ~ var2, data = df)  # estimate value of intercept (b0) and slope (b1) …
I have a dataset with high collinearity among variables. When I created the linear regression model, I could not include more than five variables (I eliminated a feature whenever its VIF was > 5). But I need to have all the variables in the model and find their relative importance. Is there any way around this? I was thinking about doing PCA and building models on the principal components. Does that help?
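For context, a minimal sketch of the PCA idea mentioned here (often called principal component regression); the data, column count, and number of components below are made up, and the features are standardised before PCA:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 5:] = X[:, :5] + rng.normal(scale=0.05, size=(200, 5))  # deliberately collinear columns
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

# Standardise -> project onto a few principal components -> ordinary least squares.
pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))
```

The components are linear combinations of the original variables, so any statement about the importance of the original features has to be read back through the PCA loadings; ridge regression is another common way to keep all collinear predictors in a single model.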
SUMMARY I'm building a linear regression model using Scikit and noticing that the model "performance" (namely RMSE and max error) varies depending on whether I use the default LR or whether I apply PolynomialFeatures(degree=1). My understanding is that these outcomes should be identical, since they are both fitting a first-order linear model; however, my error is consistently lower when using the PolynomialFeatures version. TLDR When I run the code below, the second chunk (polynomial = degree of 1) is consistently …
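For reference, a sketch of the comparison on synthetic data. One detail worth checking in the original code is that PolynomialFeatures(degree=1) with the default include_bias=True adds a constant column, so the design matrices are not literally identical unless include_bias=False is set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.2, size=200)

plain = LinearRegression().fit(X, y)
poly1 = make_pipeline(
    PolynomialFeatures(degree=1, include_bias=False),  # degree 1 leaves the features unchanged
    LinearRegression(),
).fit(X, y)

# With the same features and the same estimator, the errors should match (up to float noise).
print(mean_squared_error(y, plain.predict(X)))
print(mean_squared_error(y, poly1.predict(X)))
```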
Sending positive wishes to y'all. I have about 10 years of growth rates in real estate prices and some other macroeconomic variables such as inflation, unemployment rates, fuel prices, and growth in prices of raw materials, among many others. I want to analyze the causality of all of these variables on the growth in real estate prices. The simplest thing would be to build a linear regression model, but given that this is not cross-sectional data and more like time series data, …
Disclaimer: I'm relatively new to the data science and ML world -- still trying to get a firm grasp on the fundamentals. I'm trying to overcome a regression challenge involving a large, multi-dimensional dataset, but am hitting a roadblock when it comes to my input data. This dataset consists of a few key input criteria: [FLOW, TEMP, PRESSURE, VOLTAGE_A] and a single output variable, VOLTAGE_B (this is what I'm hoping to effectively model and predict). I'm able to handle this …
I wonder how I can use machine learning to plot a multiple linear regression in a figure. I have one dependent variable (the price of the apartment) and five independent variables (floor, builtyear, roomnumber, square meter, kr/sqm). The task is to first use machine learning to obtain the predicted values and the actual values, and then plot those values in a figure. I have used this code:

x_train, x_test, y_train, y_test = tts(xx1, y, test_size=3)

Outcome:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

regr.fit(x_train, y_train)
…
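A common way to visualise a fit with several predictors is a predicted-vs-actual scatter plot. The sketch below uses synthetic data, with y_test and regr.predict(x_test) playing the roles of the question's actual and predicted prices:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # stand-ins for floor, builtyear, roomnumber, square meter, kr/sqm
y = X @ np.array([3.0, 1.0, 2.0, 5.0, 0.5]) + rng.normal(scale=0.5, size=100)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
regr = LinearRegression().fit(x_train, y_train)
y_pred = regr.predict(x_test)

plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], linestyle="--")  # perfect-prediction line
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.show()
```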
I have 2 time series, $X$ and $Y$, and I'm trying to find the best lag range that correlates $X$ to $Y$ (find the amount(s) of lag of $X$ that best correlate to the target variable $Y$). For instance, if the best lag range is between $t = 8$ and $t = 10$, then the final equation would be $Y_t = \alpha_1 X_{t-8} + \alpha_2 X_{t-9} + \alpha_3 X_{t-10} + \alpha_4$. Since the value of $Y$ could depend not only …
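One simple, brute-force way to look for the best lag is to correlate $Y_t$ with $X_{t-k}$ for a range of candidate $k$ and see where the correlation peaks. The sketch below uses synthetic series in which $X$ leads $Y$ by 9 steps, so lags around 8-10 should stand out; the names and the lag range are illustrative assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
x = pd.Series(rng.normal(size=n))
y = 2.0 * x.shift(9) + rng.normal(scale=0.5, size=n)   # Y depends on X lagged by 9 steps

# Pearson correlation between Y_t and X_{t-k} for each candidate lag k.
corrs = {k: y.corr(x.shift(k)) for k in range(0, 21)}
best = max(corrs, key=lambda k: abs(corrs[k]))
print(best, corrs[best])
```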
I have a machine learning model that I fitted with xgboost and linear regression. My dataset has thirteen features and has price as the target. Is there any way to make the model predict values in the future? I have datetime as one of the variables. From searching on the internet, I learned about fb prophet and that this is a time series problem. But if my xgboost is doing well, is there a way to make it predict future …
I have a dataset which contains different parameters, and data.head() looks like this. I applied some preprocessing and performed feature ranking:

import pandas as pd

dataset = pd.read_csv("ML.csv", header=0)

# Get a brief look at the dataset
print(dataset.shape)
print(dataset.isnull().sum())
# print(dataset.head())

# Data pre-processing
data = dataset.drop('organization_id', axis=1)
data = data.drop('status', axis=1)
data = data.drop('city', axis=1)

# Find the median for features having NaN
median_zip, median_role_id, median_specialty_id, median_latitude, median_longitude = \
    data['zip'].median(), \
    data['role_id'].median(), \
    data['specialty_id'].median(), \
    data['latitude'].median(), \
    data['longitude'].median()

data['zip'].fillna(median_zip, inplace=True)
data['role_id'].fillna(median_role_id, inplace=True)
data['specialty_id'].fillna(median_specialty_id, inplace=True)
data['latitude'].fillna(median_latitude, inplace=True)
data['longitude'].fillna(median_longitude, inplace=True)

# Fill years_of_experience with 0
data['years_of_experience'].fillna(0, inplace=True)

target = dataset.location_id …
What is the easiest, and easiest to explain, feature importance calculation for linear regression? I know I can use SHAP to compute feature importance, but I find it difficult to explain to stakeholders, and the raw coefficient is not a good measure of feature importance since it depends on the scale of the feature. Some have suggested (standard deviation of the feature) * (feature coefficient) as a good measure of feature importance.
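A minimal sketch of that suggestion, on made-up data: multiplying each coefficient by the standard deviation of its feature puts differently scaled features on a comparable footing (it is equivalent to looking at the coefficients after standardising the features).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "small_scale": rng.normal(scale=1.0, size=300),
    "large_scale": rng.normal(scale=100.0, size=300),
})
y = 5.0 * X["small_scale"] + 0.05 * X["large_scale"] + rng.normal(size=300)

model = LinearRegression().fit(X, y)

# importance_j = |coef_j| * std(feature_j)
importance = np.abs(model.coef_) * X.std().to_numpy()
print(pd.Series(importance, index=X.columns).sort_values(ascending=False))
```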