I'm a psychology student trying to come up with a research plan involving GLM. I'm thinking about adding an interaction term in the analysis, but I'm unsure about its interpretation. To keep things simple, I'm going to use linear regression as an example. I'm expecting a (simplified) model like this: $$y = a x_{1} + b x_{2} + c (x_{1} \cdot x_{2}) + e$$ In my hypothesis, $x_{1}$ and $y$ are negatively correlated, and $x_{2}$ and $y$ are positively correlated. As for correlation between $x_{1}$ …
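A minimal sketch of how such a model could be simulated and fit in R (the effect sizes and signs below are made-up assumptions, just to see how the interaction term reads in the output):

    set.seed(1)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    # assumed signs: x1 lowers y, x2 raises y, plus an interaction c
    y  <- -0.5 * x1 + 0.8 * x2 + 0.3 * (x1 * x2) + rnorm(n)
    fit <- lm(y ~ x1 * x2)   # x1 * x2 expands to x1 + x2 + x1:x2
    summary(fit)
    # the x1:x2 row estimates c: how the slope of x1 changes per unit of x2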
I really need help with GAMs. I have to find out whether an association is linear or non-linear by using a GAM. The predictor variable is temperature at lag 0 and the outcome is cardiovascular admissions (a count variable). I have tried a lot, but I am not able to understand how to interpret the graph and output that I am getting. I tried this formula using the mgcv package: model1 <- gam(cvd ~ s(templg0), family = poisson) summary(model1) plot(model1) So here is the output for summary that …
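A common way to make the linear-vs-non-linear question concrete with mgcv is to fit both forms and compare them; a sketch, assuming the variables live in a data frame called dat (a name made up here):

    library(mgcv)
    m_lin    <- gam(cvd ~ templg0,    family = poisson, data = dat)  # linear effect
    m_smooth <- gam(cvd ~ s(templg0), family = poisson, data = dat)  # penalised smooth
    AIC(m_lin, m_smooth)   # lower AIC favours that form
    summary(m_smooth)      # an edf close to 1 means the smooth is effectively linear
    plot(m_smooth, shade = TRUE)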
I am trying to classify cars for a towing company. Junky cars earn more when sent to the junkyard, and the more valuable cars should earn more at auction, despite the auction fee. Creating a logistic regression that takes into account Make, Model, Mileage, Year and Run status helps us improve the accuracy of which cars should go where, but a difficulty arises: sometimes a car that would be classified as junk can actually be an outlier and sell …
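Since the real objective here is revenue rather than the label itself, one option is to turn the predicted probability into an expected-value rule. A hedged sketch: the columns auction_value, auction_fee, junk_value and the label sold_at_auction are invented for illustration, not fields from the original data:

    # logistic model for the chance a car does well at auction
    fit <- glm(sold_at_auction ~ Make + Mileage + Year + RunStatus,
               family = binomial, data = cars)
    p <- predict(fit, newdata = cars, type = "response")
    # route each car to wherever its expected revenue is higher
    expected_auction <- p * (cars$auction_value - cars$auction_fee)
    cars$decision <- ifelse(expected_auction > cars$junk_value, "auction", "junkyard")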
I am trying to build a GLM (Poisson family) using the Python statsmodels package on training data. The data I have contains categorical values as exogenous variables and numerical values for my target (endogenous variable). I standardized the numeric values and one-hot-encoded the categorical values (dropping the first level). When I fit the data to the model, I got the following exception: ValueError: NaN, inf or invalid value detected in endog, estimation infeasible. When creating this model the …
Is it possible to plot the deviance residuals and leverage (e.g. Cook's distance) of every observation fitted in a GLM model using H2O? From H2O's documentation, it seems it only calculates the sum of all deviance residuals and cannot output the residuals for each observation.
I want to try H2O's Model Selection function in Python, but cannot load the library for some reason. The following code failed: from h2o.estimators import H2OModelSelectionEstimator Error message: cannot import name 'H2OModelSelectionEstimator' from 'h2o.estimators' Other H2O libraries like H2OGeneralizedLinearEstimator worked fine for me, though. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/model_selection.html
I am working on a bag of words for the Toxic Comments Classification challenge. The challenge is closed, but the dataset is very nice to learn from. I use R, tf-idf, tm, and logistic regression. I see a strange pattern in the accuracy results, linked with the error: "glm.fit: algorithm did not converge". I tried the solution proposed in other answers and multiplied maxit by 4, but it did not help. Glimpse of the functions used: sub-sampling. The original distribution is 200K non-toxic …
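With tf-idf features, glm.fit non-convergence is often (quasi-)complete separation rather than too few iterations, in which case raising maxit cannot help. A sketch of two things worth trying (the data names are placeholders):

    # 1) more iterations, in case it really is just slow convergence
    fit <- glm(toxic ~ ., data = train_df, family = binomial,
               control = glm.control(maxit = 100))
    # 2) penalised logistic regression: the ridge penalty keeps
    #    coefficients finite even under separation
    library(glmnet)
    fit_ridge <- cv.glmnet(as.matrix(train_x), train_y,
                           family = "binomial", alpha = 0, type.measure = "auc")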
I will try to keep this short. As an assignment for my GLM course, we were given a dataset on the # of homicide victims a person knows, as well as the race of the person. The main idea is to answer the scientific question "Does race help explain how many homicide victims a person knows?". This same dataset, and actually nearly all of the sub-problems, are solved here: https://data.library.virginia.edu/getting-started-with-negative-binomial-regression-modeling/. My issue is that I am struggling to understand the difference between …
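For context, the standard comparison on this kind of count data is a Poisson fit against a negative binomial fit; a minimal sketch (homicide, nvics, and race are assumed names for the data frame and its columns):

    library(MASS)
    m_pois <- glm(nvics ~ race, family = poisson, data = homicide)
    m_nb   <- glm.nb(nvics ~ race, data = homicide)
    AIC(m_pois, m_nb)  # the NB model adds a dispersion parameter theta
    m_nb$theta         # small theta indicates strong overdispersion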
Hypothetically, if your company's sales had dropped significantly in 2020, what approach would you take to describe the cause? Can you build a model to predict the decrease (between 2019 and 2020, for example) to visualize what the leading indicators are?
I'm trying to create a logistic regression model with ridge regularization; this is the code: glmnet(X_Train, Y_Train, family = 'binomial', alpha = 0, type.measure = 'auc') And this is the error message I'm getting: Error in storage.mode(xd) <- "double" : 'list' object cannot be coerced to type 'double' I tried converting all the variables to "numeric", but it still doesn't work. I'm going to post the code for those two datasets so you can reproduce it. Libraries: library(dplyr) library(fastDummies) library(missForest) library(glmnet) Data: url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data' crx <- read.csv(url, …
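For reference, glmnet wants a numeric matrix, and a data frame is internally a list, which is what the coercion error is complaining about; type.measure is also an argument of cv.glmnet rather than glmnet. A sketch, assuming X_Train is still a data frame of dummies and numerics:

    library(glmnet)
    X <- model.matrix(~ . - 1, data = X_Train)  # data frame -> numeric matrix
    fit <- cv.glmnet(X, Y_Train, family = "binomial",
                     alpha = 0, type.measure = "auc")
    fit$lambda.min  # lambda with the best cross-validated AUC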
I'm trying to fit a GLM to predict a continuous variable between 0 and 1 with statsmodels. Because I have more features than data points, I need to regularize. statsmodels has very few examples, so I'm not sure if I'm doing this correctly. import statsmodels.api as sm logistic_regression_model = sm.GLM( y, # shape (num data,) X, # shape (num data, num features) link=sm.genmod.families.links.logit) results = logistic_regression_model.fit_regularized(alpha=1.) results.summary() When I run this, asking for a summary raises an error. NotImplementedError Traceback (most recent …
I'm a beginner in machine learning, and I've studied that collinearity among the predictor variables of a model is a huge problem, since it can lead to unpredictable model behaviour and large errors. But are there some models (say, GLMs) that are perhaps 'okay' with collinearity, unlike classic linear regression? It is said that classic linear regression assumes there is no correlation between its independent variables. This question arises because I was doing a project that said …
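A small sketch of the usual contrast: with two nearly identical predictors, OLS coefficients become unstable, while a ridge penalty keeps them tame (all numbers are made up for illustration):

    set.seed(42)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.01)  # almost perfectly collinear with x1
    y  <- x1 + rnorm(n)
    coef(lm(y ~ x1 + x2))           # large, offsetting estimates
    library(glmnet)
    coef(glmnet(cbind(x1, x2), y, alpha = 0, lambda = 0.1))  # shrunk and stable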
I am trying to run LOOCV on my regression model. I tried to run it in R and encountered the following warning message: Warning message in y - yhat: "longer object length is not a multiple of shorter object length" This is my model: x = glm(x, data = full_data) mse_loocv = cv.glm(full_data, x) mse_loocv$delta Variables used in glm are: x -> target_deathrate ~ avganncount + avgdeathsperyear + incidencerate + medincome + popest2015 + povertypercent + studypercap + medianage + medianagemale + medianagefemale + percentmarried + pctnohs18_24 …
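For what it's worth, cv.glm expects a glm object fitted straight from a formula, and reusing x as both the formula and the model object is easy to trip over; a sketch of the usual pattern (only a few of the predictors above, for brevity):

    library(boot)
    fit <- glm(target_deathrate ~ avganncount + avgdeathsperyear + incidencerate,
               data = full_data)
    mse_loocv <- cv.glm(full_data, fit)  # K defaults to n, i.e. leave-one-out
    mse_loocv$delta                      # raw and bias-corrected CV error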
I am trying to model a response variable which is a proportion (so a response between 0 and 1; see the picture for its distribution). Ideally I would like to model it without using the actual counts, i.e. as a decimal. So far I have been using a binomial family in R. model <- glm(Response ~ X1 + X2 + X3, data = Training_data, family = 'binomial') I think the model is doing okay, but when I use it for predictions it …
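One standard way to model a proportion directly as a decimal is the quasibinomial family (same mean structure as binomial, but no integer-count requirement), with predictions made on the response scale; a sketch using the question's names (New_data is a placeholder):

    model <- glm(Response ~ X1 + X2 + X3,
                 data = Training_data, family = quasibinomial)
    # type = "response" inverts the logit link, so predictions land in (0, 1)
    preds <- predict(model, newdata = New_data, type = "response")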
When I fit a linear model with many predictor variables, I can avoid writing all of them by using . as follows: model = lm(target_deathrate ~ ., data = full_data) But for models with higher complexity, I cannot make this work: x = glm(target_deathrate ~ poly(., i), data = full_data) In these cases I have to write out all the variables. How can I avoid writing all the variable names and still include all variables in my model?
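The . shorthand does not expand inside poly(), but the formula can be assembled as a string and converted; a sketch (it assumes all predictors are numeric, since poly() fails on factors):

    predictors <- setdiff(names(full_data), "target_deathrate")
    rhs  <- paste(sprintf("poly(%s, %d)", predictors, i), collapse = " + ")
    form <- as.formula(paste("target_deathrate ~", rhs))
    x <- glm(form, data = full_data)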
I have panel data for 3 countries, ranging over 3 years. The dataset is called CarProduction:

Country  Year  cars  Fuel_price  PPP    Manufact  PublicTransport
USA      2015  500   5           10000  9         2
USA      2016  700   5.2         10500  8.75      2.2
USA      2017  780   5.4         11000  8.6       1.9
China    2015  150   9           4000   11        3
China    2016  200   8.6         4500   11.5      4
China    2017  340   9.4         6000   15.6      5
Italy    2015  200   9           4000   11        5
Italy    2016  300   8.6         4500   11.5      6.2
Italy    …
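The question is cut off above, but if the goal is to relate car production to the other columns while accounting for country and year, one baseline sketch would be fixed effects via factors (this is an assumption about the intended model, and with only nine rows it is illustrative only):

    CarProduction$Country <- factor(CarProduction$Country)
    fit <- glm(cars ~ Fuel_price + PPP + Manufact + PublicTransport
                      + Country + factor(Year),
               family = poisson, data = CarProduction)
    summary(fit)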
I'm studying the occurrence of Behavior11, Behavior12, Behavior2, and Behavior3 according to three variables: Times: task time; Time_interval: task time in intervals; Frequency: frequency of the task. For this purpose, I use GLM: attach(datas) an11 = anova(glm(Behavior11 ~ Times + Frequency, family = binomial), test = "Chisq") an12 = anova(glm(Behavior12 ~ Times_interval + Frequency, family = binomial), test = "Chisq") an3 = anova(glm(Behavior3 ~ Times_interval + Frequency, family = binomial), test = "Chisq") an2 = anova(glm(Behavior2 ~ Times_interval + Frequency, family = binomial), test = "Chisq") I get a different significant effect for every behavior. The odds value reveals the direction of the dependence. Example: model1 = glm(Behavior12 ~ Times_interval + Frequency, family = binomial) summary(model1) exp(model1$coefficients) Coefficients: Estimate …
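When reading the direction of each effect, it can help to exponentiate the coefficients together with their confidence intervals; a short sketch for model1 above:

    # odds ratios with 95% profile-likelihood confidence intervals
    exp(cbind(OR = coef(model1), confint(model1)))
    # OR > 1: the odds of the behavior rise with the predictor; OR < 1: they fall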
I am facing an issue where I have 7 sets of different variables/columns/predictors. I am trying to predict the same target variable, and I want to observe the importance/effect of all the sets in an ordered manner. (I am trying to use a ridge regression model for each of the 7 individual sets, since I want to keep all the variables, and I want to combine the output of these 7 models; each set has more than 20 …
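One way to put the seven sets in order is to fit one cross-validated ridge model per set and rank the sets by out-of-sample error; a sketch, where sets (a list of character vectors of column names), df, and the target column y are placeholder names:

    library(glmnet)
    cv_error <- sapply(sets, function(cols) {
      X <- as.matrix(df[, cols])
      fit <- cv.glmnet(X, df$y, alpha = 0)  # alpha = 0 -> ridge, keeps every variable
      min(fit$cvm)                          # best cross-validated error for this set
    })
    sort(cv_error)  # sets ordered from most to least predictive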