statsmodels

Why do scikit-learn and statsmodels give different coefficients of determination?

Lukáš Tůma

April 19, 2022, 20:28

First of all, I know there is a similar question; however, I didn't find it very helpful. My issue concerns simple linear regression and the resulting R-squared. I found that the results can be quite different between statsmodels and scikit-learn. First, my snippet: import altair as alt import numpy as np import pandas as pd from sklearn.linear_model import LinearRegression import statsmodels.api as sm np.random.seed(0) data = pd.DataFrame({ 'Date': pd.date_range('1990-01-01', freq='D', periods=50), 'NDVI': np.random.uniform(low=-1, high=1, …

Topic: statsmodels linear-regression scikit-learn python

Category: Data Science
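
One well-known way such a gap can arise (an assumption here, since the snippet in the question is truncated): scikit-learn's LinearRegression fits an intercept by default, while statsmodels' OLS fits none unless a constant column is added, and without a constant it reports the uncentered R-squared. A minimal NumPy sketch on synthetic data, with made-up variable names, shows how far the two definitions can drift apart:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 3.0 + 0.5 * x + rng.normal(scale=0.1, size=50)  # synthetic data, not the question's

# Fit WITHOUT an intercept: a single slope through the origin.
slope_only = (x @ y) / (x @ x)
resid_no_const = y - slope_only * x
# Uncentered R-squared: total sum of squares is taken around zero.
r2_uncentered = 1.0 - (resid_no_const @ resid_no_const) / (y @ y)

# Fit WITH an intercept: add a column of ones to the design matrix.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
# Centered R-squared: total sum of squares is taken around the mean of y.
y_centered = y - y.mean()
r2_centered = 1.0 - (resid @ resid) / (y_centered @ y_centered)

print(f"no intercept, uncentered R^2: {r2_uncentered:.3f}")
print(f"intercept, centered R^2:      {r2_centered:.3f}")
```

If this is the cause, adding a constant column on the statsmodels side (for example with sm.add_constant) should bring the two numbers into agreement.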

Some of the p-values are NaN - logistic regression

Sim_Demo

March 19, 2022, 22:42

I am trying to do logistic regression, but I have this issue: some of the p-values are NaN. model = sm.Logit(y2,X2.astype(float)) result = model.fit() result.summary() Any ideas what to do?

Topic: statsmodels logistic-regression python

Category: Data Science
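
One possible cause worth ruling out (an assumption, since the question does not show the data) is a rank-deficient design matrix, for instance an intercept combined with a full set of dummy columns. Perfectly collinear columns make the standard errors, and therefore the p-values, undefined. The sketch below uses a small hypothetical matrix to show the check:

```python
import numpy as np

# Hypothetical design matrix: intercept plus a full set of dummies (A, B),
# so intercept == dummy_A + dummy_B (the "dummy variable trap").
X = np.array(
    [
        [1, 1, 0],
        [1, 0, 1],
        [1, 1, 0],
        [1, 0, 1],
        [1, 1, 0],
    ],
    dtype=float,
)

rank = np.linalg.matrix_rank(X)
n_cols = X.shape[1]
is_rank_deficient = rank < n_cols

print(f"rank={rank}, columns={n_cols}, rank deficient: {is_rank_deficient}")
```

Running the same check on X2.astype(float) from the question would show whether a column needs to be dropped before fitting.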

Time series VAR vs VARMA model: model takes too long to fit

shaifali Gupta

March 16, 2022, 18:04

I want to use a VARMA model on data with about 80,000 samples and 10 features. I tried the VARMA model from statsmodels with p=50 and q=10, but it takes too long to build the model; it was still running after 12 hours. Then I tried VARMA with p=50 and q=0, which was also still running after an hour with maxiter=1. The code I am using is: from statsmodels.tsa.statespace.varmax import VARMAX modelVARMA = VARMAX(dff, order=(50,0)) …

Topic: statsmodels time-series python

Category: Data Science
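
A rough sense of the model size may help explain the run time. A VARMA(p, q) model on k series has p autoregressive and q moving-average coefficient matrices, each k by k. The helper below is hypothetical and counts only those matrix entries; intercept and error-covariance parameters come on top.

```python
def varma_coefficient_count(k: int, p: int, q: int) -> int:
    """Number of scalar entries in the AR and MA coefficient matrices."""
    return (p + q) * k * k


k = 10  # number of features in the question
ar_only = varma_coefficient_count(k, p=50, q=0)
with_ma = varma_coefficient_count(k, p=50, q=10)

print(f"order=(50, 0):  {ar_only} coefficients")
print(f"order=(50, 10): {with_ma} coefficients")
```

Thousands of parameters estimated by numerical likelihood optimisation over 80,000 observations is a heavy workload, so a much smaller p is probably the first thing to try.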

Can we do multivariate time series analysis using the Holt-Winters (exponential smoothing) method?

yogesh agrawal

March 5, 2022, 13:05

Methods like ARIMAX and SARIMAX let us provide exog and endog variables for performing multivariate analysis. I was hoping there is a way to achieve the same with ETS as well. Please let me know if anyone has worked on this.

Topic: statsmodels time-series python machine-learning

Category: Data Science

How to interpret my logistic regression result with statsmodels

grumpyp

March 3, 2022, 20:06

So I'm doing a logistic regression with statsmodels and sklearn. My result confuses me a bit. I used a feature selection algorithm in my previous step, which tells me to use only feature1 for my regression. The results are the following: So the model predicts 1 for everything and my p-value is < 0.05, which to me means it's a pretty good indicator. But the accuracy score is < 0.6, which means it basically doesn't say anything. Can you …

Topic: interpretation statsmodels logistic-regression scikit-learn python

Category: Data Science

Why are coefficients from logistic regression not proportional to the bad rate?

aljrub12

March 1, 2022, 13:07

I am building a logistic regression model in Python with statsmodels.api.Logit. The model contains 12 features that are encoded using pandas.get_dummies(). My final training dataset (xTrain) looks like this: feature1_A feature1_B feature2_B feature2_C feature_2_D 0 1 0 1 0 0 1 1 0 0 1 0 0 1 0 feature1 is a categorical feature that contains 3 modalities (or categories) A, B, and C (C is used as a base reference so it does not appear in my training set) …

Topic: data-science-model statsmodels logistic-regression classification machine-learning

Category: Data Science

NaN, inf or invalid value detected in endog, estimation infeasible error when training statsmodels GLM model

Karima Touati

February 9, 2022, 17:32

I am trying to build a GLM (Poisson family) on training data using the Python statsmodels package. The data contains categorical values as exogenous variables and numerical values for my target (endogenous variable). I standardized the numeric values and one-hot encoded the categorical values (dropping the first level). When I fit the model, I got the following exception: ValueError: NaN, inf or invalid value detected in endog, estimation infeasible. When creating this model the …

Topic: statsmodels linear-regression glm scikit-learn python

Category: Data Science
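
The error message points at the target, so a first diagnostic is to locate non-finite entries in endog. The sketch below uses a hypothetical array in place of the real target. It also checks for negative values, on the assumption that the standardization may have been applied to the target as well, which would conflict with the non-negative response a Poisson model expects.

```python
import numpy as np

# Hypothetical target values with one missing entry injected.
endog = np.array([0.0, 2.0, np.nan, 5.0, 1.0])

bad_mask = ~np.isfinite(endog)            # True where NaN or +/-inf
bad_positions = np.flatnonzero(bad_mask)  # row positions to inspect or drop
has_negative = bool((endog[~bad_mask] < 0).any())

print("non-finite rows:", bad_positions.tolist())
print("negative target values present:", has_negative)
```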

Persistence and stationarity together

user96624

December 22, 2021, 04:59

I am trying to analyse a time series. I want only quantitative results (so I'm excluding things like "looking at this plot we can note..." or "as you can see in the chart ..."). In my job, I analyse stationarity and persistence. First, I run the ADF test and get "stationary" or "non-stationary" as the result. Then I need to work on persistence. To do so, I use the ACF. My question is: suppose I got a "non-stationary" time series. Is it …

Topic: statsmodels finance time-series statistics

Category: Data Science

Difference in statsmodels output vs direct linear algebra with a binary input variable

kiacan

December 13, 2021, 09:18

I was wondering why there might be a difference when I run a simple multiple linear regression with statsmodels OLS, vs just doing it directly with numpy. The results are identical for both cases, so long as I don't include sex (binary) as one of the predictor variables. I am wondering why this might be the case, and which to prefer in this case? I noticed that in the output of statsmodels it also says Sex[T.1] which may be related …

Topic: statsmodels linear-regression regression

Category: Data Science

When should Fourier analysis be used for time series data?

Zexxxx

December 12, 2021, 22:37

When should Fourier analysis be used for time-series data, other than for decomposition?

Topic: statsmodels time-series

Category: Data Science

Statsmodels: manually set/restore coefficients of a model

kiacan

December 12, 2021, 15:51

I was wondering if it is possible to manually restore the coefficients of a given model? That is, given a computed set of coefficients, to reinitialize another statsmodels model with those parameter (coefficient) outputs? I have tried doing so (in the context of OLS multiple linear regression), but have gotten errors, and I suspect it is because I try to restore the coefficients by fitting to a single sample dataframe (which is a test set), and that maybe alters some properties …

Topic: statsmodels linear-regression regression

Category: Data Science

Understanding plotted ACF on stationary time-series data

Zexxxx

December 10, 2021, 17:40

I made the time-series data stationary and plotted the ACF and PACF. While the PACF looks fine, I do not know how to interpret the ACF, as it looks like this - I could not call it geometric. The PACF is significant up to lag 1.

Topic: pacf statsmodels time-series pandas

Category: Data Science

PACF for Airline Passengers dataset: What's wrong?

An old man in the sea.

November 30, 2021, 09:09

The airline passengers dataset is available here, but it also comes with R. I'm working with Python, and I import the following (besides the usual, like pandas and numpy): from statsmodels.tsa.stattools import pacf,acf from statsmodels.graphics import tsaplots from statsmodels.tsa.stattools import adfuller,kpss from statsmodels.tsa.statespace.sarimax import SARIMAX from scipy import stats I applied the log, then a 1-period difference for detrending, and then a 12-period difference for 'deseasonality'. Then I drop the NaNs with df_log_dif_dif12.dropna(inplace=True). I obtain the following numpy array: array([[ …

Topic: pacf statsmodels time-series

Category: Data Science
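
The preprocessing described in the question (log, 1-period difference, then 12-period difference) can be sketched in plain NumPy. The series below is synthetic, a stand-in rather than the airline passengers data, and the variable names are made up for illustration.

```python
import numpy as np

# Synthetic monthly series with a trend and 12-period seasonality.
t = np.arange(48)
series = 100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)

log_series = np.log(series)
# 1-period difference: removes the trend in the log series.
diff1 = log_series[1:] - log_series[:-1]
# 12-period difference of the result: removes the yearly pattern.
diff1_12 = diff1[12:] - diff1[:-12]

print(len(series), len(diff1), len(diff1_12))
```

Slicing drops the 1 + 12 = 13 leading observations, the same rows that come out as NaN and are removed by dropna in the pandas version.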

Does this ARIMA model take seasonality into account?

codeananda

November 14, 2021, 08:48

I'm writing a tutorial on traditional time series forecasting models. One key issue with ARIMA models is that they cannot model seasonal data. So, I wanted to get some seasonal data and show that the model cannot handle it. However, it seems to model the seasonality quite easily - it peaks every 4 quarters as per the original data. What is going on? Code to reproduce the plot from statsmodels.datasets import get_rdataset from statsmodels.tsa.arima.model import ARIMA import matplotlib.pyplot as plt …

Topic: forecasting statsmodels arima time-series python

Category: Data Science

How do I use number of hours as index in timeseries forecasting?

Sandhya Indurkar

November 6, 2021, 07:29

I have a dataset that contains the number of hours (consecutive values) and the total sales in each hour. See below for the head of the dataset: t sales -------------- 23 172.3676 24 176.3456 25 166.9039 26 153.9990 27 167.9585 I want to forecast the sales for the next 10 hours. I also set column t as the index. However, when I try to get the seasonal decomposition, it shows an error: result = seasonal_decompose(train['sales'].dropna(), model='additive', freq =12) result.plot() …

Topic: forecasting statsmodels time-series python

Category: Data Science

Multiple binary classification for best propensity to buy one of the products

Mangesh Divate

October 18, 2021, 12:32

Problem: I have 5 products to sell and I can pitch only one product per month to each customer, so I want to know which product a customer is likely to buy. Proposed solution: I build 5 binary logistic models to estimate the probability of each customer buying a particular product, which gives me 5 probabilities. Whichever model gives the maximum probability among the 5, I pitch that product to the customer. For example, if we have products A, B, C, D, E to …

Topic: statsmodels multiclass-classification logistic-regression recommender-system statistics

Category: Data Science
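
The selection rule described in the question, pitching the product whose model gives the highest probability, can be sketched as a row-wise argmax. The probabilities below are hypothetical stand-ins for the outputs of the five binary models.

```python
import numpy as np

products = ["A", "B", "C", "D", "E"]

# Hypothetical predicted purchase probabilities:
# rows = customers, columns = products (one binary model per column).
probs = np.array(
    [
        [0.10, 0.40, 0.05, 0.30, 0.15],
        [0.55, 0.20, 0.60, 0.10, 0.05],
        [0.02, 0.03, 0.01, 0.04, 0.50],
    ]
)

best_idx = probs.argmax(axis=1)           # index of the top product per customer
pitch = [products[i] for i in best_idx]

print(pitch)
```

Because the five models are fitted separately, the rows need not sum to 1, and comparing raw probabilities across models implicitly assumes they are comparably calibrated.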

SARIMAX fit for prediction further into the future

khatara

August 13, 2021, 23:12

I want to fit the SARIMAX model from statsmodels so that it is optimized for predicting further into the future, not just the next sample, say 5 time steps ahead. I can forecast with model.forecast(5), but what I am trying to do is actually fit the model so that it learns how to best predict 5 time steps ahead. Is it possible?

Topic: statsmodels arima time-series

Category: Data Science

Selecting the best model parameters from grid search SARIMA [Time series]

callmeanythingyouwant

July 22, 2021, 08:29

I ran a manual grid search of SARIMA across several parameters and now I have 7875 rows of scores (RMSE, MAE, and MAPE for each) from it. These were the parameters (30k+ permutations) I ran the grid search over: p = [0 to 10] d = [0,1,2] q = [0 to 12] P = [0 to 5] D = [0,1] Q = [0,1,2] S = [0,7] These are the top 20 rows of the results sorted by RMSE in ascending order. Parameters are in …

Topic: grid-search statsmodels arima time-series python

Category: Data Science
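
The "30k+ permutations" figure can be checked by enumerating the grid, assuming ranges such as "0 to 10" are inclusive at both ends. The dictionary layout below is just one convenient way to write it.

```python
from itertools import product

# Parameter grid from the question, with inclusive upper bounds assumed.
grid = {
    "p": range(0, 11),
    "d": [0, 1, 2],
    "q": range(0, 13),
    "P": range(0, 6),
    "D": [0, 1],
    "Q": [0, 1, 2],
    "S": [0, 7],
}

combos = list(product(*grid.values()))
n_combos = len(combos)  # 11 * 3 * 13 * 6 * 2 * 3 * 2

print(n_combos)
```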

How to do backward feature elimination when considering interactions between features

Oussama Jabri

July 13, 2021, 04:03

I have a multiple linear regression problem: $Y$ is my target and $X_1, X_2, X_3$ are my features. In my regression, I consider the interactions between $X_1, X_2, X_3$ and I add a bias. So my problem is given by: $Y \sim X_1 + X_2 + X_3 + X_1X_2 + X_1X_3 + X_2X_3 + bias$ Now, I fit my model with statsmodels.api.sm and I want to recursively eliminate the feature with the highest p-value. My first question is: for example, …

Topic: statsmodels linear-regression feature-selection

Category: Data Science
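
The model in the question has seven regressors: three main effects, three pairwise interactions, and a bias. The sketch below builds that design matrix explicitly in NumPy on synthetic data; the column labels such as "X1:X2" are only illustrative names.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # synthetic stand-in for X1, X2, X3
names = ["X1", "X2", "X3"]

# Main effects.
columns = [X[:, i] for i in range(3)]
col_names = list(names)

# Pairwise interaction terms: X1*X2, X1*X3, X2*X3.
for i, j in combinations(range(3), 2):
    columns.append(X[:, i] * X[:, j])
    col_names.append(f"{names[i]}:{names[j]}")

# Bias (intercept) column of ones.
columns.append(np.ones(len(X)))
col_names.append("bias")

design = np.column_stack(columns)
print(design.shape, col_names)
```

With the columns laid out like this, dropping a term during backward elimination amounts to removing one column and refitting.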

Selecting most important features for multilinear regression

Maths12

July 12, 2021, 21:22

I have a set of 25 features. I would like to choose the best features for my model. Originally, I was looking at the correlation of features with respect to response, and only taking those which are highly correlated and run a regression model. Then, using that model I would predict the outcome based on test data, and compare it to actual (metric RMSE) and this would be how I assess it. I could then add each feature in order …

Topic: statsmodels linear-regression regression feature-selection statistics

Category: Data Science
