I am new to Data Science and am currently trying to predict customer churn for a company that offers subscription-based bookings-management software; its customers are gyms. I have a small, imbalanced dataset of historical data (False 670, True 230) with 2 numerical predictors: age (days since subscription) and number of active days in the last month (days on which a customer, i.e. a gym, had bookings), plus 1 categorical predictor: logo (boolean, whether the customer uploaded a logo in the software). Predictors have the following …
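For a setup like this, one possible baseline is a class-weighted logistic regression; the sketch below is only an illustration and assumes hypothetical column names (age_days, active_days, has_logo, churned) and a churn.csv file, none of which come from the question.

    # Minimal baseline sketch; class_weight='balanced' compensates for the 670/230 imbalance.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    df = pd.read_csv("churn.csv")                      # hypothetical file name
    X = df[["age_days", "active_days", "has_logo"]]    # hypothetical column names
    y = df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.25, random_state=0)

    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))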
I am new to Machine Learning and started solving the Titanic survivor problem on Kaggle. While solving it with Logistic Regression I fitted several models with polynomial features of degree $2, 3, 4, 5, 6$. Theoretically the accuracy on the training set should increase with the degree, yet it started decreasing past degree $2$. The graph is shown below.
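For reference, a sketch of how such an experiment is typically wired up in scikit-learn; X_train and y_train are assumed to already hold the numeric Titanic features and labels, and the scaling step and higher max_iter are additions of mine, since high-degree polynomial features can easily keep the solver from converging.

    # Training accuracy as a function of the polynomial degree (assumes X_train, y_train exist).
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.linear_model import LogisticRegression

    for degree in [2, 3, 4, 5, 6]:
        model = make_pipeline(
            PolynomialFeatures(degree=degree),
            StandardScaler(),
            LogisticRegression(max_iter=5000),
        )
        model.fit(X_train, y_train)
        print(degree, model.score(X_train, y_train))   # training-set accuracy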
I have a dataset where each row is a sample and each column is a binary variable. The meaning of $X_{i, j} = 1$ is that we've seen feature $j$ for sample $i$; $X_{i, j} = 0$ means that we haven't seen this feature yet, but we might in the future. We have around $1000$ binary variables and around $200k$ samples. The target variable $y$ is categorical. What I'd like to do is to find subsets of variables that precisely predict some $y_k$. …
From a conceptual standpoint I understand the trade-off involved with the ROC curve: you can increase the true-positive rate, but you will take on more false positives, and vice versa. I am wondering how one would target a specific point on the curve for a Logistic Regression model. Would you just raise the probability threshold for what constitutes a 0 or a 1 in the regression? (Like shifting the probability at which predictions start to get …
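One common way to target a specific operating point is indeed to threshold the predicted probabilities yourself rather than using the default 0.5. A minimal sketch, assuming a fitted sklearn LogisticRegression clf and a held-out set X_test, y_test (names assumed, not from the question):

    # Move along the ROC curve by choosing the decision threshold explicitly.
    import numpy as np
    from sklearn.metrics import roc_curve

    proba = clf.predict_proba(X_test)[:, 1]        # P(y = 1) for each sample
    fpr, tpr, thresholds = roc_curve(y_test, proba)

    threshold = 0.7                                 # stricter than the default 0.5
    y_pred = (proba >= threshold).astype(int)

Each candidate threshold corresponds to one (fpr, tpr) point returned by roc_curve, so you can pick the threshold whose point is closest to the operating point you want.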
I have a data set labelled with binary classes. I calculated the principal components from the data and applied the PC transformation. The goal is to find an optimal number of PCs so that the binary classification accuracy is good enough. I trained a binary classifier, sklearn.linear_model.LogisticRegressionCV (default parameters), on the PC-transformed data, with the number of PCs as the (hyper-)parameter being varied. I cannot interpret the resulting Accuracy vs. #PCs graph; why is it so strange? For …
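For reference, a sketch of the kind of loop being described, assuming X and y are the raw features and labels; the cross-validation details here are assumptions, not taken from the question:

    # Accuracy as a function of the number of principal components kept.
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    for n_pcs in range(1, X.shape[1] + 1):
        model = make_pipeline(PCA(n_components=n_pcs), LogisticRegressionCV())
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(n_pcs, acc)

Putting PCA inside the pipeline keeps the transformation fitted only on each training fold, which avoids one common source of odd-looking accuracy curves.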
I'm working with a data source that provides itemised transactions, which I am aggregating into 1-hour blocks to derive a 'rate per hour' as the dependent (target) variable, i.e. like a time series. So far I've looked at Logistic Regression, Random Forest Regressor and Gradient Boosting Regressor and got reasonable results, but I am really trying to determine the weighting/impact of the independent variables, to see which have the biggest impact on the DV. Would there …
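One way to compare the variables across those models is to look at the coefficients of the linear model alongside the feature_importances_ of the tree ensembles. A rough sketch, assuming already fitted models named lin_reg, rf and gbr and a list feature_names (all hypothetical names, not from the question):

    # Compare how each model ranks the independent variables.
    import pandas as pd

    summary = pd.DataFrame({
        "feature": feature_names,
        "linear_coef": lin_reg.coef_.ravel(),        # sign and magnitude
        "rf_importance": rf.feature_importances_,     # impurity-based importance
        "gbr_importance": gbr.feature_importances_,
    })
    print(summary.sort_values("gbr_importance", ascending=False))

Note the linear coefficients are only comparable across features if the features were standardised first; permutation importance is a model-agnostic alternative.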
The log-odds have a linear relationship with the independent variables, which is why the log-odds equal a linear equation. What about the log of the probability? How is it related to the independent variables? Is there a way to check the relationship?
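For reference, writing the linear predictor as $\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$, the log-probability follows from the logistic link but, unlike the log-odds, is not linear in the independent variables:
$$\log\frac{p}{1-p} = \eta, \qquad p = \frac{e^{\eta}}{1+e^{\eta}}, \qquad \log p = \eta - \log\left(1 + e^{\eta}\right).$$
So $\log p$ is a concave, softplus-shaped function of $\eta$: it approaches $\eta$ when $\eta$ is very negative and approaches $0$ when $\eta$ is very positive. One way to inspect the relationship empirically is to plot $\log \hat{p}$ from a fitted model against each predictor or against the fitted linear predictor.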
I would like to build a simple sentiment-analysis classifier using logistic regression. I downloaded a list of positive and negative words from cs.uic.edu; there are more than 6000 words, both positive and negative. A linear classifier has the form (Wikipedia reference): $$\sum_j w_j x_j$$ where $w_j$ is the weight for feature $x_j$. So, for example, if the weight of the word "awesome" is 3, then in the following sentence: "Food is awesome and music is awesome." according to the formula, it …
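With bag-of-words counts as features, the score is just the dot product of weights and counts, so a word that appears twice contributes its weight twice. A tiny sketch (the weight values here are made up purely for illustration):

    # Score = sum_j w_j * x_j, with x_j = count of word j in the sentence.
    from collections import Counter

    weights = {"awesome": 3, "bad": -2}                    # hypothetical weights
    counts = Counter("food is awesome and music is awesome".split())
    score = sum(weights.get(word, 0) * count for word, count in counts.items())
    print(score)                                           # 3 * 2 = 6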
I am creating a simple neural-network architecture, but I keep getting NaN in the results and can't figure out why. Below is my code.

    import pandas
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.utils import np_utils
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import LabelEncoder
    from sklearn.pipeline import Pipeline
    from collections import Counter
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn.preprocessing import StandardScaler
    #from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from tensorflow.keras …
I would like to ask about the theoretical approach of using Logistic Regression for customer data, and more specifically churn prediction (in BigQuery and Python). I have customer data for an online shop and I would like to predict whether a customer will churn based on some characteristics. I have created my dataset and the churn label (based on the hypothesis that if the customer hasn't bought anything in the last year, then it is assumed that the customer …
I am trying to implement a logistic regression algorithm myself as a learning exercise, but I am having trouble achieving accuracy similar to sklearn's logistic regression. Here is the code I am using (the dataset is the Titanic 'training.csv' dataset from Kaggle, which you can download here if you want to test this yourself).

    import numpy as np
    import random
    import matplotlib.pyplot as plt
    #%matplotlib inline

    def cost(X, Y, W):
        """ …
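For comparison, here is a minimal from-scratch version of the usual pieces (sigmoid, cross-entropy cost, batch gradient descent). This is a generic sketch, not the asker's code; it assumes X already includes a bias column of ones and Y is a column vector of 0/1 labels.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost(X, Y, W):
        # Mean binary cross-entropy; eps avoids log(0).
        eps = 1e-12
        p = sigmoid(X @ W)
        return -np.mean(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps))

    def fit(X, Y, lr=0.1, n_iter=5000):
        W = np.zeros((X.shape[1], 1))
        for _ in range(n_iter):
            grad = X.T @ (sigmoid(X @ W) - Y) / len(Y)   # gradient of the mean cost
            W -= lr * grad
        return W

Common reasons a hand-rolled version lags sklearn include missing feature scaling, a learning rate that is too small or too large, and the absence of the L2 regularisation that sklearn applies by default.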
I have the data below. I want to explain the relationship between 'Milieu', which has two levels, and 'DAM'. As you may notice, the blue population is included in the red population. Can I apply a logistic regression?
In the context of multiple regression, I am wondering if there is a way to decompose $$VIF_i = \frac{1}{1-R_i^2}$$ where $R_i^2$ is the R-squared obtained from regressing predictor $i$ on all the other predictors. I want to decompose $VIF_i$ (or $R_i^2$) into individual factors to see how much each individual predictor contributes to $VIF_i$ (or $R_i^2$). Someone recommended using the square of the partial correlation coefficient, saying that value is linearly related to $R_i^2$. …
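As a starting point, the $VIF_i$ values themselves (before any decomposition) can be computed directly; a sketch assuming the predictors sit in a pandas DataFrame X (a name I am assuming here):

    # VIF_i = 1 / (1 - R_i^2), with R_i^2 from regressing column i on the other columns.
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X_const = sm.add_constant(X)                  # intercept for each auxiliary regression
    vifs = [variance_inflation_factor(X_const.values, i)
            for i in range(1, X_const.shape[1])]  # skip the constant column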
I got this strange behavior when deploying my logistic regression, trained in scikit-learn, into production. I trained the model on my own machine and stored it as a .pickle file. I use the same set of data both locally and on the server side (with Docker), generating four columns for each sample in this binary classification problem: probability_of_class_0, probability_of_class_1, y_true, y_predict, where y_true and y_predict refer to the true label and the predicted label respectively for that sample row/record. And …
Given a multi-class logistic classifier $f(x)=\operatorname{argmax}(\operatorname{softmax}(Ax + \beta))$ and a specific class of interest $y$, is it possible to construct a binary logistic classifier $g(x)=(\sigma(\alpha^T x + b) > 0.5)$ such that $g(x)=y$ if and only if $f(x)=y$?
Given the relatively simple form of the standard logistic regression model, I was wondering if there is an exact calculation of SHAP values for logistic regression. To be clear, I am looking for a closed formula, depending on the features ($X_i$) and coefficients ($\beta_i$), for calculating Shapley values and their corresponding importance.
Here is my understanding of the relation between MLE and Gradient Descent in Logistic Regression; please correct me if I'm wrong: 1) MLE estimates the optimal parameters by taking the partial derivative of the log-likelihood function w.r.t. each parameter and equating it to 0. Gradient Descent, just like MLE, gives us the optimal parameters by taking the partial derivative of the loss function w.r.t. each parameter; GD also uses hyperparameters like the learning rate and step size in the process of obtaining …
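For concreteness, with the logistic log-likelihood
$$\ell(\beta) = \sum_i \left[ y_i\, x_i^T\beta - \log\!\left(1 + e^{x_i^T\beta}\right) \right],$$
MLE asks for the root of the score equation, while gradient ascent on $\ell$ (equivalently, gradient descent on the negative log-likelihood) iterates toward that root with a learning rate $\eta$:
$$\frac{\partial \ell}{\partial \beta} = \sum_i \left( y_i - \sigma(x_i^T\beta) \right) x_i = 0, \qquad \beta^{(t+1)} = \beta^{(t)} + \eta \sum_i \left( y_i - \sigma\!\left(x_i^T\beta^{(t)}\right) \right) x_i.$$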
Here's the thing: I have imbalanced data and I was thinking about using a SMOTE transformation. However, when doing that inside a sklearn pipeline, I get an error because of missing values. This is my code:

    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # VARIABLE SELECTION
    categorical_features = ["MARRIED", "RACE"]
    continuous_features = ["AGE", "SALARY"]
    features = ["MARRIED", "RACE", "AGE", "SALARY"]

    # PIPELINE
    continuous_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="most_frequent")),
            ("scaler", StandardScaler()),
        ]
    )
    categorical_transformer = Pipeline(
        steps=[ …
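One detail worth noting: scikit-learn's own Pipeline cannot hold SMOTE (it is a resampler with fit_resample, not a transformer), and SMOTE itself rejects NaNs, so the usual pattern is imblearn's Pipeline with SMOTE placed after the imputation/encoding step. A rough sketch under those assumptions, reusing the transformer names from the code above; the final LogisticRegression and the X_train/y_train names are placeholders of mine:

    # SMOTE goes after preprocessing so it never sees missing values.
    from imblearn.pipeline import Pipeline as ImbPipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", continuous_transformer, continuous_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )
    model = ImbPipeline(
        steps=[
            ("preprocess", preprocessor),
            ("smote", SMOTE(random_state=0)),
            ("clf", LogisticRegression(max_iter=1000)),
        ]
    )
    model.fit(X_train, y_train)    # assumes X_train / y_train already exist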
Fine-tuning is a concept commonly used in deep learning: we may have a pre-trained model and then fine-tune it on our specific task. Does that apply to simpler models, such as logistic regression? For example, let's say I have a dataset with attribute variables of an animal and I want to classify whether or not it is a mammal. The labels on that dataset are only "mammal"/"not mammal". I then train a logistic regression model for this …
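In scikit-learn, the closest analogues are warm-starting a LogisticRegression (the next fit continues from the previously learned coefficients) or incremental updates via SGDClassifier.partial_fit. A small sketch, where X_old/y_old and X_new/y_new are hypothetical "pre-training" and "fine-tuning" datasets:

    # Continue optimisation from previously learned coefficients.
    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(warm_start=True, max_iter=1000)
    clf.fit(X_old, y_old)       # "pre-training" on the original data
    clf.fit(X_new, y_new)       # further fitting starts from the old coefficients

Note this only reuses the coefficients as a starting point; unlike deep-learning fine-tuning there are no frozen layers, since the model is a single linear layer.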