Sampling Highly Imbalanced Large Dataset

I am working on a model that will run monthly on 8M users. My training set is organized as monthly snapshots, e.g.:

Jan 2021 snapshot: 8M total, 233 positives, rest negative
Feb 2021 snapshot: 8M total, 599 positives, rest negative
Mar 2021 snapshot: 8M total, 600 positives, rest negative
Apr 2021 snapshot: 8M total, 750 positives, rest negative

and so on up to March 2022. I'm keeping the March 2022 snapshot as the test set, which has 2000 positive labels …
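One common starting point for imbalance this extreme (a few hundred positives per 8M rows) is to keep every positive and randomly undersample the negatives within each snapshot, then correct for the sampling with weights or a probability adjustment at scoring time. A minimal pandas sketch, assuming a DataFrame df with hypothetical "label" and "snapshot" columns:

import pandas as pd

def undersample_negatives(df, label_col="label", snapshot_col="snapshot",
                          neg_per_pos=50, seed=42):
    # Keep all positives; sample a fixed number of negatives per positive within each snapshot.
    parts = []
    for _, snap in df.groupby(snapshot_col):
        pos = snap[snap[label_col] == 1]
        neg = snap[snap[label_col] == 0]
        n_neg = min(len(neg), neg_per_pos * max(len(pos), 1))
        parts.append(pd.concat([pos, neg.sample(n=n_neg, random_state=seed)]))
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle rows

# If negatives were kept with sampling rate q, a model trained on this sample
# overstates P(y=1); multiplying the predicted odds by q maps scores back to the
# full 8M-user population.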
Category: Data Science

Keras Binary Classification - Maximizing Recall

Let me start by saying my machine learning experience is... dangerous at this stage. I'm still a beginner. I have a binary classification dataset of about 100,000 records; 10% of the records are positive and the rest are negative, so it is a highly skewed dataset. It is extremely important to maximize the positive (true positive) prediction accuracy (recall), even at the expense of negative (true negative) prediction accuracy. Thus, I would rather have an overall 70% accuracy if positive …
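Two levers commonly pulled for this goal are class weights during training and a lower decision threshold afterwards. A minimal Keras sketch along those lines (the synthetic data below is only a stand-in for the real 100,000-record dataset):

import numpy as np
import tensorflow as tf

# Stand-in data: 20 features, ~10% positives (replace with the real dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 20)).astype("float32")
y_train = (rng.random(10_000) < 0.10).astype("float32")
X_val = rng.normal(size=(2_000, 20)).astype("float32")
y_val = (rng.random(2_000) < 0.10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Recall(name="recall"),
                       tf.keras.metrics.Precision(name="precision")])

# Weight the rare positive class more heavily (90/10 split -> roughly 9x).
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=10, batch_size=256, class_weight={0: 1.0, 1: 9.0})

# Trade precision for recall by lowering the decision threshold from 0.5.
probs = model.predict(X_val).ravel()
preds = (probs >= 0.3).astype(int)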
Category: Data Science

How to use confidence labels?

I have 2 sets of training data in CSV files. The training data have class labels: 1 for memorable and 0 for not memorable. There is also a confidence label for each sample. The class labels were assigned based on decisions from 3 people viewing the photos. When they all agreed, the class label could be considered certain, and a confidence of 1 was written down. If they didn't all agree, then the classification decided on by the …
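One straightforward way to use a per-sample confidence value is to pass it as a sample weight, so unanimous examples influence the loss more than contested ones. A minimal scikit-learn sketch, assuming the CSV has hypothetical column names feature_*, label, and confidence:

import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train.csv")           # assumed file name
X = df.filter(like="feature_")          # assumed feature columns
y = df["label"]                         # 1 = memorable, 0 = not memorable
w = df["confidence"]                    # e.g. 1.0 when all 3 raters agreed

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=w)          # many sklearn estimators accept sample_weight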
Category: Data Science

Binary document classification using keywords for a very small dataset

I have a set of 150 documents with their assigned binary class. I also have 1000 unlabeled documents. Each document is about the length of a journal paper. Each class has 15 associated keywords. I want to be able to predict the assigned class of the documents using this information. Does anyone have any ideas of how I could approach this problem?
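With only 150 labeled documents, one reasonable baseline is a TF-IDF model combined with simple keyword-count features built from the 15 keywords per class; the 1000 unlabeled documents could later be folded in via self-training. A hedged sketch, assuming lists labeled_texts, labels, class0_keywords, and class1_keywords (hypothetical names):

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def keyword_counts(texts, keywords):
    # Count case-insensitive keyword occurrences per document.
    return np.array([[t.lower().count(k.lower()) for k in keywords] for t in texts])

vec = TfidfVectorizer(stop_words="english", min_df=2)
X_tfidf = vec.fit_transform(labeled_texts)
X_kw = csr_matrix(keyword_counts(labeled_texts, class0_keywords + class1_keywords))
X = hstack([X_tfidf, X_kw]).tocsr()

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, labels, cv=5, scoring="f1"))  # sanity check with only 150 docs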
Category: Data Science

LGBM model predicting only a single class on unseen data!

I have built a LightGBM-based machine learning model on data of molecules from two classes. The distribution is as follows: class 0 has 5933 data points and class 1 has 4696. The train and test accuracy I get on this data are around 87% and 82% respectively, and the roc_auc_score is around 81.5%. But when I try to evaluate model performance on an entirely new dataset, which the model has never seen before, with class labels 0 and 1 both having 94 …
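Before concluding the model is broken, it often helps to look at the raw predicted probabilities on the new molecules rather than the hard labels; if they cluster just on one side of 0.5, the issue may be threshold placement or a shift in the feature distribution. A small diagnostic sketch, where lgbm_model, X_new, and X_train are hypothetical names for the fitted classifier, the unseen molecules, and the original training features:

import numpy as np

probs = lgbm_model.predict_proba(X_new)[:, 1]   # probability of class 1
print("min/median/max:", probs.min(), np.median(probs), probs.max())
print("predicted positives at 0.5:", (probs >= 0.5).sum(), "of", len(probs))

# Compare with the score distribution on the training data to spot covariate shift.
train_probs = lgbm_model.predict_proba(X_train)[:, 1]
print("train median score:", np.median(train_probs))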
Category: Data Science

ML model shooting up prediction probabilities

I have a scikit-learn logistic regression binary classifier and tried training it on my dataset. My model does extremely well at a threshold of 0.95 instead of 0.5, and all my predictions on example cases are above 0.8 for both classes. I cannot figure out why my machine learning model is inflating its predicted probabilities so much. I would appreciate some potential workarounds for this.
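A useful first diagnostic here is a calibration (reliability) curve: it shows whether scores like 0.8 really correspond to an 80% positive rate, and whether the separation between classes justifies a 0.95 threshold. A minimal sketch with scikit-learn, assuming a fitted clf and a held-out X_test/y_test (hypothetical names):

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

probs = clf.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)

plt.plot(mean_pred, frac_pos, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Observed fraction of positives")
plt.legend()
plt.show()

If the curve sits far from the diagonal, wrapping the model in CalibratedClassifierCV or checking for leakage and duplicated features would be natural next steps.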
Category: Data Science

Aggregated probability based on multiple predictions on independent samples using the same classifier

I have an understanding question regarding the interpretation of an aggregation of a machine learning classifier. Let's assume I have trained a binary classifier and it was validated with an accuracy of 70% (the dataset is always balanced). My question is now: if this probability seems too low for me, and I were to search for ways to improve it without any readjustments to the classifier, would the following idea be valid? The classifier predicts three independent samples (always with …
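As a rough sanity check of the idea (and assuming the three samples really are independent and each individual prediction is correct with probability 0.7), a majority vote over the three predictions is correct with probability $0.7^3 + 3 \cdot 0.7^2 \cdot 0.3 = 0.343 + 0.441 = 0.784$, i.e. about 78% rather than 70%; whether the independence assumption holds for real samples scored by the same classifier is the crux.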
Category: Data Science

Does eval loss decreasing slower than train loss indicate overfitting?

I am training a binary classifier using an EfficientNetV2 model with a 1M-image dataset where I do a 60/20/20 split. Does this graph mean that the model is overfitting? I can see that the train loss is going down much faster than the eval loss, but the eval loss is still going down and the accuracy is going up. Accuracy may seem to be low, but it is actually a pretty decent amount for the problem I am working …
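One way to make this call automatically rather than by eyeballing the curves is early stopping on the validation loss: as long as the eval loss keeps improving, training continues; once it stalls for a few epochs, the best weights are restored. A minimal Keras sketch (callback only; model, train_ds, and val_ds are assumed to exist):

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # watch the eval/validation loss
    patience=5,                 # allow 5 epochs without improvement
    restore_best_weights=True,  # roll back to the best checkpoint
)

model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])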
Category: Data Science

Meaningfully compare target vs observed TPR & FPR

Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as: $$ \widehat{y} = \begin{cases} 1, & f(x) \geq t \\ 0, & f(x) < t \end{cases} $$ I then compute the $TPR$ (true positive rate) and $FPR$ (false positive rate) metrics on the hold-out test set (call it $S_1$): $TPR_{S_1} = \Pr(\widehat{y} = 1 | y = 1, S_1)$ $FPR_{S_1} = \Pr(\widehat{y} = 1 | y …
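For reference, both rates are easy to compute empirically on the hold-out set at a given threshold. A small sketch, where scores stands for the values of $f(x)$ on $S_1$ and y for the true labels (the random data below is only a stand-in):

import numpy as np

def tpr_fpr(scores, y, t):
    # Empirical TPR and FPR on a hold-out set at threshold t.
    y_hat = (scores >= t).astype(int)
    tpr = np.mean(y_hat[y == 1] == 1)   # P(y_hat = 1 | y = 1)
    fpr = np.mean(y_hat[y == 0] == 1)   # P(y_hat = 1 | y = 0)
    return tpr, fpr

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
scores = np.clip(y * 0.3 + rng.random(1000) * 0.7, 0, 1)
print(tpr_fpr(scores, y, t=0.5))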
Category: Data Science

ROC-AUC Imbalanced Data Score Interpretation

I have a binary response variable (label) in a dataset with around 50,000 observations. The training set is somewhat imbalanced, with label=1 making up about 33% of the observations and label=0 making up about 67%. Right now with XGBoost I'm getting a ROC-AUC score of around 0.67. The response variable is binary, so the baseline is 50% in terms of chance, but at the same time the data is imbalanced, so if the model just guessed label=0 …
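One point worth keeping in mind: ROC-AUC already conditions on the true class, so a constant "always predict 0" strategy scores 0.5 AUC regardless of the 67/33 imbalance; 0.67 is therefore better than chance, but not by much. A quick baseline comparison with scikit-learn, assuming X_train, y_train, X_test, and y_test already exist (hypothetical names):

from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score

dummy = DummyClassifier(strategy="prior").fit(X_train, y_train)
print("baseline AUC:", roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1]))  # ~0.5

# For comparison, score the XGBoost model the same way:
# print("model AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))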
Category: Data Science

Predictive Maintenance Question (binary classification)

I have a question regarding "Predictive Maintenance". This tutorial, https://docs.microsoft.com/en-us/learn/modules/predictive-maintenance-model-builder/3-choose-scenario-data, says: "Choosing a scenario for predictive maintenance Depending on what your data looks like, the predictive maintenance problem can be modeled through different tasks. For your use case, because the label is a binary value (0 or 1) that describes whether a machine is broken or not, the data classification scenario is appropriate." Now, how can this be used to predict machine failure BEFORE it gets broken? …
Category: Data Science

How do I choose the right parameters for just plain old simple standard deviation?

I am evaluating different models that do binary classification and basically generate trade signals: they predict either buy or sell for the next day. I look at 10 different underlying assets and have 3 different variations of data that I train the models with, and I evaluated 12 different types of models. That leaves me with 10 x 3 x 12 = 360 different models/predictions. I backtested the trade signals they generate: most of them do not really …
Category: Data Science

How to predict an outcome of the game (next row) based on all previous games (rows)?

I'm a data science student and I've come across a fairly unusual dataset (to me, which explains the vague title). It's of the following form:

STAT_1  STAT_2  ...  HOME    AWAY    NEXT_HOME  NEXT_AWAY  NEXT_RESULT
15      11      ...  Team A  Team B  Team C     Team D     1
11      18      ...  Team C  Team D  Team E     Team F     0
...     ...     ...  ...     ...     ...        ...        ...
10      11      ...  Team W  Team X  Team Y     Team Z     1

Basically, the rows …
Category: Data Science

Text Classification misclassifying?

I am trying to solve a binary classification problem. My labels are abusive (1) and non-abusive (0). My dataset was imbalanced (more 0s than 1s) and I used oversampling of the minority label (i.e. 1) to balance my dataset. I have also done pre-processing and feature engineering using TF-IDF, and then fed the dataset into a pipeline using 3 classification algorithms, namely Logistic Regression, SVM, and Decision Tree. My evaluation metrics are: Logistic Regression: [[376 33] [ 18 69]] precision recall …
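One common pitfall with this setup is oversampling before the train/test split, which lets duplicated minority examples leak into the test fold and inflates the metrics. Keeping the oversampling inside a pipeline so it is applied only to training folds avoids that. A hedged sketch using imbalanced-learn, assuming lists texts and labels (hypothetical names):

from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("oversample", RandomOverSampler(random_state=42)),  # applied to training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

preds = cross_val_predict(pipe, texts, labels, cv=5)
print(classification_report(labels, preds))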
Category: Data Science

When would $\Theta_{Bayes}$ be on the Equal Error Rate curve

If we use classic Bayesian classification for a two-class problem and classify by comparing the likelihood ratio $LR(x) = \frac{p(x|s=1)}{p(x|s=2)}$ to $\Theta_{Bayes} = \frac{P(s=2)}{P(s=1)}$, when would this $\Theta_{Bayes}$ produce equal false positive and false negative rates? My intuition is: if the classes have equal priors and $\Theta_{Bayes} = 1$. Is this the case?
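For reference (treating class $s=1$ as the positive class), the two error rates at this threshold are $FPR = P\left(LR(x) \geq \Theta_{Bayes} \mid s=2\right)$ and $FNR = P\left(LR(x) < \Theta_{Bayes} \mid s=1\right)$, so the threshold sits on the equal error rate curve exactly when these two tail probabilities coincide; equal priors give $\Theta_{Bayes} = 1$, but whether that also equalizes the two probabilities depends on the class-conditional distributions of $LR(x)$.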
Category: Data Science

Loss drops to NaN after a short time for a time series classification

Here is my model code for a binary classification of a time series:

def make_model(feature_columns):
    feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
    feature_layer_outputs = feature_layer(feature_layer_inputs)
    feature_layer_outputs = tf.expand_dims(feature_layer_outputs, 1)

    conv = keras.layers.Conv1D(filters=64, kernel_size=3, padding="same",
                               kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01))(feature_layer_outputs)
    conv = keras.layers.BatchNormalization()(conv)
    conv = keras.layers.ReLU()(conv)

    conv = keras.layers.Conv1D(filters=64, kernel_size=3, padding="same",
                               kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01))(conv)
    conv = keras.layers.BatchNormalization()(conv)
    conv = keras.layers.ReLU()(conv)

    conv = keras.layers.Conv1D(filters=64, kernel_size=3, padding="same",
                               kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01))(conv)
    conv = keras.layers.BatchNormalization()(conv)
    conv = keras.layers.ReLU()(conv)
    conv = keras.layers.Dropout(0.25)(conv)

    gap = keras.layers.GlobalAveragePooling1D()(conv)
    output_layer = keras.layers.Dense(1, activation="Softmax")(gap)

    return keras.models.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=output_layer)

So I …
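For what it's worth, one detail that often causes trouble in a setup like this is the output layer: a single-unit softmax always outputs 1.0, so binary cross-entropy gets no useful gradient and the loss can blow up on negative examples. A hedged sketch of the usual alternative, reusing the gap and feature_layer_inputs names from the code above (assumptions), with a sigmoid output and gradient clipping as an extra guard against NaN losses:

import tensorflow as tf
from tensorflow import keras

output_layer = keras.layers.Dense(1, activation="sigmoid")(gap)   # single-unit sigmoid for binary labels

model = keras.models.Model(inputs=[v for v in feature_layer_inputs.values()],
                           outputs=output_layer)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),  # clip gradient norm
    loss="binary_crossentropy",
    metrics=["accuracy"],
)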
Category: Data Science

How should I keep my CNN binary classification model from overfitting and underfitting

I am trying to do the cats & dogs classification problem. The problem is that my model is overfitting, and I have tried all the techniques I know to fix it, such as dropout, data augmentation, and L2 and L1 regularization, but nothing is working. Can you please help me? At the end of training, my train accuracy was 0.7868 and my validation accuracy was 0.7044. My image size is (h=48, w=48) with 3 channels, and the batch size is 128 …
Category: Data Science
