Dataset split for image classification

I am trying to do image classification for 14 categories (around 1000 images for each category), and I initially created two folders, one for training and one for validation. In this case, do I still need to set a validation split or a subset in the code, or can I use all the files as train_ds and val_ds by deleting those arguments? The folder names in the training and validation directories are the same. data_dir = 'trainingdatav1' data_val = 'Validationv1' train_ds = tf.keras.preprocessing.image_dataset_from_directory( data_dir, validation_split=0.1, #is …
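
For reference, a minimal sketch of loading the two pre-split folders directly, with no validation_split or subset arguments; the image size and batch size below are assumptions, not values from the question.

import tensorflow as tf

# Assumed image size and batch size; adjust to the real dataset.
data_dir = 'trainingdatav1'
data_val = 'Validationv1'

# The data is already split into two directories, so each one can be
# loaded as its own dataset with no validation_split or subset argument.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, image_size=(224, 224), batch_size=32, seed=123)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_val, image_size=(224, 224), batch_size=32, seed=123)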
Category: Data Science

Is this XGBoost model tending to overfit?

Here is the list of hyperparameters that I used: params = { 'scale_pos_weight': [1.0], 'eta': [0.05, 0.1, 0.15, 0.9, 1.0], 'max_depth': [1, 2, 6, 10, 15, 20], 'gamma': [0.0, 0.4, 0.5, 0.7] } The dataset is imbalanced, so I used the scale_pos_weight parameter. After 5-fold cross-validation, the F1 score that I got is 0.530726530426833.
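
For context, a minimal sketch of how such a grid could be evaluated with scikit-learn's GridSearchCV so that the train/validation F1 gap (the usual overfitting signal) is visible. X and y are placeholders for the actual data, and 'eta' is written under its sklearn-wrapper name, learning_rate.

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Same grid as in the question, with 'eta' as learning_rate.
params = {
    'scale_pos_weight': [1.0],
    'learning_rate': [0.05, 0.1, 0.15, 0.9, 1.0],
    'max_depth': [1, 2, 6, 10, 15, 20],
    'gamma': [0.0, 0.4, 0.5, 0.7],
}

# return_train_score=True exposes the train/validation F1 gap,
# which is the usual overfitting signal in a CV search.
search = GridSearchCV(XGBClassifier(eval_metric='logloss'),
                      params, scoring='f1', cv=5, return_train_score=True)
search.fit(X, y)  # X, y: the imbalanced dataset (placeholders)

i = search.best_index_
print('train F1:', search.cv_results_['mean_train_score'][i])
print('CV F1:   ', search.cv_results_['mean_test_score'][i])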
Category: Data Science

Overfitting problem: high training accuracy and low validation accuracy for image classification

I want to define a model to predict 3 categories of images. I'm learning in the field :-) I have 1500 images (500 for each category) in 3 directories. I've read many suggestions in this blog: use a simple loss function, use dropout, use shuffle. I've applied these tricks but the model still overfits ... This is the code I'm using; any suggestion? dim_x = 500 dim_y = 200 dim_kernel = (3,3) data_gen = ImageDataGenerator(rescale=1/255,validation_split=0.3) data_dir = image_path train_data_generator=data_gen.flow_from_directory( data_dir, target_size=(dim_x,dim_y), …
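
A minimal sketch of one common variant of this setup, adding augmentation on the training split only and dropout in a deliberately small network; the augmentation values and layer sizes are assumptions, not the code from the question.

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

dim_x, dim_y = 500, 200  # sizes from the question

# Augmentation on the training split only; the validation split stays rescale-only.
train_gen = ImageDataGenerator(rescale=1/255, validation_split=0.3,
                               rotation_range=15, width_shift_range=0.1,
                               height_shift_range=0.1, horizontal_flip=True)
val_gen = ImageDataGenerator(rescale=1/255, validation_split=0.3)

train_data = train_gen.flow_from_directory(image_path, target_size=(dim_x, dim_y),
                                           subset='training')
val_data = val_gen.flow_from_directory(image_path, target_size=(dim_x, dim_y),
                                       subset='validation')

# A deliberately small CNN with dropout before the 3-class output.
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(dim_x, dim_y, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])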
Category: Data Science

How to reduce overfitting and improve the confusion matrix

I am trying to apply the following model to my data, which consists of 4030 samples across 5 classes. Each sample is a set of MFCC features extracted from a 20-second audio clip. When I apply classification I get very poor accuracy and I also have overfitting, even though I am using data augmentation and I have also tried Batch Normalization to reduce the overfitting; the result is still very bad. The model: …
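
A minimal sketch of one way BatchNormalization and dropout are commonly combined on MFCC input; the input shape and layer sizes are assumptions, only the 5-class output comes from the question.

from tensorflow.keras import layers, models

n_mfcc, n_frames = 40, 860  # assumed MFCC matrix shape for a 20-second clip

model = models.Sequential([
    layers.Input(shape=(n_mfcc, n_frames, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.BatchNormalization(),      # normalize activations after each conv block
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),  # fewer parameters than Flatten + Dense
    layers.Dropout(0.5),
    layers.Dense(5, activation='softmax'),  # 5 classes as in the question
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])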
Category: Data Science

Training loss decreasing while Validation loss is not decreasing

I am wondering why the validation loss of this regression problem is not decreasing, even though I have tried several methods such as making the model simpler, adding early stopping, various learning rates, and also regularizers, but none of them have worked properly. Any suggestions would be appreciated. Here is my code and my outputs: optimizer = keras.optimizers.Adam(lr=1e-3) model = Sequential() model.add(LSTM(units=50, activation='relu', activity_regularizer=tf.keras.regularizers.l2(1e-2), return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) model.add(Dropout(0.2)) model.add(LSTM(units=50, activation='relu', activity_regularizer=tf.keras.regularizers.l2(1e-2), return_sequences=False)) model.add(Dropout(0.2)) model.add(Dense(y_train.shape[1])) model.compile(optimizer=optimizer, loss='mae') callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3) history = …
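
One detail worth noting in the snippet: the EarlyStopping callback monitors the training loss ('loss'), so it never reacts to the validation curve. A minimal sketch of monitoring the validation loss instead; restore_best_weights and the validation_data placeholders are added assumptions.

import tensorflow as tf

# Stop on the validation loss rather than the training loss,
# and roll back to the weights from the best epoch.
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                            restore_best_weights=True)
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),  # x_val, y_val are placeholders
                    epochs=100, callbacks=[callback])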
Category: Data Science

How solved "ValueError: y should be a 1d array, got an array of shape () instead."?

from tkinter import * from tkinter import ttk from tkmacosx import Button top = Tk() top.title("Jobs") top.geometry("1000x800") line1 = LabelFrame(top, text='') line1.pack(expand = 'yes', fill = 'both') n = StringVar() categorychoosen = ttk.Combobox(line1, width = 27, textvariable = n) # Adding combobox drop down list categorychoosen['values'] = ('Advocate','Arts','Automation Testing','Blockchain','Business Analyst', 'Web Designing') categorychoosen.place(x=50, y=150) categorychoosen.current() name=Label(line3,text="Welcom to ... company",font =("Arial", 10)) name.place(x=0, y=0) n1 = StringVar() sectionchoosen = ttk.Combobox(line3, width = 27, textvariable = n1) # Adding combobox drop down …
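
For context, the error in the title is raised by scikit-learn when the target passed to an estimator is not a flat 1-D array of labels; a minimal illustration, assuming a scikit-learn call sits somewhere behind this GUI.

import numpy as np

# scikit-learn raises "y should be a 1d array, got an array of shape ()"
# when a single scalar value reaches a call that expects one label per sample.
y_bad = np.array('Advocate')            # shape () -- one value, not a label vector
y_ok = np.array(['Advocate', 'Arts'])   # shape (2,) -- one label per sample

# If y comes as a one-column DataFrame or a nested list, flatten it first:
# y = np.asarray(y).ravel()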
Category: Data Science

How to train a keras model on both original and augmented data from ImageDataGenerator?

I have a dataset that contains about 87000 images in a directory, with each class in a separate subfolder. I've tried the ImageDataGenerator() class and the flow_from_directory() function for generating the images, and it worked completely fine, but I have a question: does flow_from_directory() only yield the augmented images? And if this is the case, how can I train my model (which has overfit the training set) on both the original and the augmented data? Thanks
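
For reference: with augmentation parameters set, flow_from_directory yields only randomly transformed images. A minimal sketch of one way to keep both the originals and augmented copies using tf.data; the directory name, image size, and augmentation layers are assumptions.

import tensorflow as tf

# Assumed directory, image size, and augmentation layers.
raw_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'data_dir', image_size=(224, 224), batch_size=32)

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
])

# An augmented copy of the same images; concatenating gives the model
# both the originals and their randomly transformed versions in one dataset.
aug_ds = raw_ds.map(lambda x, y: (augment(x, training=True), y))
train_ds = raw_ds.concatenate(aug_ds).shuffle(100)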
Category: Data Science

How can I deal with this overfitting?

I trained my model for 40 epochs but finally got this shape. How can I deal with this problem? I used 30,000 samples for training and 5,000 for testing, and lr_schedule = keras.optimizers.schedules.ExponentialDecay( initial_learning_rate=4e-4, decay_steps=50000, decay_rate=0.5) Should I increase the amount of test data or make changes to the model? EDIT: After I added regularization I got this shape, and the loss started from a higher value than in the previous shape; is that normal? Is this …
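
For context, a minimal sketch of how L2 regularization is typically added alongside such a learning-rate schedule; the penalty terms are included in the reported training loss, which is one reason the loss can start higher once regularization is added. The layer sizes are placeholders, not the model from the question.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=4e-4, decay_steps=50000, decay_rate=0.5)

# The L2 penalty terms are added to the reported training loss,
# so the loss curve can legitimately start higher after they are introduced.
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])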
Category: Data Science

Minimum number of samples to train XGBoost without overfitting

When using Neural Networks for image processing I learned a rule of thumb: to avoid overfitting, supply at least 10 training examples for every neuron. Is there a similar rule of thumb for classifiers such as XGBoost, presumably taking into account the number of features and estimators? And, considering the 'curse of dimensionality' shouldn't the rule of thumb be that n_training is geometric in n_dimensions, and not linear?
Category: Data Science

Is my model overfitting? Training accuracy 93%, test accuracy 82%

I am using an LGBM model for binary classification. After hyper-parameter tuning I get: training accuracy 0.9340, test accuracy 0.8213. Can I say my model is overfitting? Or is it acceptable in the industry? Also, to add to this, when I increase num_leaves for the same model I am able to achieve: train accuracy 0.8675, test accuracy 0.8137. Which of these results is acceptable and can be reported?
Category: Data Science

Training Object Detection model on just 10 images

I am trying to train an object detection model using Mask-RCNN with ResNet50 as the backbone. I am using the pre-trained models from PyTorch's Torchvision library. I have only 10 images that I can use to train. Of those same 10 images, I am using 3 images for validation. For the evaluation, I am using the evaluation method used for the COCO dataset, which is also provided as .py scripts in TorchVision's GitHub repository. To have enough samples for training, I …
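
For reference, a minimal sketch of the usual Torchvision fine-tuning recipe for this model, replacing the box and mask heads for a custom number of classes; num_classes here is an assumption.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + 1 object class (assumption)

# Start from COCO-pretrained weights (newer torchvision uses weights=... instead).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box-prediction head for the custom number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask-prediction head as well.
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)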
Category: Data Science

SciKit-Learn Decision Tree Overfitting

We have a project to utilize a few algorithms we have learned so far. I've been using scikit-learn to apply these algorithms, but when it comes to decision trees I keep getting the feeling I am overfitting. I'm using a dataset about the weather, giving characteristics such as city, state, month, year, wind direction, wind speed, etc., where the target variable is the average temperature for the day. Now I know this is hard to classify, as it is pretty …
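
For context, a minimal sketch of constraining a tree and inspecting the train/CV gap with scikit-learn; since the target is a continuous average temperature, a regressor is used, and X, y are placeholders for the encoded weather data.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# X, y: placeholders for the encoded weather features and the average temperature.
grid = {'max_depth': [3, 5, 8, 12, None],
        'min_samples_leaf': [1, 5, 20, 50]}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), grid, cv=5,
                      scoring='neg_mean_absolute_error', return_train_score=True)
search.fit(X, y)

# A large gap between train and CV error is the overfitting signal;
# shallower trees / larger leaves usually shrink it.
i = search.best_index_
print('train MAE:', -search.cv_results_['mean_train_score'][i])
print('CV MAE:   ', -search.cv_results_['mean_test_score'][i])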
Category: Data Science

Does the eval loss decreasing more slowly than the train loss indicate overfitting?

I am training a binary classifier using an efficientnetv2 model with a 1M image dataset where I do a 60/20/20 split. Does this graph mean that the model is over-fitting? I can see that the train loss is going down much faster than the eval loss but the eval loss is still going down and the accuracy is going up. Accuracy may seem to be low but it is actually a pretty decent amount for the problem I am working …
Category: Data Science

Overfitted model produces similar AUC on test set, so which model do I go with?

I was trying to compare the effect of running GridSearchCV on a dataset that was oversampled before the training folds are selected versus one oversampled after the training folds are selected. The oversampling approach I used was random oversampling. I understand that the first approach is wrong, since observations that the model has seen bleed into the test set; I was just curious how much of a difference this causes. I generated a binary classification dataset with the following: # Generate binary classification dataset with 5% minority class, …
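
A minimal sketch of the second (correct) setup, where random oversampling happens inside an imbalanced-learn pipeline so it is applied only to the training folds of each CV split; the classifier and grid below are placeholders.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline

# Assumed dataset: 5% minority class, as described in the question.
X, y = make_classification(n_samples=10000, weights=[0.95, 0.05], random_state=42)

# With the sampler inside the pipeline, oversampling is fit and applied
# only on each training fold, so no duplicated observations reach the
# corresponding validation fold.
pipe = Pipeline([('ros', RandomOverSampler(random_state=42)),
                 ('clf', LogisticRegression(max_iter=1000))])
search = GridSearchCV(pipe, {'clf__C': [0.1, 1, 10]}, scoring='roc_auc', cv=5)
search.fit(X, y)
print(search.best_score_)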
Category: Data Science

Low classification accuracy

I want to do a multi-class classification with 6 classes. The whole dataset has 12,750 samples and 56 features, so every class has 2,125 samples. Before prediction I reduced the number of outliers by winsorization (at the 1st and 99th percentiles) and I reduced the skewness of features whose skewness was greater than 1 or less than -1 using the Yeo-Johnson transformation, and I got this dataset: https://i.stack.imgur.com/miy8i.png Later, of course, I split the dataset into 80% training data and 20% test data and …
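
For reference, a minimal sketch of that preprocessing with SciPy and scikit-learn; X is assumed to be the (12750, 56) feature matrix, and the transform is applied to all columns rather than only the highly skewed ones.

import numpy as np
from scipy.stats.mstats import winsorize
from sklearn.preprocessing import PowerTransformer

# Clip every feature at its 1st and 99th percentiles (per-column winsorization).
X_w = np.asarray(winsorize(X, limits=[0.01, 0.01], axis=0))

# Yeo-Johnson transform to reduce skewness (applied to all columns here,
# not only those with |skewness| > 1 as in the question).
X_t = PowerTransformer(method='yeo-johnson').fit_transform(X_w)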
Category: Data Science

Correctly evaluate model with oversampling and cross-validation

I'm dealing with a classic case of a dataset with a binary imbalanced target (event 3%, non-event 97%). My idea is to apply some sort of sampling (over/under, SMOTE, etc.) to address the issue. As I see it, the correct way of doing this is to sample ONLY the train set, in order to have a test performance that is more similar to reality. Moreover, I want to use CV for hyperparameter tuning. So the tasks, in order, are: divide the dataset into …
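
A minimal sketch of that workflow with imbalanced-learn, where the sampler sits inside the pipeline so it is fit only on the training folds during CV and never touches the held-out test set; the classifier, grid, and the choice of SMOTE are placeholders.

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# X, y: placeholders for the imbalanced data (3% event rate).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# SMOTE lives inside the pipeline, so during CV it is fit and applied only
# on each training fold; validation folds and the final test set keep the
# original class distribution.
pipe = Pipeline([('smote', SMOTE(random_state=42)),
                 ('clf', RandomForestClassifier(random_state=42))])
search = GridSearchCV(pipe, {'clf__max_depth': [5, 10, None]},
                      scoring='average_precision', cv=5)
search.fit(X_train, y_train)
print('held-out test score:', search.score(X_test, y_test))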
Category: Data Science

Multilabel Classification - Overfitting?

My task is the following: to take drug combinations as input and output the renal failure-related symptoms caused by the drug combinations. Both the drug combinations and the renal failure-related symptoms are represented as one-hot encodings (for example, someone getting symptom 1 and symptom 3 out of a total of 4 symptoms is represented as [1,0,1,0]). So far, I have run the data through the following models and they have produced this interesting graph. The left-hand graph depicts the training and validation loss of the …
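
For context, a minimal sketch of a standard multi-label setup for this kind of one-hot symptom target: independent sigmoid outputs with binary cross-entropy. The input size and hidden layer are assumptions; only the 4-symptom example comes from the question.

from tensorflow.keras import layers, models

n_drugs, n_symptoms = 200, 4  # assumed sizes; 4 symptoms as in the [1,0,1,0] example

# One independent sigmoid per symptom with binary cross-entropy, so several
# symptoms can be predicted at once from a one-hot drug-combination vector.
model = models.Sequential([
    layers.Input(shape=(n_drugs,)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(n_symptoms, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['binary_accuracy'])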
Category: Data Science

My k-fold validation is giving a lot of 100% values in the concatenated confusion matrix; is it because of overfitting?

The confusion matrix is a concatenated one from a 5-fold stratified cross-validation of my data set. I used an RBF kernel for the SVM classifier. Is it telling me the classifier is overfitting? Plus, when I plot the confusion matrix from a 70% training / 30% testing split, it gives pretty much the same confusion matrix as the cross-validation one. The unseen test set also gives pretty much the same confusion matrix. Should I worry about overfitting?
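
For reference, a minimal sketch of building one confusion matrix from out-of-fold predictions with scikit-learn; X and y are placeholders for the actual data.

from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# X, y: placeholders for the actual features and labels.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_pred = cross_val_predict(SVC(kernel='rbf'), X, y, cv=cv)

# Every prediction here is made on data the fold's model never saw, so a
# near-perfect matrix from these out-of-fold predictions is not by itself
# a sign of overfitting.
print(confusion_matrix(y, y_pred))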
Category: Data Science

Is data leakage giving me misleading results? Independent test set says no!

TLDR: I evaluated a classification model using 10-fold CV with data leakage in the training and test folds. The results were great. I then solved the data leakage and the results were garbage. I then tested the model on an independent new dataset and the results were similar to the evaluation performed with data leakage. What does this mean? Was my data leakage not relevant? Can I trust my model evaluation and report that performance? Extended version: I'm developing …
Category: Data Science
