hyperparameter-tuning

Is this XGBoost model tending to overfit?

Suvrodip Mukhopadhyay

2022年6月4日 17:38

Here is the list of hyperparameters that I used: params = { 'scale_pos_weight': [1.0], 'eta': [0.05, 0.1, 0.15, 0.9, 1.0], 'max_depth': [1, 2, 6, 10, 15, 20], 'gamma': [0.0, 0.4, 0.5, 0.7] } The dataset is imbalanced so I used scale_pos_weight parameter. After 5 fold cross validation the f1 score that I got is: 0.530726530426833

Topic: hyperparameter-tuning overfitting xgboost hyperparameter dataset

Category: Data Science

Hyper-parameter tuning of NaiveBayes Classier

Sameer Zahid

2022年6月4日 16:33

I'm fairly new to machine learning and I'm aware of the concept of hyper-parameters tuning of classifiers, and I've come across a couple of examples of this technique. However, I'm trying to use NaiveBayes Classifier of sklearn for a task but I'm not sure about the values of the parameters that I should try. What I want is something like this, but for GaussianNB() classifier and not SVM: from sklearn.model_selection import GridSearchCV C=[0.05,0.1,0.2,0.3,0.25,0.4,0.5,0.6,0.7,0.8,0.9,1] gamma=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0] kernel=['rbf','linear'] hyper={'kernel':kernel,'C':C,'gamma':gamma} gd=GridSearchCV(estimator=svm.SVC(),param_grid=hyper,verbose=True) gd.fit(X,Y) print(gd.best_score_) print(gd.best_estimator_) …

Topic: hyperparameter-tuning naive-bayes-classifier hyperparameter scikit-learn machine-learning

Category: Data Science

CNN for subsets of a dataset - how to tune hyperparameters

Code Now

2022年6月4日 16:02

I have a dataset and would like to train CNNs on subsets of different size of the dataset. I already have a CNN, which classifies very well if I use the entire dataset. Now the question arises if I should really try to additionally optimize the parameters of the CNN for the subsets, regardless of whether I do Data Augmentation or not? Does it really make sense if I try to change the CNN model for the subsets by using …

Topic: hyperparameter-tuning cnn gridsearchcv accuracy dataset

Category: Data Science

Why does hyperparameter tuning occur on validation dataset and not at the very beginning?

Socorro

2022年5月28日 14:15

Despite doing/using it a few times, I'm still slightly confused by the use of a validation set for hyper parameter tuning. As far as I can tell, I choose a model, train it on training data, assess performance on training data, then do hyper parameter tuning assessing model performance on validation data, then choose the best model and test this on test data. In order to do this, I basically need to pick a model at random for training data. …

Topic: hyperparameter-tuning hyperparameter deep-learning neural-network machine-learning

Category: Data Science

Is data subsampling appropriate for hyperparameter optimisation?

hH1sG0n3

2022年5月26日 21:04

Fundamentally, under what circumstance is it reasonable to do HPO only on a subsample of the training set? I am using Population Based Training to optimise hparameters for a sequence model. My dataset consists of 20M sequences and was wondering if it would make sense to optimise on a subsample due to restricted budget.

Topic: hyperparameter-tuning deep-learning neural-network

Category: Data Science

Importance order of Hyperparameters for RandomForestClassifier

Adrian Evensen

2022年5月24日 19:47

I'm doing a random search of hyperparameters for a RandomForestClassifier and was wondering what is the order of importance of hyperparameters to search from? In other words; what hyperparameters should I prioritize?

Topic: hyperparameter-tuning random-forest

Category: Data Science

GridSearch on imbalanced datasets

Valentin

2022年5月23日 06:06

Im trying to use gridsearch to find the best parameter for my model. Knowing that I have to implement nearmiss undersampling method while doing cross validation, should I fit my gridsearch on my undersampled dataset (no matter which under sampling techniques) or on my entire training data (whole dataset) before using cross validation?

Topic: hyperparameter-tuning imbalance scikit-learn machine-learning

Category: Data Science

MLP classifier Gridsearch CV parameters to tune?

Joseph Hodson

2022年5月23日 01:03

I'm looking to tune the parameters for sklearn's MLP classifier but don't know which to tune/how many options to give them? Example is learning rate. should i give it[.0001,.001,.01,.1,.2,.3]? or is that too many, too little etc.. i have no basis to know what is a good range for any of the parameters. Processing power is limited so i can't just test the full range. If anyone has a general guide of which are the most important to tune and …

Topic: mlp hyperparameter-tuning grid-search scikit-learn python

Category: Data Science

How are parameters selected in cross-validation?

NAS

2022年5月20日 16:20

Suppose I'm training a linear regression model using k-fold cross-validation. I'm training K times each time with a different training and test data set. So each time I train, I get different parameters (feature coefficients in the linear regression case). So I will have K parameters at the end of cross-validation. How do I arrive at the final parameters for my model? If I'm using it to tune hyperparameters as well, do I have to do another cross-validation after fixing …

Topic: hyperparameter-tuning training parameter-estimation cross-validation machine-learning

Category: Data Science

How to suppress "Estimator fit failed. The score on this train-test" warning message?

data science student

2022年5月10日 09:58

I am working on hyper-tuning random forest classifier with following parameters in random search CV In [100]: # defining model Model = RandomForestClassifier(random state=1) # Parameter grid to pass in RandomSearchCV param grid = { "n_estimators": [200,250,300], "min_samples_leaf": np.arange(1, 4), "max_features": [np.arange(0.3, 0.6, 0.1),'sqrt'],"max_samples": np.arange(0.4, 0.7, 0.1)} #Calling RandomizedSearchcV randomized_cv = RandomizedSearchCV(estimator=Model, param distributions=param grid, n_iter=10, n_jobs = -1, scoring=metrics.make_scorer(metrics.recall_score)) #Fitting parameters in RandomizedSearchcv randomized cv.fit(X train, y train) print ("Best parameters are {} with CV score={}:" .format (randomized_cv.best params_,randomized_cv.best_score_)) …

Topic: hyperparameter-tuning machine-learning-model hyperparameter random-forest machine-learning

Category: Data Science

binary classification pipeline to select threshold

lml

2022年5月8日 10:03

There are quite a few questions regarding the optimisation of binary threshold in a classification problem. However, I haven't found a single end-to-end solution to this problem. In an existing project, I have come up with the following pipeline to train a binary classifier: Outer-CV due to small to moderate data size. Inner-CV to tune hyperparameters Train model with tuned hyperparameters on outer-cv trainset Predict on the outer-cv test set Find optimal threshold using prediction probabilities Get score converting prediction …

Topic: hyperparameter-tuning cross-validation classification

Category: Data Science

How to improve regression neural network?

Darkstar Dream

2022年5月8日 08:02

I am new to deep learning and data science and trying to increase my knowledge by working on some hackathons. Currently, the hackathon project I am working on has the task to predict the closing price of crypto-currency based on 48 parameters with ~1200 records. By far I was able to achieve some good accuracy from the model but still, my score is very low. I have tried many things from knowledge but it doesn't seem to be affecting the …

Topic: hyperparameter-tuning regression deep-learning neural-network data-cleaning

Category: Data Science

How to choose max layers and units to search over in hyper parameter tuning

Socorro

2022年5月7日 20:51

When performing any hyper parameter tuning, let's say random search for simplicity, and I want to search over a minimum to max units/nodes in a layer, and a minimum to max number of layers, are there rules to guide what is a "large enough" number for my search? Currently all I know is "that should be good enough/large enough, let's search in there". I could be not searching a large enough space, or searching a space that's far too large …

Topic: hyperparameter-tuning hyperparameter deep-learning neural-network machine-learning

Category: Data Science

RandomizedSearchCV doesn't stop running

Fernanda

2022年5月6日 20:22

I'm trying to optimize the hyperparameters of my model using RandomizedSearchCV. However, it doesn't stop running even if I define few iterations. Someone could help me? The code I'm using is presented below: def build_classifier(optimizer, units, alpha, l1): model = tf.keras.Sequential() model.add(tf.keras.layers.LSTM(units, kernel_regularizer = regularizers.l1(l1 = l1), input_shape= (None, n_features), return_sequences = True)) model.add(tf.keras.layers.LSTM(units, kernel_regularizer = regularizers.l1(l1 = l1), return_sequences = True)) model.add(tf.keras.layers.LSTM(units, kernel_regularizer = regularizers.l1(l1 = l1), return_sequences = False)) model.add(tf.keras.layers.Dense(5)) model.compile(optimizer = optimizer, loss = 'mae') return model …

Topic: hyperparameter-tuning python

Category: Data Science

Benefits of using Deep Learning-specific hyperparameter optimization tools vs. sklearn?

Edgar Derby

2022年5月4日 10:07

There are quite a few library for hyperparameter optimization that are specific to Keras or other Deep Learning libraries, like Hyperas or Talos. My question is, what's the main benefit of using these libraries compared to, for example, sklearn.model_selection.GridSearchCV() or sklearn.model_selection.RandomizedSearchCV?

Topic: hyperparameter-tuning keras hyperparameter deep-learning python

Category: Data Science

What does a leaf size of 1 in K-neighbors regression mean?

Caterina

2022年5月3日 17:51

I am doing hyperparameter tuning + cross validation and I'm constantly getting that the optimal size of the leaf should be 1. Should I worry? Is this a sign of overfitting?

Topic: k-nn hyperparameter-tuning cross-validation scikit-learn

Category: Data Science

When using optuna I should return accuracy or loss as objective value?

Pratichhya

2022年5月3日 09:45

I am using optuna for hyperparameter tuning for my segmentation model. At the model, I am returning accuracy as an objective value since I realised that it tries to optimize to get the best result based on the objective value. I tried the same with returning (1-loss) but I am not sure what goes with either loss or accuracy when tuning. Also for loss is there another way than 1-loss to optimize or tune based on the loss curve?

Topic: hyperparameter-tuning optimization

Category: Data Science

My own model trained on the full data is better than the best_estimator I get from GridSearchCV with refit=True?

AIforAll

2022年4月27日 19:02

I am using an XGBoost model to classify some data. I have cv splits (train, val) and a separate test set that I never use until the end. I have used GridSearchCV to determine the best parameters and fed my cv splits (5 folds) into it as well as set refit=True so that once it figures out the best hyperparameters it trains on the full data (all folds as opposed to just 4/5 folds) and returns the best_estimator. I then …

Topic: hyperparameter-tuning data-science-model gridsearchcv xgboost

Category: Data Science

Is Loss value (e.g., MSE loss) used in the calculation for parameter update when doing gradient descent?

AZ123

2022年4月25日 06:57

My question is really simple. I know the theory behind gradient descent and parameter updates, what I really haven't found clarity on is that is the loss value (e.g., MSE value) used, i.e., multiplied at the start when we do the backpropagation for gradient descent (e.g., multiplying MSE loss value with 1 then doing backprop, as at the start of backprop we start with the value 1, i.e., derivative of x w.r.t x is 1)? If loss value isn't used …

Topic: hyperparameter-tuning backpropagation loss-function neural-network

Category: Data Science

Efficient Searching for a basis of information as a hyperparameter in a large possible hyperparameter space

wigeon

2022年4月18日 15:37

I have a set of inputs, let's call them 'I', that can be fed through a complicated group of functions to produce/calculate a wide variety of outputs (let's call them 'O'). I want to find a subset of outputs (let's call them 'O-prime') within 'O' that contain sufficient information to form a basis in order to find/reconstruct a point in the 'I'-space accurately. In other words I want to pick 'O-prime' such that I am able to uniquely identify any …

Topic: hyperparameter-tuning mathematics functions hyperparameter neural-network

Category: Data Science

About