Why does a LightGBM model produce different results while testing?

Using the LightGBM regressor, I have trained my data and, using grid search, I got the best parameters, but while testing with the best parameters I am getting different results each time, which means the model produces different results for each test iteration. I ran LightGBM twice with the same parameters but got different results in validation. The only random seed parameter I found was baggingSeed. After fixing baggingSeed, the problem still occurred. Should I fix any …
Category: Data Science
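
A minimal sketch of one way to pin down the remaining randomness, assuming LightGBM's scikit-learn API and a toy dataset (the parameter values are illustrative, not tuned):

    from lightgbm import LGBMRegressor
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # random_state seeds the booster; the extra *_seed parameters and the
    # deterministic/force_row_wise flags remove the other sources of run-to-run variation
    model = LGBMRegressor(
        random_state=42,
        bagging_seed=42,
        feature_fraction_seed=42,
        deterministic=True,
        force_row_wise=True,
        n_jobs=1,
    )
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))   # should now repeat exactly across runs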

Grid search - optimal weighting of classifiers

I am using three different off-the-shelf classifiers. It's a three-class classification task. I want to calculate the optimal weights (c1weight, c2weight, c3weight) for each classifier (the real task has more classifiers and also weights for each class). Maybe a simple grid-search approach or an sklearn ensemble classifier could do that.

    vc = VotingClassifier(estimators=[('gbc', GradientBoostingClassifier()),
                                      ('rf', RandomForestClassifier()),
                                      ('svc', SVC(probability=True))],
                          voting='soft', n_jobs=-1)
    params = {'weights': [[1, 2, 3], [2, 1, 3], [3, 2, 1]]}
    grid_Search = GridSearchCV(param_grid=params, estimator=vc)
    grid_Search.fit(X_new, y)
    print(grid_Search.best_score_)

I don't understand how to implement this for the following code. def get_classification(text, c1weight, …
Category: Data Science
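
A minimal sketch of how the weight search could look end to end, assuming the same three estimators and a synthetic three-class dataset; itertools.product enumerates every weight triple from a small candidate set instead of the three hand-picked lists:

    import itertools
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier, GradientBoostingClassifier, RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)

    vc = VotingClassifier(
        estimators=[('gbc', GradientBoostingClassifier()),
                    ('rf', RandomForestClassifier()),
                    ('svc', SVC(probability=True))],
        voting='soft', n_jobs=-1)

    # every combination of weights drawn from {1, 2, 3} for the three classifiers
    params = {'weights': list(itertools.product([1, 2, 3], repeat=3))}

    grid_search = GridSearchCV(estimator=vc, param_grid=params, cv=3)
    grid_search.fit(X, y)
    print(grid_search.best_params_, grid_search.best_score_)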

MLP classifier GridSearchCV parameters to tune?

I'm looking to tune the parameters for sklearn's MLP classifier but don't know which to tune or how many options to give them. An example is the learning rate: should I give it [.0001, .001, .01, .1, .2, .3]? Or is that too many, too few, etc.? I have no basis to know what a good range is for any of the parameters. Processing power is limited, so I can't just test the full range. If anyone has a general guide of which are the most important to tune and …
Category: Data Science
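
As a rough starting point, a minimal sketch of a small, log-spaced grid over the parameters that usually matter most for an MLP (hidden_layer_sizes, alpha, learning_rate_init), assuming scaled inputs and sklearn's digits dataset as a stand-in:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)

    pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))

    # a handful of log-spaced values per parameter is usually enough for a first pass;
    # refine around the winner afterwards rather than gridding the full range
    param_grid = {
        'mlpclassifier__hidden_layer_sizes': [(50,), (100,), (50, 50)],
        'mlpclassifier__alpha': [1e-4, 1e-3, 1e-2],
        'mlpclassifier__learning_rate_init': [1e-3, 1e-2, 1e-1],
    }

    search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_)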

Random search grid not displaying scoring metric

I want to do a grid search of a few hyperparameters through an XGBClassifier on a binary class, but whenever I run it the score value (roc_auc) is not being displayed. I read in another question that this can be related to some error in model training, but I am not sure which one it is in this case. My model training data X_train is a np.array of (X, 19) and my y_train is a numpy.ndarray of shape (X, ), which …
Category: Data Science
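
A minimal sketch of how to make the ROC AUC visible, assuming a RandomizedSearchCV over XGBClassifier with synthetic data shaped like the question's (the parameter grid is illustrative): scoring must be passed explicitly, verbose=3 prints the fold scores during the search, and the final scores live in best_score_ and cv_results_:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV
    from xgboost import XGBClassifier

    X_train, y_train = make_classification(n_samples=500, n_features=19, random_state=0)

    param_dist = {'max_depth': [3, 5, 7],
                  'n_estimators': [100, 200],
                  'learning_rate': [0.05, 0.1]}

    search = RandomizedSearchCV(
        XGBClassifier(eval_metric='logloss'),
        param_distributions=param_dist,
        n_iter=5,
        scoring='roc_auc',   # explicit scorer; otherwise the estimator default is used
        cv=3,
        verbose=3,           # prints the per-fold score as each candidate is evaluated
        random_state=0,
    )
    search.fit(X_train, y_train)
    print(search.best_score_)                     # mean ROC AUC of the best candidate
    print(search.cv_results_['mean_test_score'])  # per-candidate mean ROC AUC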

Optimizing decision threshold on model with oversampled/imbalanced data

I'm working on developing a model with a highly imbalanced dataset (0.7% minority class). To remedy the imbalance, I was going to oversample using algorithms from the imbalanced-learn library. I had a workflow in mind which I wanted to share and get an opinion on whether I'm heading in the right direction or maybe I missed something.

1. Split Train/Test/Val
2. Set up a pipeline for GridSearch and optimize hyper-parameters (the pipeline will only oversample the training folds)
3. Scoring metric will be AUC as the training set is …
Category: Data Science
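
A minimal sketch of the second step, assuming SMOTE and a logistic regression as placeholders: an imblearn Pipeline only resamples during fit, so inside GridSearchCV the oversampling touches the training folds but never the validation folds:

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('clf', LogisticRegression(max_iter=1000))])

    param_grid = {'clf__C': [0.01, 0.1, 1, 10]}
    search = GridSearchCV(pipe, param_grid, scoring='roc_auc', cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)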

Unbalanced data set - how to optimize hyperparams via grid search?

I would like to optimize the hyperparameters C and gamma of an SVC by using grid search for an unbalanced data set. So far I have used class_weight='balanced' and selected the best hyperparameters based on the average of the f1-scores. However, the data set is very unbalanced, i.e. if I choose GridSearchCV with cv=10, then some minority classes are not represented in the validation data. I'm thinking of using SMOTE, but I see the problem here that I would have …
Category: Data Science
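
Before reaching for SMOTE, a minimal sketch of the simpler fix, assuming a synthetic imbalanced three-class set: pass a StratifiedKFold splitter to GridSearchCV so every validation fold keeps the class proportions, and keep class_weight='balanced' on the SVC:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                               weights=[0.8, 0.15, 0.05], random_state=0)

    # stratification guarantees the minority classes appear in every validation split
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
    search = GridSearchCV(SVC(class_weight='balanced'), param_grid,
                          scoring='f1_macro', cv=cv)
    search.fit(X, y)
    print(search.best_params_)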

How to refit GridSearchCV on Multiclass problem

I'm trying to use GridSearchCV for my multiclass problem. For starters, I wanted to test it on KNeighborsClassifier. First, here's the code where I define the function that uses GridSearchCV:

    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import KFold

    def grid_search(estimator, parameters, X, y):
        scoring = ['accuracy', 'precision', 'recall']
        kf = KFold(5)
        clf = GridSearchCV(estimator, parameters, cv=kf, scoring=scoring, refit="accuracy", n_jobs=-1)
        clf.fit(X, y)
        i = clf.best_index_
        best_precision = clf.cv_results_['mean_test_precision'][i]
        best_recall = clf.cv_results_['mean_test_recall'][i]
        print('Best score (accuracy): {}'.format(clf.best_score_))
        print('Mean precision: {}'.format(best_precision))
        print('Mean recall: {}'.format(best_recall))
        print('Best …
Category: Data Science
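
A minimal sketch of a multiclass-safe variant, assuming the iris dataset as a stand-in: the plain 'precision'/'recall' scorers only work for binary targets, so the *_macro (or *_weighted) versions are used instead, and the cv_results_ keys change accordingly:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    scoring = ['accuracy', 'precision_macro', 'recall_macro']

    clf = GridSearchCV(KNeighborsClassifier(),
                       {'n_neighbors': [3, 5, 7, 9]},
                       cv=StratifiedKFold(5),
                       scoring=scoring,
                       refit='accuracy',
                       n_jobs=-1)
    clf.fit(X, y)

    i = clf.best_index_
    print('Best score (accuracy):', clf.best_score_)
    print('Mean precision:', clf.cv_results_['mean_test_precision_macro'][i])
    print('Mean recall:', clf.cv_results_['mean_test_recall_macro'][i])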

What's the difference between the GridSearchCV cross-validation score and the score on the test set?

I'm doing classification using Python. I'm using the class GridSearchCV; this class has the attribute best_score_, defined as the "Mean cross-validated score of the best_estimator". With this class I can also compute the score over the test set using score. Now, I understand the theoretical difference between the two values (one is computed in the cross-validation, the other is computed on the test set), but how should I interpret them? For example, if in case 1 I get these values (respectively …
Category: Data Science
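
A minimal sketch that puts the two numbers side by side, assuming a toy dataset: best_score_ is the mean validation-fold score inside the training set, while score(X_test, y_test) evaluates the refit best estimator on data the search never saw:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)

    print(search.best_score_)             # mean cross-validated score of the best candidate
    print(search.score(X_test, y_test))   # held-out test score of the refit estimator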

XGBoost Log Loss different from GridSearchCV Log Loss

I have a classification problem where I am trying to predict if the data returns a 1 or 0, so your classic binary classification. I have split my data into the independent variables (the ones I am training on) and the dependent variable (my target that I am predicting, either a 0 or a 1). I am using log loss as the scoring metric for my model. Firstly, I am using the cv function in xgboost to …
Category: Data Science
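
A minimal sketch of the usual source of the mismatch, assuming a synthetic binary problem: xgb.cv reports raw logloss (lower is better) while GridSearchCV's 'neg_log_loss' is the negated value (higher is better), and the two also use different fold splits, so they agree only roughly even after flipping the sign:

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, random_state=0)

    # native xgboost cross-validation: raw logloss
    dtrain = xgb.DMatrix(X, label=y)
    cv_res = xgb.cv({'objective': 'binary:logistic', 'eval_metric': 'logloss'},
                    dtrain, num_boost_round=50, nfold=5, seed=0)
    print(cv_res['test-logloss-mean'].iloc[-1])

    # GridSearchCV: negated logloss, so flip the sign before comparing
    search = GridSearchCV(xgb.XGBClassifier(n_estimators=50, eval_metric='logloss'),
                          {'max_depth': [3]}, scoring='neg_log_loss', cv=5)
    search.fit(X, y)
    print(-search.best_score_)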

How to choose the best hyper-parameter when it is directly influenced by the random_state?

While trying to evaluate my Ridge Regression model and using GridSearchCV to find the best parameters, I noticed that the best estimator changes every time I change the random_state in my KFold object (the cv parameter). With this in mind, how do I choose the optimal hyper-parameter to implement my model?
Category: Data Science
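
One way to make the choice less dependent on a single shuffle is to average over many splits; a minimal sketch, assuming Ridge on sklearn's diabetes data with RepeatedKFold:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV, RepeatedKFold

    X, y = load_diabetes(return_X_y=True)

    # 5 folds repeated 10 times = 50 validation scores per alpha, so the winner
    # is far less sensitive to any particular random_state
    cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)

    search = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1, 10, 100]},
                          cv=cv, scoring='neg_mean_squared_error')
    search.fit(X, y)
    print(search.best_params_)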

Voting classifier using grid search for Time Series

I have three models: ARIMA, auto-ARIMA, and double exponential smoothing. I would like to apply an ensemble method - a voting method - and allow the classifier to learn weights for these three models. I have checked the VotingClassifier present in scikit-learn. It requires fit(X, y) to run, but a time series held in a Series object doesn't have a y. How do you apply a voting classifier and learn weights through grid search?
Category: Data Science
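
Since these forecasters are not sklearn classifiers, one option is a hand-rolled weight search on a hold-out window; a minimal sketch, assuming f1/f2/f3 are each model's forecasts for that window and actual holds the observed values (all four arrays below are hypothetical placeholders):

    import itertools
    import numpy as np

    f1 = np.array([10.2, 11.0, 12.1, 12.9])   # ARIMA forecast (placeholder)
    f2 = np.array([10.5, 11.2, 11.9, 13.1])   # auto-ARIMA forecast (placeholder)
    f3 = np.array([ 9.8, 10.9, 12.3, 12.7])   # double exponential smoothing (placeholder)
    actual = np.array([10.0, 11.1, 12.0, 13.0])

    best = None
    for w in itertools.product(np.linspace(0, 1, 11), repeat=3):
        if not np.isclose(sum(w), 1.0):        # only keep weights that sum to 1
            continue
        blend = w[0] * f1 + w[1] * f2 + w[2] * f3
        mse = np.mean((blend - actual) ** 2)
        if best is None or mse < best[0]:
            best = (mse, w)

    print('best weights:', best[1], 'validation MSE:', best[0])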

How to plot GridSearchCV cv_results_?

How can I plot my results from GridSearchCV's cv_results_?

    clf = GridSearchCV(pipeline, parameters, cv=3, return_train_score=True)
    clf.fit(x, y)
    df = pd.DataFrame(clf.cv_results_)

I'm trying to get a plot similar to the one here: https://matthewbilyeu.com/blog/2019-02-05/validation-curve-plot-from-gridsearchcv-results , but that example uses the grid search object, and I have tried and failed to get the same using just the grid search DataFrame (from above). Can anybody help with how I go about this?
Category: Data Science
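
A minimal sketch of a validation-curve plot built from the DataFrame alone, assuming the grid varied a single parameter whose column is named 'param_clf__C' (a hypothetical name; substitute whatever param_* column your grid produced). It needs return_train_score=True, which the code above already sets:

    import matplotlib.pyplot as plt

    def plot_validation_curve(df, param_col):
        d = df.sort_values(param_col)
        x = d[param_col].astype(float)
        plt.plot(x, d['mean_test_score'], marker='o', label='validation')
        plt.plot(x, d['mean_train_score'], marker='o', label='train')
        # shaded band = +/- one standard deviation of the validation score
        plt.fill_between(x, d['mean_test_score'] - d['std_test_score'],
                            d['mean_test_score'] + d['std_test_score'], alpha=0.2)
        plt.xlabel(param_col)
        plt.ylabel('score')
        plt.legend()
        plt.show()

    # plot_validation_curve(df, 'param_clf__C')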

Query regarding surprising spike in accuracy of ML model

I implemented all the major ML models (Logistic Regression, Naive Bayes, SVM, KNN, Decision Tree, Random Forest, AdaBoost & XGBoost) on my dataset. My stratified cross-validation scores are between 70% & 80%. When I implemented my models using grid search, my accuracies shot up & now lie between 90% & 95%. Is this drastic increase in accuracy abnormal & fishy? My GridSearchCV code for Logistic Regression:

    from sklearn.datasets import make_blobs, make_classification
    from sklearn.model_selection import GridSearchCV
    scaled_inputs, targets = …
Category: Data Science

GridSearch CV: Suitable scoring metrics for Imbalanced data sets

I am new to machine learning. This is my first machine learning project and I am working on classification on an imbalanced dataset. There are also multiple classes in the target variable. I would like to know what the most suitable metric is for scoring performance in GridSearchCV. I think roc_auc is sometimes used for imbalanced datasets, but there are several: 'roc_auc', 'roc_auc_ovo', 'roc_auc_ovr'. Which should I use? Alternatively, precision-recall AUC is also used. But I can't seem to find …
Category: Data Science
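
A minimal sketch of a few scorers that handle an imbalanced multiclass target, assuming a synthetic dataset and a random forest as the estimator; any of the string names in the comment can be dropped into the scoring argument:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                               weights=[0.8, 0.15, 0.05], random_state=0)

    # common choices for imbalanced multiclass problems:
    #   'f1_macro'          - unweighted mean F1 over classes (minority classes count equally)
    #   'balanced_accuracy' - mean recall over classes
    #   'roc_auc_ovr'       - one-vs-rest AUC averaged over classes (needs predict_proba)
    #   'roc_auc_ovo'       - one-vs-one AUC averaged over class pairs
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          {'max_depth': [3, 5, None]},
                          scoring='f1_macro', cv=5)
    search.fit(X, y)
    print(search.best_params_)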

Gridsearch ValueError: Input contains infinity or a value too large for dtype('float64'). - Using Pipeline

Update: I have no NaN values, so fillna is not an issue. Clean dataset. This error occurs when I try to predict using my grid's best params. I get a score when I fit it on the training data; I get this error, however, when I try to predict on X_test. Very confused. I'm attempting to use a pipeline and grid search combined for my dataset. The code works up to the training part and the score. It's a clean dataset …
Category: Data Science
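
When the training data is clean but predict fails, the culprit is often a transform in the pipeline (a log, a ratio, a division) producing inf or overflow values on X_test; a small hypothetical helper to locate them before predicting:

    import numpy as np
    import pandas as pd

    def report_non_finite(X, name='X'):
        # counts inf / -inf / NaN and, for DataFrames, names the offending columns
        arr = np.asarray(X, dtype=float)
        mask = ~np.isfinite(arr)
        print(f'{name}: {mask.sum()} non-finite values')
        if mask.any() and isinstance(X, pd.DataFrame):
            print('columns affected:', X.columns[mask.any(axis=0)].tolist())

    # report_non_finite(X_test, 'X_test')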

Determine model hyper-parameter values for grid search

I built machine learning models for Ridge, Lasso, Elastic Net, and linear regression, and used grid search for the parameter tuning. I want to know how to choose the value range for params_Ridge in the code below. For example, for the alpha parameter I used 1, 0.1, 0.01, 0.001, 0.0001, and 0, but I have no idea how these values should be determined for each model (Ridge/Lasso/Elastic Net). Can someone explain?

    from sklearn.linear_model import Ridge
    ridge_reg = Ridge()
    from sklearn.model_selection import GridSearchCV
    params_Ridge = {'alpha': [1, 0.1, 0.01, 0.001, 0.0001, 0], "fit_intercept": [True, False], …
Category: Data Science
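
A minimal sketch of the usual convention, assuming sklearn's diabetes data as a stand-in: regularisation strengths are searched on a logarithmic scale (np.logspace) rather than hand-picked, and the same alphas can be reused for Ridge, Lasso, and Elastic Net, with l1_ratio added for the latter:

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import ElasticNet, Lasso, Ridge
    from sklearn.model_selection import GridSearchCV

    X, y = load_diabetes(return_X_y=True)

    alphas = np.logspace(-4, 2, 13)    # 1e-4 ... 1e2, evenly spaced in log space

    grids = {
        'ridge':   (Ridge(), {'alpha': alphas}),
        'lasso':   (Lasso(max_iter=10000), {'alpha': alphas}),
        'elastic': (ElasticNet(max_iter=10000),
                    {'alpha': alphas, 'l1_ratio': [0.1, 0.5, 0.9]}),
    }

    for name, (model, grid) in grids.items():
        search = GridSearchCV(model, grid, cv=5, scoring='neg_mean_squared_error')
        search.fit(X, y)
        print(name, search.best_params_)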

Brute-force feature selection and cross-validation

There is an existing score made of 10 parameters; each parameter is equally weighted & the total score is found by summing the score for each parameter. I want to try to reduce the number of parameters in this score but keep them equally weighted. I have data on 500 people with the score & two outcomes of interest. As the number of parameters is small, I started with a brute-force approach to look at all the possible combinations …
Category: Data Science
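
A minimal sketch of the brute-force loop with cross-validation folded in, assuming a synthetic stand-in for the 10 equally weighted parameters and one binary outcome: each candidate subset is summed into an unweighted score and evaluated by cross-validated AUC:

    from itertools import combinations

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    results = []
    for k in range(1, 11):
        for cols in combinations(range(10), k):
            score = X[:, list(cols)].sum(axis=1)   # equally weighted sum of the chosen parameters
            auc = cross_val_score(LogisticRegression(), score.reshape(-1, 1), y,
                                  cv=5, scoring='roc_auc').mean()
            results.append((auc, cols))

    print(max(results))    # best cross-validated AUC and the subset that achieved it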

Fashion MNIST: Is there an easy way to extract only 1% of the data to do a minimal gridsearch?

I am trying to implement several models on Fashion-MNIST. I have imported the data according to the tf.keras tutorial:

    import tensorflow as tf
    from tensorflow import keras
    import sklearn
    import numpy as np

    f_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = f_mnist.load_data()
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

    print(train_images.shape)
    print(train_labels.shape)
    >>(60000, 28, 28)
    >>(60000,)
    print(test_images.shape)
    print(test_labels.shape)
    >>(10000, 28, 28)
    >>(10000,)

    # Need to concatenate as GridSearchCV takes the entire set as input
    all_images = …
Category: Data Science
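
A minimal sketch of the extraction step, assuming all_images / all_labels were concatenated as in the question (random arrays of the same shape stand in for them here): train_test_split with stratify keeps the class balance while carving off 1%:

    import numpy as np
    from sklearn.model_selection import train_test_split

    all_images = np.random.rand(70000, 28, 28)      # stand-in for the concatenated images
    all_labels = np.random.randint(0, 10, 70000)    # stand-in for the concatenated labels

    # keep 1% of the data, preserving the class proportions of the full set
    _, small_images, _, small_labels = train_test_split(
        all_images, all_labels, test_size=0.01, stratify=all_labels, random_state=0)

    # flatten to 2-D for a classical sklearn estimator inside GridSearchCV
    X_small = small_images.reshape(len(small_images), -1)
    print(X_small.shape, small_labels.shape)        # (700, 784) (700,)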

How to loop through multiple lists/dict?

I have the following code, which finds the best value of the k parameter in the KNNImputer. Basically it loops through the list k_value and, for each element, fits the KNNImputer into the model, and at the end appends the result to an empty dataframe.

    lire_model = LinearRegression()
    k_value = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
    k_value_results = pd.DataFrame(columns=['k', 'mse', 'rmse', 'mae', 'r2'])
    scoring_list = ['neg_mean_squared_error', 'neg_root_mean_squared_error',
                    'neg_mean_absolute_error', 'r2']

    for s in k_value:
        imputer = …
Category: Data Science
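
One alternative to the manual loop is to let GridSearchCV sweep n_neighbors and collect all four metrics at once; a minimal sketch, assuming a pipeline of KNNImputer plus LinearRegression and some artificially injected missing values:

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.impute import KNNImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    X, y = load_diabetes(return_X_y=True)
    X = X.copy()
    X[np.random.default_rng(0).random(X.shape) < 0.1] = np.nan   # inject ~10% missing values

    pipe = Pipeline([('imputer', KNNImputer()), ('model', LinearRegression())])

    param_grid = {'imputer__n_neighbors': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]}
    scoring = ['neg_mean_squared_error', 'neg_root_mean_squared_error',
               'neg_mean_absolute_error', 'r2']

    search = GridSearchCV(pipe, param_grid, scoring=scoring, refit='r2', cv=5)
    search.fit(X, y)
    print(search.best_params_)
    # per-k metrics for every scorer are all in pd.DataFrame(search.cv_results_)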

GridSearchCV not performing well on ML models

    from sklearn.model_selection import GridSearchCV

    svm2 = SVC()
    grid = {
        'C': [0.1, 1, 10, 100, 1000],
        'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
        'gamma': [1, 0.1, 0.01, 0.001, 0.0001]
    }
    svm_grid = GridSearchCV(estimator=svm2, param_grid=grid, cv=3, n_jobs=-1)
    svm_grid.fit(xtrain, ytrain)
    svm_grid.best_params_

OUTPUT

    {'C': 1, 'gamma': 1, 'kernel': 'rbf'}

CODE

    svm_grid.score(xtrain, ytrain)
    0.9884434814012278
    svm_grid.score(xtest, ytest)
    0.8513708513708513

My question is: even after performing GridSearch, why is the model still overfitting, and how can I further increase the accuracy and combat overfitting? I am facing the same issues with RandomForest in GridSearch:

    grid = {
        'n_estimators': [10, 20, 40, …
Category: Data Science
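
One thing worth trying before widening the grid is putting a scaler in front of the SVC, since the RBF kernel is very sensitive to feature scale and unscaled inputs often show exactly this train/test gap; a minimal sketch with a toy dataset standing in for the question's data, reusing the same kind of grid:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

    # scaling goes inside the pipeline so it is refit on each CV training fold
    pipe = make_pipeline(StandardScaler(), SVC())
    grid = {'svc__C': [0.01, 0.1, 1, 10, 100],
            'svc__gamma': ['scale', 0.001, 0.0001]}

    svm_grid = GridSearchCV(pipe, grid, cv=5, n_jobs=-1)
    svm_grid.fit(xtrain, ytrain)
    print(svm_grid.best_params_)
    print(svm_grid.score(xtrain, ytrain), svm_grid.score(xtest, ytest))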
