Is this XGBoost model tending to overfit?

Here is the list of hyperparameters that I used: params = { 'scale_pos_weight': [1.0], 'eta': [0.05, 0.1, 0.15, 0.9, 1.0], 'max_depth': [1, 2, 6, 10, 15, 20], 'gamma': [0.0, 0.4, 0.5, 0.7] } The dataset is imbalanced, so I used the scale_pos_weight parameter. After 5-fold cross-validation, the F1 score I got is 0.530726530426833.
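The CV F1 score alone does not tell you whether the model overfits; comparing the mean training and validation scores of the best setting does. A minimal sketch, assuming the xgboost sklearn wrapper; n_estimators and the arrays X, y are illustrative placeholders:

    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    params = {'scale_pos_weight': [1.0], 'eta': [0.05, 0.1, 0.15, 0.9, 1.0],
              'max_depth': [1, 2, 6, 10, 15, 20], 'gamma': [0.0, 0.4, 0.5, 0.7]}

    search = GridSearchCV(XGBClassifier(n_estimators=200), params,
                          scoring='f1', cv=5, return_train_score=True)
    search.fit(X, y)  # X, y: placeholder feature matrix and labels

    i = search.best_index_
    # A large gap between mean train F1 and mean validation F1 is the usual
    # sign of overfitting; similar values suggest the model is not overfitting.
    print('train F1:     ', search.cv_results_['mean_train_score'][i])
    print('validation F1:', search.cv_results_['mean_test_score'][i])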
Category: Data Science

Hyper-parameter tuning of NaiveBayes Classifier

I'm fairly new to machine learning. I'm aware of the concept of hyper-parameter tuning of classifiers and have come across a couple of examples of this technique. However, I'm trying to use the Naive Bayes classifier from sklearn for a task, and I'm not sure which parameter values I should try. What I want is something like the following, but for the GaussianNB() classifier rather than an SVM:

    from sklearn import svm
    from sklearn.model_selection import GridSearchCV

    C = [0.05, 0.1, 0.2, 0.3, 0.25, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
    gamma = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    kernel = ['rbf', 'linear']
    hyper = {'kernel': kernel, 'C': C, 'gamma': gamma}

    gd = GridSearchCV(estimator=svm.SVC(), param_grid=hyper, verbose=True)
    gd.fit(X, Y)
    print(gd.best_score_)
    print(gd.best_estimator_)
…
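GaussianNB has very few tunable parameters; the main one is var_smoothing (plus, optionally, the class priors). A minimal sketch of a comparable grid search, where the grid values and the placeholder arrays X, Y are illustrative:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import GaussianNB

    # var_smoothing adds a fraction of the largest feature variance to all
    # variances for numerical stability; it acts as a smoothing strength.
    param_grid = {'var_smoothing': np.logspace(-12, 0, num=13)}

    gd = GridSearchCV(estimator=GaussianNB(), param_grid=param_grid,
                      scoring='accuracy', cv=5, verbose=True)
    gd.fit(X, Y)  # X, Y: placeholders for your data
    print(gd.best_score_)
    print(gd.best_estimator_)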
Category: Data Science

Using Transaction Amount to Guide Learning in a Fraud Detection Machine Learning Model

I am currently using transaction amount as a feature in an XGBoost classification model designed to identify fraudulent transactions. Transaction amount is bounded between 0 and 500 for this problem. Using transaction amount as a feature does improve target class separability. However, I can't help but wonder if there is a better way to use this variable. To explain, I care more about getting the high transaction amounts correct than the low ones. However, the model …
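One hedged option is to keep the amount as a feature but also pass it, suitably scaled, as a per-row sample weight, so mistakes on high-value transactions cost more during training. A minimal sketch, assuming the xgboost sklearn wrapper; the scaling formula and the placeholder arrays X_train, y_train, amount_train are illustrative:

    from xgboost import XGBClassifier

    # Weight rows by scaled amount; the +1 offset keeps low-value rows
    # from being ignored entirely.
    weights = 1.0 + amount_train / 500.0    # amount is bounded in [0, 500]

    model = XGBClassifier(n_estimators=300)  # other settings as in your model
    model.fit(X_train, y_train, sample_weight=weights)

If you also use scale_pos_weight for the class imbalance, the two weightings should effectively combine multiplicatively for the positive class.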
Category: Data Science

The accuracy depends on the hyper-parameter in a strongly non-monotonic way

I have a data set labelled with binary classes. I computed the principal components from the data and applied the PC transformation. The goal is to find an optimal number of PCs so that the binary classification accuracy is good enough. I trained a binary classifier, sklearn.linear_model.LogisticRegressionCV (default parameters), on the PC-transformed data. The number of PCs was the (hyper-)parameter, and it was varied. I cannot interpret the resulting accuracy vs. number-of-PCs graph; why is it so strange? For …
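For reference, a minimal sketch of this kind of sweep with a Pipeline, so that the PCA is refit inside each cross-validation fold; fitting PCA on the full data before cross-validating can itself produce odd-looking curves. The range of components and the placeholder arrays X, y are illustrative:

    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    scores = []
    for n in range(1, 31):                      # number of principal components
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n),
                             LogisticRegressionCV())
        scores.append(cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean())
        print(n, scores[-1])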
Category: Data Science

Why does hyperparameter tuning occur on the validation dataset and not at the very beginning?

Despite doing/using it a few times, I'm still slightly confused by the use of a validation set for hyperparameter tuning. As far as I can tell, I choose a model, train it on the training data, assess performance on the training data, then do hyperparameter tuning by assessing model performance on the validation data, then choose the best model and test it on the test data. In order to do this, I basically need to pick a model at random for the training data. …
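For reference, a minimal sketch of the usual three-way split workflow; the model, the candidate values, and the placeholder arrays X, y are illustrative. The point is that the validation set is only used to compare hyperparameter settings, and the test set is touched once at the very end:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # 60% train, 20% validation, 20% test
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    best_score, best_model = -1.0, None
    for depth in [2, 5, 10, None]:                  # candidate hyperparameter values
        model = RandomForestClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
        score = accuracy_score(y_val, model.predict(X_val))   # compare on validation only
        if score > best_score:
            best_score, best_model = score, model

    # The test set is used once, after the hyperparameter choice is fixed.
    print('test accuracy:', accuracy_score(y_test, best_model.predict(X_test)))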
Category: Data Science

How to suppress "Estimator fit failed. The score on this train-test" warning message?

I am working on hyper-tuning a random forest classifier with the following parameters in RandomizedSearchCV:

    import numpy as np
    from sklearn import metrics
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    # defining model
    Model = RandomForestClassifier(random_state=1)

    # Parameter grid to pass in RandomizedSearchCV
    param_grid = {"n_estimators": [200, 250, 300],
                  "min_samples_leaf": np.arange(1, 4),
                  "max_features": [np.arange(0.3, 0.6, 0.1), 'sqrt'],
                  "max_samples": np.arange(0.4, 0.7, 0.1)}

    # Calling RandomizedSearchCV
    randomized_cv = RandomizedSearchCV(estimator=Model, param_distributions=param_grid,
                                       n_iter=10, n_jobs=-1,
                                       scoring=metrics.make_scorer(metrics.recall_score))

    # Fitting parameters in RandomizedSearchCV
    randomized_cv.fit(X_train, y_train)
    print("Best parameters are {} with CV score={}:".format(randomized_cv.best_params_,
                                                            randomized_cv.best_score_))
…
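That message is sklearn's FitFailedWarning: some sampled parameter combinations fail to fit (the np.arange array sitting inside the max_features list is a likely culprit here). A hedged sketch of silencing it with a warnings filter, reusing Model, param_grid, metrics, X_train, and y_train from the snippet above; note that with n_jobs=-1 some messages may still come from worker processes, and the cleaner fix is to repair the grid:

    import warnings
    import numpy as np
    from sklearn.exceptions import FitFailedWarning

    # Silence only the fit-failure warnings rather than all warnings.
    warnings.filterwarnings("ignore", category=FitFailedWarning)
    # Older sklearn versions emit per-fit messages starting with this text.
    warnings.filterwarnings("ignore", message="Estimator fit failed")

    randomized_cv = RandomizedSearchCV(estimator=Model, param_distributions=param_grid,
                                       n_iter=10, n_jobs=-1, error_score=np.nan,
                                       scoring=metrics.make_scorer(metrics.recall_score))
    randomized_cv.fit(X_train, y_train)  # failed fits are scored as NaN and skipped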
Category: Data Science

How to choose max layers and units to search over in hyperparameter tuning

When performing any hyper parameter tuning, let's say random search for simplicity, and I want to search over a minimum to max units/nodes in a layer, and a minimum to max number of layers, are there rules to guide what is a "large enough" number for my search? Currently all I know is "that should be good enough/large enough, let's search in there". I could be not searching a large enough space, or searching a space that's far too large …
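There is no hard rule; a common hedged heuristic is to search a coarse range that is wider than you expect to need, with roughly log-spaced unit counts, and then narrow around whatever wins. A minimal KerasTuner-style sketch of such a space, where the names, bounds, and max_trials are illustrative assumptions:

    import keras_tuner as kt
    from tensorflow import keras

    def build_model(hp):
        model = keras.Sequential()
        # Depth: shallow to moderately deep; widen only if the best trials
        # keep landing on the upper bound.
        for i in range(hp.Int('num_layers', 1, 5)):
            # Units: roughly log-spaced steps instead of every integer.
            model.add(keras.layers.Dense(hp.Choice(f'units_{i}', [16, 32, 64, 128, 256]),
                                         activation='relu'))
        model.add(keras.layers.Dense(1))
        model.compile(optimizer='adam', loss='mse')
        return model

    tuner = kt.RandomSearch(build_model, objective='val_loss', max_trials=30)

A useful signal after the fact: if the best trials cluster at an edge of the range (say num_layers is always 5), the space was probably too small; if they never come close to the edges, it was larger than needed.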
Category: Data Science

Benefits of using Deep Learning-specific hyperparameter optimization tools vs. sklearn?

There are quite a few libraries for hyperparameter optimization that are specific to Keras or other deep learning libraries, such as Hyperas or Talos. My question is: what is the main benefit of using these libraries compared to, for example, sklearn.model_selection.GridSearchCV() or sklearn.model_selection.RandomizedSearchCV?
Category: Data Science

Why is my loss so high?

I am struggling to understand why I am getting such a high loss/val_loss during training. I am training a regression network. I've normalized the input data to the range -1 to 1 and left the output data unaltered; its range is approximately -100 to 100. I chose to normalize the input so that I could use tanh as the activation function, since it outputs within this range. The neural network consists of 3 layers. print "Model definition!" …
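One thing worth noting: with MSE and targets spanning roughly -100 to 100, losses in the hundreds can look "high" even when the predictions are reasonable. A hedged sketch of scaling the targets as well (and inverting the scaling at prediction time); the layer sizes, epochs, and placeholder arrays X_train, y_train, X_test are illustrative:

    from sklearn.preprocessing import MinMaxScaler
    from tensorflow import keras

    # Scale targets into [-1, 1] so the loss magnitude is comparable to the inputs.
    y_scaler = MinMaxScaler(feature_range=(-1, 1))
    y_train_s = y_scaler.fit_transform(y_train.reshape(-1, 1))

    model = keras.Sequential([
        keras.layers.Dense(64, activation='tanh', input_shape=(X_train.shape[1],)),
        keras.layers.Dense(64, activation='tanh'),
        keras.layers.Dense(1, activation='linear'),   # linear output for regression
    ])
    model.compile(optimizer='adam', loss='mse')
    model.fit(X_train, y_train_s, validation_split=0.2, epochs=100)

    # Undo the scaling to get predictions back in the original -100..100 units.
    y_pred = y_scaler.inverse_transform(model.predict(X_test))

Alternatively, keep the raw targets and judge the numbers in context: with targets of this scale, an MSE of about 100 corresponds to an average error of roughly 10 in the original units.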
Category: Data Science

Efficient Searching for a basis of information as a hyperparameter in a large possible hyperparameter space

I have a set of inputs, let's call them 'I', that can be fed through a complicated group of functions to produce/calculate a wide variety of outputs (let's call them 'O'). I want to find a subset of outputs (let's call them 'O-prime') within 'O' that contain sufficient information to form a basis in order to find/reconstruct a point in the 'I'-space accurately. In other words I want to pick 'O-prime' such that I am able to uniquely identify any …
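One hedged way to make this concrete is to frame it as feature-subset selection: treat the candidate outputs O as features, the inputs I as a multi-output regression target, and pick the subset of O that reconstructs I best. A minimal sketch using sklearn's greedy SequentialFeatureSelector; the estimator, the subset size, and the array names O and I are illustrative assumptions, and greedy selection only approximates the full combinatorial search:

    import numpy as np
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    # O: (n_samples, n_outputs) candidate outputs; I: (n_samples, n_inputs) inputs.
    selector = SequentialFeatureSelector(LinearRegression(),
                                         n_features_to_select=8,   # size of O-prime
                                         direction='forward', cv=5)
    selector.fit(O, I)

    O_prime_columns = np.flatnonzero(selector.get_support())
    print('selected output columns:', O_prime_columns)

    # Check how well O-prime reconstructs I (R^2 averaged over input dimensions).
    reconstructor = LinearRegression().fit(O[:, O_prime_columns], I)
    print('reconstruction R^2:', reconstructor.score(O[:, O_prime_columns], I))

If the map from O back to I is strongly nonlinear, swap LinearRegression for a nonlinear regressor; the selection machinery stays the same.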
Category: Data Science

Rules, rules of thumb, intuitions, on how to set up the best possible hyperparameter search

When I set up my neural networks, I really have very little idea what I'm doing in advance. It may just be a bit of educated guesswork as to "it may need a few layers only" or "this activation function could be useful for this type of problem". This type of thinking could be quite useful, but it could also be leading me astray in developing a loose framework that's not quite suitable. I may be thinking do a hyper …
Category: Data Science

Estimating Length of Hyperband Trials in Advance

I would like to use the (Keras/TensorFlow) Hyperband tuning algorithm rather than Keras random search, for instance, when testing hyperparameters. With random search I can set max trials and get a rough guess of how long the search will run (to within an order of magnitude or so, from max_trials * epochs). With Hyperband I don't know how long it will take, or whether I'm setting up a search that's going to be really limited. Is there a way to make sense …
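For a rough advance estimate you can tabulate the Hyperband bracket schedule from Li et al. (2018), which is what KerasTuner's Hyperband follows. This is a hedged sketch: KerasTuner resumes surviving models from checkpoints and does its own trial bookkeeping, so the real counts will differ somewhat, and the whole thing is repeated hyperband_iterations times.

    import math

    def hyperband_budget(max_epochs, factor=3):
        """Rough configuration/epoch totals for ONE Hyperband iteration;
        multiply by the tuner's hyperband_iterations for the full search."""
        s_max = int(math.log(max_epochs, factor))
        total_configs, total_epochs = 0, 0
        for s in range(s_max, -1, -1):                          # one bracket per s
            n = math.ceil((s_max + 1) / (s + 1) * factor ** s)  # configs in bracket
            r = max_epochs * factor ** (-s)                     # starting epochs each
            total_configs += n
            for i in range(s + 1):                              # successive-halving rounds
                total_epochs += math.floor(n * factor ** (-i)) * round(r * factor ** i)
        return total_configs, total_epochs

    # For max_epochs=100, factor=3 this gives roughly 143 configurations and
    # on the order of 2300 training epochs per Hyperband iteration.
    print(hyperband_budget(max_epochs=100, factor=3))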
Category: Data Science

How to tune parameters for time series analysis when the forecast is dominated by only one feature and the error is not decreasing?

I am trying to predict a time series based on 150 features. When I plot the correlation of these features, I get 20 features with more or less importance, but every model I use is completely dominated by only one feature, which is completely in sync with the predicted output but not with the actual output. Please refer to the image below. The green line is the prediction, which is completely in sync with one of the features. And for every valley in actual …
Category: Data Science

How can I tune LSTM hyperparameters?

If anyone can answer these, that would be great. I'm in the midst of a Final Year Project on LSTM. Currently, I'm stuck and confused by the LSTM code. There are 4 hyperparameters that I can play around with: look back, batch size, LSTM units, and number of epochs. Can you explain what will happen to my results if I tune each of these hyperparameters? Also, is it common to get different results each time we run the code?
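For concreteness, a minimal sketch of where each of those four hyperparameters enters a typical Keras LSTM setup; the windowing helper, the values, and the placeholder array series are illustrative. look_back sets the length of the input window, units the width of the LSTM layer, and batch_size/epochs control training:

    import numpy as np
    from tensorflow import keras

    def make_windows(series, look_back):
        """Turn a 1-D series into (samples, look_back, 1) inputs and next-step targets."""
        X = np.array([series[i:i + look_back] for i in range(len(series) - look_back)])
        y = series[look_back:]
        return X[..., np.newaxis], y

    look_back, units, batch_size, epochs = 10, 32, 64, 50   # the four hyperparameters

    X, y = make_windows(series, look_back)                   # series: your 1-D array
    model = keras.Sequential([
        keras.layers.LSTM(units, input_shape=(look_back, 1)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, batch_size=batch_size, epochs=epochs, validation_split=0.2)

Different results on each run are expected unless you fix the random seeds: weight initialisation, data shuffling, and GPU nondeterminism all contribute.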
Category: Data Science

Activation Function Hyperparameter Optimisation

If I have a model, say:

    def build_model(self, hp):
        model = Sequential()
        model.add(Dense(hp.Choice('units', [12, 16, 20, 24]),
                        hp.Choice("activation", ["elu", "exponential", "gelu", "hard_sigmoid",
                                                 "linear", "relu", "selu", "sigmoid", "softmax",
                                                 "softplus", "softsign", "swish", "tanh"])))
        model.add(Dense(4, hp.Choice("activation", ["elu", "exponential", "gelu", "hard_sigmoid",
                                                    "linear", "relu", "selu", "sigmoid", "softmax",
                                                    "softplus", "softsign", "swish", "tanh"])))
        optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5)
        model.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
        return model

and I want to span the space where the activation function can change on each layer, I believe that hp.Choice will choose only one activation function for the whole model each time …
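That reading matches the usual KerasTuner behaviour: hyperparameters are identified by name, so two hp.Choice("activation", ...) calls share a single value per trial. Giving each layer a distinct name lets the activations vary independently; a minimal sketch, where the shortened activation list and the name scheme are illustrative:

    import tensorflow as tf
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense

    ACTIVATIONS = ["elu", "relu", "selu", "sigmoid", "softplus", "swish", "tanh"]

    def build_model(self, hp):
        model = Sequential()
        # 'activation_0' and 'activation_1' are separate hyperparameters, so the
        # tuner can pick a different activation for each layer.
        model.add(Dense(hp.Choice('units', [12, 16, 20, 24]),
                        activation=hp.Choice('activation_0', ACTIVATIONS)))
        model.add(Dense(4, activation=hp.Choice('activation_1', ACTIVATIONS)))
        model.compile(loss='mse',
                      optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5),
                      metrics=['mse'])
        return model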
Category: Data Science

Dropout in Hyperparameter Optimisation

Is it correct to add dropout after each layer, and is it done as in the example below?

    class MyHyperModel(kt.HyperModel):
        def build_model(self, hp):
            model = Sequential()
            for i in range(hp.Int('dense_layers', 1, 4)):
                model.add(Dense(hp.Choice('units', choice_units),
                                hp.Choice("activation", ["elu", "exponential", "relu"])))
                model.add(layers.Dropout(hp.Choice('rate', [0.0, 0.05, 0.10, 0.15, 0.25])))
            model.add(Dense(1, hp.Choice("activation", ["elu", "relu"])))
            optimizer = tf.keras.optimizers.SGD(hp.Float('learning_rate', min_value=1e-6,
                                                         max_value=1e-3, default=1e-5))
            model.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
            return model

That is, by adding model.add(layers.Dropout(hp.Choice('rate', [0.0, 0.05, 0.10, 0.15, 0.25]))) after each Dense layer, it will add dropout to each new Dense layer. Is this true? And if I wanted to vary the choice of dropout layer …
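Placing a Dropout layer right after each Dense inside the loop is the standard pattern. The caveat, as with the activations question above, is that reusing the single name 'rate' makes every Dropout share one rate per trial. A hedged sketch of the loop body with per-layer names, written as a drop-in replacement for the loop in the snippet above (the f-string names are illustrative):

    for i in range(hp.Int('dense_layers', 1, 4)):
        model.add(Dense(hp.Choice(f'units_{i}', choice_units),
                        activation=hp.Choice(f'activation_{i}', ["elu", "exponential", "relu"])))
        # A distinct name per layer lets the tuner pick a different dropout rate
        # for each Dense/Dropout pair; reuse a single name to tie them together.
        model.add(layers.Dropout(hp.Choice(f'rate_{i}', [0.0, 0.05, 0.10, 0.15, 0.25])))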
Category: Data Science

Should hyperparameter optimisation focus on many trials (models) with fewer epochs first, then a second round with few models and many epochs?

Rather than a hyperparameter optimisation with kt.tuners.RandomSearch, say, that does (option A) X model trials (e.g. 100) with Y epochs each (say 100, so a total of 10,000 epochs across all models), where Y is "enough epochs per experiment to give good estimates for each model", in one whole experiment, would it be more appropriate to split the experiment into two parts (option B): run X*5 model trials (200) with Y/10 epochs each (say 25). (Thus we scan many …
Category: Data Science

Two questions on hyper-parameter tuning

Question 1: In the example of logistic regression, I often see the regularization constant and penalty methods being tuned by a grid search. However, it seems like there are a lot more options for tuning: classifier_os.get_params() gives: {'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, ... and many more! So my question is: Are these other parameters typically not worth tuning, or are they left out in examples for another reason? For example, I changed to solver='liblinear' and got sub-par …
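Most of those extra entries (fit_intercept, class_weight, tol, and so on) are left out of examples because they either rarely move the results much or are constrained by the solver; C, penalty, and solver (plus class_weight on imbalanced data) are the ones usually worth searching. A hedged sketch of a slightly fuller grid, restricted to solver/penalty combinations that are compatible; the value ranges and the placeholder arrays X, y are illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # liblinear and saga both support l1 and l2 penalties, so every combination
    # in this grid is valid; other solvers (e.g. lbfgs) would reject l1.
    param_grid = {'C': np.logspace(-3, 3, 7),
                  'penalty': ['l1', 'l2'],
                  'solver': ['liblinear', 'saga'],
                  'class_weight': [None, 'balanced']}

    search = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5)
    search.fit(X, y)   # X, y: placeholders for your data
    print(search.best_params_)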
Category: Data Science

Why does BERT classification do worse with longer sequence lengths?

I've been experimenting with transformer networks like BERT on some simple classification tasks. My tasks are binary classification, the datasets are relatively balanced, and the corpus consists of abstracts from PubMed. The median number of tokens after pre-processing is about 350, but I'm finding a strange result as I vary the sequence length. While using too few tokens hampers BERT in a predictable way, BERT doesn't do better with more tokens. It looks like the optimal number of tokens is about …
Category: Data Science
