Parameter optimization algorithms in Weka

In Weka, I used the GridSearch and RandomSearch parameter tuning algorithms, but unfortunately their performance (in terms of prediction accuracy) turns out to be worse than when I run the ML algorithms (Support Vector Regression, Linear Regression, etc.) without any optimization algorithm. I wonder how this is possible? I would expect one algorithm (grid or random search) to perform better or worse than the other, but both perform worse than using no parameter optimization at all. I even tried a hybrid of the two in Weka with the MultiSearch option, but even that hybrid performs worse. I would appreciate comments from anyone with experience in this regard.

Topic hyperparameter-tuning grid-search weka

Category Data Science


There are many possibilities, but I suspect the tuning algorithms are overfitting your model. The MultiSearch/GridSearch etc. algorithms select the combination of hyperparameters that optimizes a metric of your choosing, such as AUC, F1, or MCC. If you are optimizing on the training data, the tuning algorithm will select the model with the highest training score, but this model will probably be overfitted.
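As a rough illustration of that gap, here is a minimal scikit-learn sketch (not Weka; the synthetic data and the deliberately extreme parameter values are assumptions for demonstration only) showing how a model can score very highly on the data it was fit on while scoring poorly under cross-validation:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Toy regression data with only a little real signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=60)

# A very flexible setting (large C, large gamma) can nearly memorize the
# training data, so its training score looks great...
model = SVR(kernel="rbf", C=1e4, gamma=10.0)
train_r2 = model.fit(X, y).score(X, y)

# ...while its cross-validated score, which measures generalization, is poor.
cv_r2 = cross_val_score(model, X, y, cv=5).mean()
print(f"training R^2: {train_r2:.2f}, cross-validated R^2: {cv_r2:.2f}")
```

A tuner that selects hyperparameters purely by the training score would happily pick exactly this kind of configuration.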

Without the tuning algorithm, you can, by chance, select hyperparameters that perform worse on the training data but better on the testing data. This is particularly true if you do not have many instances on which to train/test; if you do not have many training instances, overfitting is even more likely. Moreover, if your testing set is very small, luck becomes an even greater factor.

Your features can also contribute to overfitting. For example, suppose you have a million features but only 100 training instances. If the feature values vary a lot from instance to instance, the model can fit the training data very selectively, which results in poor generalization. If you have many more features than training instances, you should apply a dimensionality reduction algorithm such as PCA to create a better search space, as in the sketch below.
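A minimal sketch of that idea in scikit-learn (illustrative only; the random data and the choice of 20 components are assumptions, and in Weka you would use the corresponding PCA filter instead):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy data: far more features (1000) than training instances (100).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
y = rng.normal(size=100)

# Standardize, then project onto a small number of principal components
# before fitting the regressor, so the model has far fewer dimensions
# in which to overfit.
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVR())
model.fit(X, y)
```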

To get the best results with tuning algorithms, you should partition your data into training, validation, and testing sets. The MultiSearch/GridSearch should then evaluate the candidate models on the validation set, so that you encourage generalization rather than overfitting to the training data.
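A hedged sketch of that workflow (again in scikit-learn rather than Weka; the split ratios, the candidate C values, and the synthetic data are all assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.3 * rng.normal(size=300)

# First carve off a final test set, then split the rest into train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)

# Select hyperparameters by their score on the validation set, never the training set.
best_score, best_params = float("inf"), None
for C in [0.1, 1, 10]:
    model = SVR(C=C).fit(X_train, y_train)
    score = mean_squared_error(y_val, model.predict(X_val))
    if score < best_score:
        best_score, best_params = score, {"C": C}

# Only the final, chosen model ever touches the test set.
final_model = SVR(**best_params).fit(X_trainval, y_trainval)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print(best_params, test_mse)
```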

Also, your interval selection for the grid search is important. If you are using an SVM with an RBF kernel and grid-searching the gamma and C parameters, your grid values should probably be spaced exponentially. An example 11x11 grid might be: $ C \in \{10^{-3}, 10^{-2},...,10^7\} $ and $ \gamma \in \{10^{-9},10^{-8},...,10\} $. If you want finer granularity, after creating a heatmap of the scores for your grid, you can "zoom into" an area of high scores by performing another grid search with finer intervals over a smaller domain.
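For concreteness, here is a sketch of the coarse-then-fine search in scikit-learn (not Weka; the data, the exponent ranges, and the zoom-in factor of one decade are assumptions chosen to mirror the 11x11 grid above):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# Coarse, exponentially spaced 11x11 grid over C and gamma.
coarse_grid = {
    "C": np.logspace(-3, 7, 11),     # 10^-3 ... 10^7
    "gamma": np.logspace(-9, 1, 11), # 10^-9 ... 10^1
}
coarse = GridSearchCV(SVR(kernel="rbf"), coarse_grid, cv=5).fit(X, y)

# "Zoom in": a finer grid centred one decade either side of the best coarse values.
best_C = coarse.best_params_["C"]
best_gamma = coarse.best_params_["gamma"]
fine_grid = {
    "C": best_C * np.logspace(-1, 1, 9),
    "gamma": best_gamma * np.logspace(-1, 1, 9),
}
fine = GridSearchCV(SVR(kernel="rbf"), fine_grid, cv=5).fit(X, y)
print(fine.best_params_)
```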


I assume that by the performance of the tuning algorithms you mean the improvement in model performance (accuracy, error, etc.). The effectiveness of parameter optimization (or tuning) differs from algorithm to algorithm. Many papers have discussed this, and Olson et al. have shown how much performance can vary across several algorithms; SVM and LR (the ones you mention in your question) did not improve much after tuning.

If you get poor performance from several of the algorithms you tried, the problem may not lie in the tuning algorithms but in your feature/target set. I suggest you take a look at the Domingos paper; it's a nice read on how to build a successful model.
