Random forest code takes longer on every run

I have prediction code that runs RandomForestRegressor and RandomForestClassifier.

I call each of them 9 times, and each is tuned with GridSearchCV.

The first run took around 2 hours 20 minutes. After almost every run since then the duration has crept up, and today it took 3 hours 45 minutes. I have run the code 20 times so far, and each time it takes slightly longer, even though the underlying training data and the size of the testing data have not changed.

Although I take care to clear the cache every time I run the code, I do not understand why the same code takes longer each time.

The general question would be "How can I optimise my code?", but I suspect this is specific to scikit-learn: my other scripts do not show this behaviour, only the prediction code.

For RandomForestRegressor:

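from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid_rf = {
    'max_features': ['auto', 'sqrt', 'log2'],
    # 'criterion': ['mse', 'mae'],  # mae takes forever to run; mse is the default
}
rf = RandomForestRegressor()
# n_jobs=-2 runs the search on all CPU cores except one
rf = GridSearchCV(estimator=rf, param_grid=param_grid_rf, n_jobs=-2)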

For RandomForestClassifier:

param_grid_rc = {
    'max_features': ['auto', 'sqrt', 'log2'],
    'criterion': ['gini', 'entropy']
}
rc = RandomForestClassifier()
rc = GridSearchCV(estimator=rc, param_grid=param_grid_rc, n_jobs=-2)

I cannot post the code in its entirety, hence this open-ended question.

I am using Windows 10 with PyCharm as my environment.

Topic windows random-forest scikit-learn optimization python

Category Data Science


I can think of two possible reasons.

1.) You have set n_jobs=-2, which uses all of your cores except one. Set it to -1 to use every core and speed up your search.

2.) I had a similar question (linked below) where my Decision Tree was taking too long to execute. I was using mae as the criterion. I later came across several articles stating that mae takes longer to compute than mse. I switched to mse and it ran in a fraction of the time, so you might want to try that. The question is: Decision Tree taking too long to execute
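To see whether n_jobs makes a measurable difference, it helps to time the search directly. The following is a minimal sketch, not the asker's code: the data comes from make_regression as a stand-in, timed_search is a hypothetical helper, and the grid uses 1.0 in place of 'auto' because recent scikit-learn releases no longer accept 'auto'. The criterion is left at its default (squared error, the current name for mse).

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

# 1.0 means "use all features"; it replaces 'auto' for the regressor.
param_grid_rf = {
    "max_features": ["sqrt", "log2", 1.0],
}


def timed_search(n_jobs):
    """Fit the grid search and return (fitted search, elapsed seconds)."""
    search = GridSearchCV(
        estimator=RandomForestRegressor(n_estimators=20, random_state=0),
        param_grid=param_grid_rf,
        cv=3,
        n_jobs=n_jobs,  # -1 uses every core, -2 leaves one core free
    )
    start = time.perf_counter()
    search.fit(X, y)
    return search, time.perf_counter() - start


search, elapsed = timed_search(n_jobs=-1)
print(f"best params: {search.best_params_}, elapsed: {elapsed:.2f}s")
```

Calling timed_search(n_jobs=-2) as well and comparing the two elapsed times shows how much the extra core is worth on a given machine. Logging the elapsed time of each of the nine calls in the same way would also show which of them grows from run to run.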

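GridSearchCV is exhaustive by design: it fits every combination in the grid once per cross-validation fold, which is why it is notoriously slow. I would suggest trying Optuna, which samples the search space instead and is often considerably faster.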
