Is there a quicker solution to Sklearn MAE?
I am attempting to run RandomForestRegressor on this fairly large dataset:
df_train.describe():
          Unnamed: 0           col1           col2           col3           col4           col5
count   8.886500e+05  888650.000000  888650.000000  888650.000000  888650.000000  888650.000000
mean    5.130409e+05       2.636784       3.845549       4.105381       1.554918       1.221922
std     2.998785e+05       2.296243       1.366518       3.285802       1.375791       1.233717
min     4.000000e+00       1.010000       1.010000       1.010000       0.000000       0.000000
25%     2.484332e+05       1.660000       3.230000       2.390000       1.000000       0.000000
50%     5.233705e+05       2.110000       3.480000       3.210000       1.000000       1.000000
75%     7.692788e+05       2.740000       3.950000       4.670000       2.000000       2.000000
max     1.097490e+06      90.580000      43.420000      99.250000      22.000000      24.000000
df_test.describe():
       Unnamed: 0        col1        col2        col3        col4        col5
count  390.000000  390.000000  390.000000  390.000000         0.0         0.0
mean   194.500000    3.393359    4.016821    3.761385         NaN         NaN
std    112.727548    4.504227    1.720292    3.479109         NaN         NaN
min      0.000000    1.020000    2.320000    1.020000         NaN         NaN
25%     97.250000    1.792500    3.272500    2.220000         NaN         NaN
50%    194.500000    2.270000    3.555000    3.055000         NaN         NaN
75%    291.750000    3.172500    4.060000    4.217500         NaN         NaN
max    389.000000   50.000000   18.200000   51.000000         NaN         NaN
The code runs reasonably quickly with MSE, which is the default criterion for RandomForestRegressor: approximately 21 minutes. However, when I switch to MAE, it takes far longer: I ran it for 3 days straight and there was still no end in sight.
Is there any way to get MAE to run faster with RandomForestRegressor?
I am running this on a machine with an 8-core Ryzen 3700X and 32 GB of RAM.