Multiple values for a single parameter in the mlflow run command
How can I pass multiple values to each parameter in the mlflow run command?
The objective is to pass a dictionary to GridSearchCV as a param_grid to perform cross-validation.
In my main code, I retrieve the command-line parameters using argparse. By adding nargs='+' in add_argument(), I can write space-separated values for each hyperparameter and then apply vars() to create the dictionary. See the code below:
import argparse
from sklearn.ensemble import RandomForestClassifier
# Build the parameters for the command-line
param_names = list(RandomForestClassifier().get_params().keys())
# Param types in the same order they appear in param_names by using get_params()
param_types = [bool, float, dict, str, int, float, int, float, float, float,
               float, float, float, int, int, bool, int, int, bool]
# Allow for only optional command-line arguments
parser = argparse.ArgumentParser()
grid_group = parser.add_argument_group('param_grid_group')
for i, p in enumerate(param_names):
    grid_group.add_argument(f'--{p}', type=param_types[i], nargs='+')
# Create a param_grid to be passed to GridSearchCV
param_grid_unprocessed = vars(parser.parse_args())
This works well with a plain python command:
python my_code.py --max_depth 2 3 4 --n_estimators 400 600 1000
As I said, here I can just write space-separated values for each hyperparameter, and the code above groups the values into a list and returns the dictionary below, which I can then pass to GridSearchCV:
{'max_depth':[2, 3, 4], 'n_estimators':[400, 600, 1000]}
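To make the behavior concrete, here is a minimal, self-contained sketch of the same idea, reduced to two hyperparameters. The helper name build_param_grid is hypothetical, and dropping the None entries is an assumption about what the later processing of param_grid_unprocessed would need to do, since argparse returns None for every flag that was not supplied.

```python
import argparse


def build_param_grid(argv):
    """Parse space-separated values per flag into a GridSearchCV-style dict.

    Hypothetical helper for illustration; only two hyperparameters are declared.
    """
    parser = argparse.ArgumentParser()
    # nargs='+' collects one or more values per flag into a list
    parser.add_argument('--max_depth', type=int, nargs='+')
    parser.add_argument('--n_estimators', type=int, nargs='+')
    args = vars(parser.parse_args(argv))
    # Flags that were not supplied come back as None, so drop them
    return {name: values for name, values in args.items() if values is not None}


if __name__ == '__main__':
    argv = ['--max_depth', '2', '3', '4', '--n_estimators', '400', '600', '1000']
    print(build_param_grid(argv))
```

Passing an explicit argv list instead of reading sys.argv keeps the sketch easy to try from an interpreter.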
However, with the mlflow run command I can't get it to work so far, as it only accepts one value per parameter. Here's my MLproject file:
name: mlflow_project
conda_env: conda.yml
entry_points:
  main:
    parameters:
      max_depth: int
      n_estimators: int
    command: python my_code.py --max_depth {max_depth} --n_estimators {n_estimators}
So this works:
mlflow run . -P max_depth=2 -P n_estimators=400
But this does not:
mlflow run . -P max_depth=[2, 3, 4] -P n_estimators=[400, 600, 1000]
According to the documentation, this seems to be impossible. So, is there any hack to overcome this problem?
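One possible direction, sketched here only as an illustration: keep a single value per mlflow parameter, but make that value a delimited string such as 2,3,4 and split it inside the script. The int_list converter and parse_grid helper below are hypothetical names. Whether mlflow run hands such a string to the script unchanged, for example when invoked as -P max_depth=2,3,4, is an untested assumption, and the parameter types in the MLproject file may need adjusting.

```python
import argparse


def int_list(text):
    """Hypothetical argparse type: turn '2,3,4' or '2 3 4' into [2, 3, 4]."""
    return [int(token) for token in text.replace(',', ' ').split()]


def parse_grid(argv):
    """Build a param_grid dict from flags that each carry one delimited string."""
    parser = argparse.ArgumentParser()
    # No nargs here: each flag receives a single string holding all the values
    parser.add_argument('--max_depth', type=int_list)
    parser.add_argument('--n_estimators', type=int_list)
    args = vars(parser.parse_args(argv))
    # Drop flags that were not supplied (argparse leaves them as None)
    return {name: values for name, values in args.items() if values is not None}


if __name__ == '__main__':
    print(parse_grid(['--max_depth', '2,3,4', '--n_estimators', '400,600,1000']))
```

The converter accepts both commas and spaces, so the same script still works when called directly with a quoted string like --max_depth "2 3 4".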