Multiple values for a single parameter in the mlflow run command

How to pass multiple values to each parameter in the mlflow run command?

The objective is to pass a dictionary to GridSearchCV as a param_grid to perform cross validation.

In my main code, I retrieve the command-line parameters using argparse. By adding nargs='+' in add_argument(), I can pass space-separated values for each hyper-parameter, and then apply vars() to the parsed arguments to create the dictionary. See the code below:

import argparse

from sklearn.ensemble import RandomForestClassifier

# Build the parameters for the command-line
param_names = list(RandomForestClassifier().get_params().keys())

# Param types in the same order they appear in param_names by using get_params()
param_types = [bool, float, dict, str, int, float, int, float, float, float,
               float, float, float, int, int, bool, int, int, bool]

# Allow for only optional command-line arguments
parser = argparse.ArgumentParser()
grid_group = parser.add_argument_group('param_grid_group')
for i, p in enumerate(param_names):
    grid_group.add_argument(f'--{p}', type=param_types[i], nargs='+')
# Create a param_grid to be passed to GridSearchCV
param_grid_unprocessed = vars(parser.parse_args())

This works well with the plain python command:

python my_code.py --max_depth 2 3 4 --n_estimators 400 600 1000

Here I can just write space-separated values for each hyper-parameter. The code above groups the values for each one into a list and returns the dictionary below, which I can then pass to GridSearchCV:

{'max_depth':[2, 3, 4], 'n_estimators':[400, 600, 1000]}

However, with the mlflow run command I can't get this to work so far, as it only accepts one value for each parameter. Here's my MLproject file:

name: mlflow_project

conda_env: conda.yml

entry_points:

  main:
    parameters:
      max_depth: int
      n_estimators: int
    command: python my_code.py --max_depth {max_depth} --n_estimators {n_estimators}

So this works:

mlflow run . -P max_depth=2 -P n_estimators=400

But not this:

mlflow run . -P max_depth=[2, 3, 4] -P n_estimators=[400, 600, 1000]

According to the documentation, this seems to be impossible. So, is there any hack to work around this problem?


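One option would be to switch from argparse to Fire, a third-party Python package for building command-line interfaces (CLIs). Fire has better support for grouping commands, and it parses each argument value as a Python literal, so a value such as [2,3,4] arrives in the function as a list rather than as a string.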
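If you would rather stay with argparse, another possible workaround is to pass each hyper-parameter as a single comma-separated string and split it inside the script. The sketch below is only an illustration under some assumptions: the helper names csv_list and build_param_grid are made up for this example, the two parameters would be declared as string in the MLproject file, and the run would be launched with something like mlflow run . -P max_depth=2,3,4 -P n_estimators=400,600,1000. I have not verified how mlflow quotes values containing spaces or brackets, so the version without spaces is the safer form.

```python
import argparse


def csv_list(cast):
    """Return an argparse `type` callable that turns 'a,b,c' into [cast(a), cast(b), cast(c)]."""
    def parse(text):
        # Tolerate optional surrounding brackets, e.g. "[2,3,4]"
        items = text.strip("[]").split(",")
        return [cast(item) for item in items if item.strip()]
    return parse


def build_param_grid(argv=None):
    """Parse the command line and return a dict usable as GridSearchCV's param_grid."""
    parser = argparse.ArgumentParser()
    # One value per option: a comma-separated string instead of nargs='+'
    parser.add_argument("--max_depth", type=csv_list(int))
    parser.add_argument("--n_estimators", type=csv_list(int))
    args = vars(parser.parse_args(argv))
    # Keep only the hyper-parameters that were actually supplied
    return {name: values for name, values in args.items() if values is not None}


# Simulate: python my_code.py --max_depth 2,3,4 --n_estimators "[400, 600, 1000]"
grid = build_param_grid(["--max_depth", "2,3,4", "--n_estimators", "[400, 600, 1000]"])
print(grid)  # {'max_depth': [2, 3, 4], 'n_estimators': [400, 600, 1000]}
```

Because each option now receives exactly one string, the command line in the MLproject file can stay as it is, with one {max_depth} and one {n_estimators} placeholder.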