How to loop through multiple lists/dict?

Question

spectre

November 23, 2021, 10:55

I have the following code, which searches for the best value of the k parameter (n_neighbors) in KNNImputer. It loops through the list k_value and, for each element, fits a KNNImputer with that value, cross-validates a linear regression model, and appends the resulting scores to an initially empty DataFrame.

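import pandas as pd
import category_encoders as ce  # assumed source of the `ce` alias used below
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

lire_model = LinearRegression()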
k_value = [1,3,5,7,9,11, 13, 15, 17, 19, 21]
k_value_results = pd.DataFrame(columns = ['k', 'mse', 'rmse', 'mae', 'r2'])

scoring_list = ['neg_mean_squared_error', 'neg_root_mean_squared_error', 'neg_mean_absolute_error', 'r2']

for s in k_value:
    imputer = KNNImputer(n_neighbors = s)   
    
    imputer.fit(train_x1_num)
    train_x2 = pd.DataFrame(imputer.transform(train_x1_num), columns = train_x1_num.columns)
    test_x2 = pd.DataFrame(imputer.transform(test_x1_num), columns = test_x1_num.columns)
    
    enc = ce.CatBoostEncoder()
    enc.fit(train_x3, train_y)
    train_x4 = pd.DataFrame(enc.transform(train_x3), columns = train_x3.columns)
    test_x4 = pd.DataFrame(enc.transform(test_x3), columns = test_x3.columns)

    base_score = cross_validate(lire_model, train_x4, train_y, cv = 5, scoring = scoring_list, 
                                n_jobs = -1)
    
    row = {
            'k': s,
            'mse' : -1 * base_score['test_neg_mean_squared_error'].mean(),
            'rmse' : -1 * base_score['test_neg_root_mean_squared_error'].mean(), 
            'mae' : -1 * base_score['test_neg_mean_absolute_error'].mean(), 
            'r2' : base_score['test_r2'].mean()
          }
    
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    k_value_results = pd.concat([k_value_results, pd.DataFrame([row])], ignore_index = True)

If I have more than one list to loop through and want to perform the same steps as in the code above, how can I do that?

For example:

list1 = [a, b, c, d]
list2 = [e, f, g]

I want to loop through both lists and get the results for each combination of parameters (4 * 3 = 12 combinations in total). Basically, I want to grid search over multiple lists without using scikit-learn's GridSearchCV.

Any ideas?

Topic hyperparameter-tuning grid-search python parallel

Category Data Science

Oxbowerce · Accepted Answer · November 23, 2021, 10:55

You can use the ParameterGrid class from scikit-learn for this. You supply a dictionary in which each key is a hyperparameter name and each value is a list of candidate values for that hyperparameter. Iterating over the ParameterGrid then yields every combination of those values as a dictionary, as in this example from the documentation:

from sklearn.model_selection import ParameterGrid
param_grid = {'a': [1, 2], 'b': [True, False]}
list(ParameterGrid(param_grid))
# [{'a': 1, 'b': True}, {'a': 1, 'b': False},
#  {'a': 2, 'b': True}, {'a': 2, 'b': False}]
