Why is it not necessary to set the test group when using 'rank:pairwise' in xgboost?
I'm new to learning-to-rank and am working through the learning-to-rank example provided by xgboost. The core code in rank.py is as follows:
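import xgboost as xgb
from xgboost import DMatrix

# Wrap features and labels; the test set is built without labels
train_dmatrix = DMatrix(x_train, y_train)
valid_dmatrix = DMatrix(x_valid, y_valid)
test_dmatrix = DMatrix(x_test)
# Group sizes are set for the training and validation sets only
train_dmatrix.set_group(group_train)
valid_dmatrix.set_group(group_valid)
params = {'objective': 'rank:pairwise', 'eta': 0.1, 'gamma': 1.0,
          'min_child_weight': 0.1, 'max_depth': 6}
xgb_model = xgb.train(params, train_dmatrix, num_boost_round=4,
                      evals=[(valid_dmatrix, 'validation')])
# Prediction on the test set, with no group information attached
pred = xgb_model.predict(test_dmatrix)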
Group data is set for both the training and validation sets, but prediction on the test set does not use group data. I also looked at some explanations of the model output, such as "What is the output of XGBoost using 'rank:pairwise'?".
In learning-to-rank, the goal is to predict a relative score for each document with respect to a specific query.
My understanding is that if the test set has no group data, no query is specified. How, then, does the model output a relative score for a specific query?
I also tried adding test_dmatrix.set_group(group_test). The outputs with and without the test group are in good agreement, for example:
[ 1.3535978 -2.9462705 0.86084974 ... -0.23594362 0.712791
-1.633297 ]
So my questions are as follows:
Why is it not necessary to set the test group when using 'rank:pairwise' in xgboost?
How can I obtain labels for a specified query group from the predicted scores?
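To make the second question concrete: one possible reading, which I am not sure is the intended one, is to sort the predicted scores within each query group and use the resulting positions. Below is a minimal sketch of that idea, assuming group_test lists the group sizes in row order. The helper rank_within_groups is hypothetical and the scores are made-up values, not real model output.

```python
import numpy as np

def rank_within_groups(scores, group_sizes):
    """Return each row's 1-based rank inside its query group (1 = highest score)."""
    scores = np.asarray(scores, dtype=float)
    if sum(group_sizes) != len(scores):
        raise ValueError("group sizes must sum to the number of scores")
    ranks = np.empty(len(scores), dtype=int)
    start = 0
    for size in group_sizes:
        block = scores[start:start + size]
        # Indices of this group's rows, ordered from highest to lowest score.
        order = np.argsort(-block, kind="stable")
        ranks[start + order] = np.arange(1, size + 1)
        start += size
    return ranks

# Hypothetical scores for two queries with 3 and 2 documents.
pred = [1.35, -2.95, 0.86, -0.24, 0.71]
group_test = [3, 2]
ranks = rank_within_groups(pred, group_test)
print(ranks)  # [1 3 2 2 1]
```

Here the scores are only compared within a group, never across groups, which is how I understand "relative score" in this setting.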
Can anybody explain it to me? Thanks in advance.
Topic learning-to-rank xgboost python machine-learning
Category Data Science