How to fit pairwise ranking models in XGBoost?
As far as I know, to train learning-to-rank models you need three things in the dataset:
- label or relevance
- group or query id
- feature vector
For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and features).
1 qid:10 1:0.031310 2:0.666667 ...
0 qid:10 1:0.078682 2:0.166667 ...
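To make the format concrete, lines like the two above can be split into their three parts roughly as follows. This is only a minimal sketch: parse_letor_line and group_sizes are hypothetical helper names of my own (not part of XGBoost), and the truncated features ("...") are left out of the sample lines.

```python
from itertools import groupby


def parse_letor_line(line):
    """Split one 'label qid:ID idx:val ...' line into (label, qid, features)."""
    # Drop any trailing '# comment' part, then split on whitespace.
    parts = line.split("#", 1)[0].split()
    label = int(parts[0])
    qid = parts[1].split(":", 1)[1]
    features = {}
    for token in parts[2:]:
        idx, val = token.split(":", 1)
        features[int(idx)] = float(val)
    return label, qid, features


def group_sizes(qids):
    """Count consecutive rows that share the same query id."""
    return [len(list(rows)) for _, rows in groupby(qids)]


# Sample rows copied from the example above, without the elided features.
lines = [
    "1 qid:10 1:0.031310 2:0.666667",
    "0 qid:10 1:0.078682 2:0.166667",
]
parsed = [parse_letor_line(line) for line in lines]
labels = [p[0] for p in parsed]
qids = [p[1] for p in parsed]
print(labels, qids, group_sizes(qids))
```

Both sample rows share qid 10, so they form a single group of size 2. It is this per-query grouping that I do not know how to pass through the Python wrapper.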
I am trying out XGBoost, which uses GBMs to do pairwise ranking. It has an example for a ranking task that uses the C++ program to learn on a Microsoft dataset like the one above.
However, I am using the Python wrapper and cannot find where to input the group id (qid above). I can train the model using just the features and relevance scores, but I feel I am missing something.
Here is a sample script.
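import numpy as np
from xgboost import XGBRegressor

gbm = XGBRegressor(objective="rank:pairwise")
X = np.random.normal(0, 1, 1000).reshape(100, 10)  # 100 rows, 10 features
y = np.random.randint(0, 5, 100)  # relevance labels in 0..4
gbm.fit(X, y)  # no group id needed???
print(gbm.predict(X))
# should be in reverse order of relevance score
# (XGBRegressor has no predict_proba, so rank by the raw predictions)
print(y[gbm.predict(X).argsort()][::-1])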
Topic xgboost ranking gbm search
Category Data Science