How to fit pairwise ranking models in XGBoost?

Question


tokestermw

July 30, 2021, 17:11

As far as I know, to train a learning-to-rank model, the dataset needs three things:

label or relevance
group or query id
feature vector

For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and features).

1 qid:10 1:0.031310 2:0.666667 ...
0 qid:10 1:0.078682 2:0.166667 ...
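To make the format concrete, here is a minimal sketch of how one such line splits into a label, a group id, and a feature vector. The helper name `parse_letor_line` is made up for illustration, and the sample rows are cut off after the two features visible above, since the rest of each row is elided.

```python
def parse_letor_line(line):
    """Parse '<label> qid:<id> <index>:<value> ...' into (label, qid, features)."""
    tokens = line.split()
    label = int(tokens[0])                      # relevance label
    qid = int(tokens[1].split(":", 1)[1])       # group / query id
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)     # sparse feature vector
    return label, qid, features


# The two rows shown above, truncated to the features that are visible.
rows = [
    "1 qid:10 1:0.031310 2:0.666667",
    "0 qid:10 1:0.078682 2:0.166667",
]
parsed = [parse_letor_line(row) for row in rows]
for label, qid, features in parsed:
    print(label, qid, features)
```

Both rows share `qid` 10, so they belong to the same query and are the only rows the ranker should compare against each other.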

I am trying out XGBoost, which uses gradient boosted trees to do pairwise ranking. The project includes an example for a ranking task that uses the C++ program to learn on the Microsoft dataset, in the format shown above.

However, I am using the Python wrapper and cannot find where to pass in the group id (qid above). I can train the model using just the features and relevance scores, but I suspect I am missing something.

Here is a sample script.

import numpy as np
from xgboost import XGBRegressor

gbm = XGBRegressor(objective="rank:pairwise")

X = np.random.normal(0, 1, 1000).reshape(100, 10)
y = np.random.randint(0, 5, 100)

gbm.fit(X, y)  # --- no group id needed???

print(gbm.predict(X))

# labels ordered by predicted score, highest first;
# should come out in descending order of relevance
print(y[gbm.predict(X).argsort()][::-1])

Topic xgboost ranking gbm search

Category Data Science

amyrit · Accepted Answer · November 3, 2020, 14:36


According to the XGBoost documentation, XGBoost expects:

the examples of the same group to be consecutive rows,
a list with the size of each group (which you can set with the set_group method of DMatrix in Python).

bigdong · Accepted Answer · December 1, 2017, 16:08


set_group is essential for ranking, because scores are only comparable within a single group. To get a ranking, sort the rows of each group by their predicted scores, one group at a time.
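As a rough illustration of sorting within each group, the sketch below takes a flat list of predicted scores plus the group sizes and returns one ranking per group. The function name `rank_within_groups` and the example scores are hypothetical. It assumes the scores are in the same row order as the group sizes passed to set_group.

```python
def rank_within_groups(scores, group_sizes):
    """Return, for each group, its row indices ordered from highest to lowest score."""
    rankings = []
    start = 0
    for size in group_sizes:
        rows = range(start, start + size)
        # Only rows of the same group are compared with each other.
        ranked = sorted(rows, key=lambda i: scores[i], reverse=True)
        rankings.append(ranked)
        start += size
    return rankings


# Made-up scores for two groups of sizes 3 and 2.
scores = [0.2, 1.5, -0.3, 0.9, 1.1]
group_sizes = [3, 2]
rankings = rank_within_groups(scores, group_sizes)
print(rankings)
```

Row 3 scores higher than row 0, but the two are never compared because they belong to different groups.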

For easy ranking, you can use my xgboostExtension.
