Multiclass classification for a learning to rank application
I am looking for suggestions on Learning to Rank methods for search engines. I created a dataset with the following columns:
query_dependent_score, independent_score, (query_dependent_score*independent_score), classification_label
query_dependent_score is the TF-IDF score, i.e. the similarity between the query and a document.
independent_score is the viewing time of the document.
There are going to be 3 classes:
- 0 (not relevant),
- 1 (kind of relevant),
- 2 (most relevant)
I have a total of 750 queries and I collected the top 10 results for each, so I have 7500 data points in total.
I have been thinking of estimating a relevance function like:
w0 + w1*query_dependent_score + w2*independent_score + w3*(query_dependent_score*independent_score)
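To make the function above concrete, here is a minimal sketch of it in Python. The weight values and the input scores are placeholders chosen purely for illustration; they are not learned from the dataset.

```python
import numpy as np

def relevance(query_dependent_score, independent_score, w):
    """Linear relevance function with an interaction term.

    w is the tuple (w0, w1, w2, w3). The values used below are
    hypothetical placeholders, not weights learned from data.
    """
    w0, w1, w2, w3 = w
    interaction = query_dependent_score * independent_score
    return w0 + w1 * query_dependent_score + w2 * independent_score + w3 * interaction

# Hypothetical weights and scores, for illustration only.
example_weights = (0.0, 1.0, 0.5, 2.0)
tfidf = np.array([0.2, 0.8])      # query_dependent_score for two documents
view_time = np.array([0.5, 0.1])  # independent_score for the same documents
scores = relevance(tfidf, view_time, example_weights)
print(scores)
```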
I can see that this looks like a classification problem, but I would like to know whether this is the right way to approach it.
I referred to the question "Machine learning technique to calculate weighted average weights?" for some ideas.
Here is the code that I have written:
from sklearn.linear_model import LogisticRegression
import numpy as np
DATASET_PATH = "..."
search_data = np.genfromtxt(DATASET_PATH, delimiter=',', skip_header=1, usecols=(1, 2, 3, 4))
document_grades = search_data[:, 3:4]
document_signals = search_data[:, :3]  # This has 3 features.
total_rows = np.shape(search_data)[0]
split_point = int(total_rows * 0.8)
training_data_X, test_data_X = document_signals[:split_point, :], document_signals[split_point:, :]
training_data_y, test_data_y = document_grades[:split_point, :], document_grades[split_point:, :]
clf = LogisticRegression(multi_class="multinomial", solver="lbfgs")
clf.fit(X=training_data_X, y=training_data_y.ravel())
print(clf.classes_)  # [0, 1, 2]
print(clf.coef_)  # This is a 3 x 3 matrix?
print(clf.intercept_)  # An array of 3 elements?
Based on sklearn's documentation, coef_ should give me the values of w1, w2 and w3, and intercept_ should give me the value of w0.
But I get a 3 x 3 matrix and an array of 3 elements instead, and I am not sure how to obtain the weights for the relevance function from them.
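A small sketch on synthetic data may help show where these shapes come from. With three classes, the fitted model holds one weight vector and one intercept per class, so row k of coef_ and element k of intercept_ belong to class k. The random data, the thresholds used to create the grades, and the "expected grade" score at the end are all assumptions made for illustration. The expected grade is only one possible way to collapse the per-class outputs into a single ranking score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real dataset (assumption, not the actual data).
rng = np.random.default_rng(0)
q = rng.random(300)                      # query_dependent_score
i = rng.random(300)                      # independent_score
X = np.column_stack([q, i, q * i])       # 3 features, as in the question
y = np.digitize(q + i, [0.7, 1.3])       # grades 0, 1, 2 from arbitrary thresholds

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One row of weights and one intercept per class.
print(clf.coef_.shape)       # (n_classes, n_features)
print(clf.intercept_.shape)  # (n_classes,)

# The per-class linear scores are X @ coef_.T + intercept_.
manual_scores = X @ clf.coef_.T + clf.intercept_

# One possible scalar ranking score: the expected grade under the
# predicted class probabilities (an illustrative choice).
expected_grade = clf.predict_proba(X) @ clf.classes_
ranking = np.argsort(-expected_grade)    # document indices, best first
```

Under this reading there is no single (w0, w1, w2, w3): each class has its own set, and a scalar relevance score has to be derived from the per-class outputs.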
I am looking into learning to rank for the first time, so any suggestions are welcome.
Topic learning-to-rank scikit-learn
Category Data Science