What is the difference between nDCG and rank correlation methods?

When do we use one or the other? My use case: I want to evaluate a linear space to see how good retrieval results are. I have a set of data X (m x n) and some weights W (m x 1). I want to measure the nearest neighbour retrieval performance on W'X with a ground truth value Y. This is a continuous value, so I can't use simple precision/recall. If I use rank correlation, I will find the correlation …
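If it helps to see the two metrics side by side, here is a minimal sketch (the relevance and score arrays are made-up placeholders, not from the question): NDCG applies a position discount, so it rewards getting the top of the list right, while rank correlation treats every position equally.

    # Hypothetical sketch: NDCG vs. Spearman rank correlation for one query's results
    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.metrics import ndcg_score

    # Ground-truth continuous relevance Y for the retrieved items (toy values)
    y_true = np.array([[3.2, 0.1, 1.7, 2.9, 0.4]])
    # Similarity scores produced in the projected space W'X (toy values)
    y_score = np.array([[2.8, 0.3, 0.9, 3.1, 0.2]])

    # NDCG: position-discounted, rewards putting high-relevance items near the top
    print("NDCG:", ndcg_score(y_true, y_score))

    # Spearman: correlation of the two rankings, every position weighted equally
    rho, _ = spearmanr(y_true[0], y_score[0])
    print("Spearman rho:", rho)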
Category: Data Science

How to determine the "total number of relevant documents" in the calculation of Recall (in Precision and Recall) if it's not known? Can it be estimated?

On Wikipedia there is a practical example of calculating Precision and Recall: When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3, which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells us how complete the results are. I absolutely don't understand how one can use Precision and Recall in a real-life scenario where the total number …
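As a quick arithmetic check of the quoted example (nothing here beyond the numbers already given):

    # Worked arithmetic for the Wikipedia example quoted above
    returned = 30            # pages returned by the engine
    relevant_returned = 20   # of those, relevant
    relevant_missed = 40     # relevant pages that were not returned

    precision = relevant_returned / returned                            # 20/30 = 2/3
    recall = relevant_returned / (relevant_returned + relevant_missed)  # 20/60 = 1/3
    print(precision, recall)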
Category: Data Science

Learning to Rank with Unlabelled Dataset

I have a folder of about 60k PDF documents that I would like to learn to rank based on queries, to surface the most relevant results. The goal is to surface and rank relevant documents, very much like a search engine. I understand that Learning to Rank is a supervised algorithm that requires features generated from query-document pairs. However, the problem is that none of them are labelled. How many queries should I have to even begin training the model?
Category: Data Science

NDCG score is greater than 1

I'm solving a problem of ranking classes for each unique id based on the utilization quantity. I have 6 unique classes in the training and test data. My neural net model predicts the utilization corresponding to each class. So if there are 10000 test samples, I have a 10000 x 6 prediction array and a 10000 x 6 true-value array. I want to validate the model performance using the NDCG metric. I followed https://www.kaggle.com/davidgasquez/ndcg-scorer to compute NDCG. In there, the shapes for the parameters are as …
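For reference, scikit-learn's ndcg_score normalizes DCG by the ideal DCG, so a correctly computed NDCG can never exceed 1; a minimal sketch with the same row-per-sample, column-per-class layout as in the question (toy values):

    import numpy as np
    from sklearn.metrics import ndcg_score

    # One row per test sample, one column per class; values are made up
    y_true = np.array([[10.0, 0.0, 3.0, 1.0, 0.0, 2.0],
                       [ 0.0, 5.0, 1.0, 0.0, 4.0, 0.0]])
    y_pred = np.array([[ 8.0, 0.5, 2.0, 1.5, 0.1, 2.5],
                       [ 0.2, 4.0, 0.5, 0.1, 5.0, 0.3]])

    # Bounded by 1 because of the IDCG normalization
    print(ndcg_score(y_true, y_pred))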
Category: Data Science

How to run inference with LTR (Learning-to-Rank) models?

I've recently started looking into LTR models such as RankNet and LambdaMART. In the case of LambdaMART and the LETOR dataset, I believe the model accepts the following as training input: query_id (scalar), document_features (vector), relevance score (scalar). However, I don't see the query features anywhere in the input. I think the query_id is only used to partition the dataset. How would I run inference with this model for a query that wasn't present in the training data? Do I need a …
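A hedged sketch of what inference usually looks like under that reading: the query id never reaches the model, only query-document features do, so an unseen query is handled by recomputing those features; featurize and candidate_docs below are hypothetical placeholders, not part of LETOR or any library:

    # Hypothetical inference sketch for a LETOR-style model
    def rank_documents(model, query, candidate_docs, featurize):
        # Build one feature vector per (query, document) pair
        features = [featurize(query, doc) for doc in candidate_docs]
        scores = model.predict(features)
        # Sort candidates by descending predicted relevance
        return sorted(zip(candidate_docs, scores), key=lambda t: -t[1])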
Category: Data Science

What are the simplest predictive ranking algorithms?

I want to apply a predictive ranking algorithm to a dataset, so I've been reading about Learning to Rank, RankNet and LambdaMART [1][2]. These methods use neural nets or trees as their core framework. I'm wondering if there are any simpler methodologies available, e.g. linear regression (OLS), for predictive ranking?
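Assuming a pointwise setup (one row per query-document pair with a graded relevance label), a plain OLS baseline might look like the following toy sketch; the data here is synthetic:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.random((100, 5))                                # placeholder query-document features
    y = X @ np.array([0.5, 2.0, 0.0, -1.0, 0.3]) + 0.1 * rng.standard_normal(100)  # toy relevance

    model = LinearRegression().fit(X, y)

    # Ranking a new candidate set: score every item, then sort best-first
    X_new = rng.random((10, 5))
    order = np.argsort(-model.predict(X_new))
    print(order)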
Category: Data Science

Which algorithm to use for a very simple ranking problem?

Currently, I have a dataset with 10 features that results in a ranking of 4 items, i.e. [1,2,3,4], [4,3,1,2], [3,2,4,1] or any $4!$ permutations that can arise from the ranking. What algorithms are there to train a model for this? Making it a categorical variable with 24 items seems like a silly way to do this. Another silly way seems to be to transform the data into something like <$x_1$, $item_1$, 4> <$x_1$, $item_2$, 3> <$x_1$, $item_3$, 1> <$x_1$, $item_4$, …
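One non-categorical alternative, sketched on synthetic data with the question's shapes (10 features, rankings over 4 items): predict a score per item and recover the permutation with argsort. This is only an illustration, not necessarily the best formulation:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.random((200, 10))                                   # placeholder features
    Y = np.array([rng.permutation(4) + 1 for _ in range(200)])  # a 1..4 ranking per sample

    # Multi-output regression: one predicted "rank value" per item
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)

    # Argsort twice to turn predicted values back into a 1..4 ranking
    # (smallest predicted value -> rank 1)
    pred = model.predict(X[:1])[0]
    predicted_ranking = np.argsort(np.argsort(pred)) + 1
    print(predicted_ranking)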
Category: Data Science

How can I use a validation set to tune the hyperparameters of an XGBClassifier?

I'm currently building a ranking model using an XGBClassifier. I have training, testing, and validation sets. I want to use the validation set to tune the hyperparameters of the XGBClassifier before using it to train a model. Is this the right idea? Currently, I'm getting a 56% accuracy score with the default XGBClassifier (or 51% accuracy if I run PCA before training the model, which greatly reduces the time it takes to train). If I try tuning the hyperparameters before …
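One common pattern, sketched here with placeholder data: fit each candidate configuration on the training set, pick whichever scores best on the validation set, and only touch the test set once at the very end. The candidate grid below is arbitrary:

    import numpy as np
    from xgboost import XGBClassifier
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((500, 20)), rng.integers(0, 2, 500)
    X_val, y_val = rng.random((100, 20)), rng.integers(0, 2, 100)

    candidates = [
        {"max_depth": 3, "learning_rate": 0.1, "n_estimators": 200},
        {"max_depth": 6, "learning_rate": 0.05, "n_estimators": 400},
    ]

    best_params, best_score = None, -np.inf
    for params in candidates:
        clf = XGBClassifier(**params)
        clf.fit(X_train, y_train)
        score = accuracy_score(y_val, clf.predict(X_val))  # validation score decides the winner
        if score > best_score:
            best_params, best_score = params, score

    print(best_params, best_score)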
Category: Data Science

Ranking problem and imbalanced dataset

I know about the problems that an imbalanced dataset will cause when we are working on classification problems, and I know the solutions for that, including undersampling and oversampling. I have to work on a ranking problem (ranking hotels, evaluated on an NDCG@50 score; see this link), and the dataset is extremely imbalanced. However, the examples I saw on the internet use the dataset as it is and pass it to train_test_split without oversampling/undersampling. I am kind of confused if that is …
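As an aside, for ranking data the split is often done per query rather than per row, since NDCG is computed within each query; a sketch assuming a pandas DataFrame with a per-query id column (the column names below are invented for illustration):

    import pandas as pd
    from sklearn.model_selection import GroupShuffleSplit

    # Toy frame: a per-query id ("srch_id"), one feature, and a relevance label
    df = pd.DataFrame({
        "srch_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "price":   [80, 120, 95, 60, 70, 200, 150, 90],
        "booked":  [0, 1, 0, 0, 1, 0, 0, 1],
    })

    # Split by query so whole queries land in either train or test
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
    train_idx, test_idx = next(splitter.split(df, groups=df["srch_id"]))
    train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
    print(train_df["srch_id"].unique(), test_df["srch_id"].unique())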
Category: Data Science

How to use the NDCG metric for binary relevance

I am working on a ranking problem to predict the single right document based on the user query, and I use the NDCG metric to measure the model. Given the details: Queries (Q), Result Documents (D), Relevance score. But the relevance score is binary (0 or 1), i.e. out of the document list, only one document is marked with relevance score = 1. Data set example: query, docs, relevance { [1, doc2, 0], [1, doc3, 0], [1, doc4, 0], [1, doc6, 1], [1, …
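With exactly one relevant document per query, NDCG collapses to 1 / log2(1 + rank of that document); a small sketch (toy scores) showing that scikit-learn's ndcg_score agrees with that closed form:

    import numpy as np
    from sklearn.metrics import ndcg_score

    y_true = np.array([[0, 0, 0, 1, 0]])                 # one relevant document per query
    y_score = np.array([[0.3, 0.1, 0.4, 0.35, 0.05]])    # model scores (made up)

    print(ndcg_score(y_true, y_score))

    # Closed-form check: 1 / log2(1 + rank of the relevant document)
    order = np.argsort(-y_score[0])                       # indices sorted best-first
    rank = int(np.where(y_true[0][order] == 1)[0][0]) + 1
    print(1.0 / np.log2(1 + rank))                        # matches the ndcg_score above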
Category: Data Science

Solving Feature Distribution variance between Training and Prediction for Ranking models

I am building a linear regression model to improve the ranking of documents, and I am trying to identify problems due to which model performance estimates don't match the actual impact. One major problem is the feature distribution for training vs. prediction. E.g., for training data, the feature distribution is computed over seen documents, but during online prediction the model is applied to all matching documents. This introduces variance between what the model believed to be the actual feature value distribution and has the potential …
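One simple diagnostic for this kind of train/serve mismatch is to compare each feature's training distribution against its logged prediction-time distribution, e.g. with a two-sample Kolmogorov-Smirnov test; the arrays below are synthetic stand-ins for "seen documents" vs. "all matching documents":

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    train_feature = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # feature over seen documents
    serve_feature = rng.lognormal(mean=0.3, sigma=1.2, size=5000)  # same feature over all matches

    stat, p_value = ks_2samp(train_feature, serve_feature)
    print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3g}")   # a large statistic signals drift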
Category: Data Science

Listwise learning to rank with negative sample relevance

A typical listwise learning to rank (L2R) algorithm tries to learn the rank of docs $\{x_i\}_{i=1}^m$ corresponding to a query $q$. If we use a correlation coefficient to label the relevance between docs and the query, then the label $y_i\in[0, 1]$. The larger the $y_i$, the more relevant the doc $x_i$ is to $q$. Most L2R algorithms, such as approximated rank/NDCG and ListMLE, focus more on the ranking accuracy of positively correlated docs (i.e. $y_i$ close to 1) by giving larger weight in the …
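To make the listwise objective concrete, here is a minimal numpy sketch of a ListMLE-style loss (the Plackett-Luce negative log-likelihood of the ordering induced by the labels); the label and score vectors are invented, and a real implementation would add batching, tie handling, and gradients:

    import numpy as np

    def listmle_loss(scores, labels):
        # Order documents by decreasing relevance label
        order = np.argsort(-labels)
        s = scores[order]
        # Negative log-likelihood of producing exactly that order
        loss = 0.0
        for i in range(len(s)):
            loss += np.log(np.sum(np.exp(s[i:]))) - s[i]
        return loss

    y = np.array([0.9, 0.2, 0.6, 0.1])       # correlation-style relevance labels in [0, 1]
    good = np.array([3.0, 0.5, 2.0, 0.1])    # scores that respect the label order
    bad = np.array([0.1, 3.0, 0.5, 2.0])     # scores that do not
    print(listmle_loss(good, y), listmle_loss(bad, y))   # the good scores get the lower loss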
Category: Data Science

Multilabel classification for a learning to rank application

I am looking for some suggestions on Learning to Rank methods for search engines. I created a dataset with the following data: query_dependent_score, independent_score, (query_dependent_score*independent_score), classification_label. query_dependent_score is the TF-IDF score, i.e. the similarity between the query and a document. independent_score is the viewing time of the document. There are going to be 3 classes: 0 (not relevant), 1 (kind of relevant), 2 (most relevant). I have a total of 750 queries and I collected the top 10 results for each, so I …
Category: Data Science

How to perform Learning to Rank for a small dataset

I am very interested in applying Learning to Rank to my problem domain. When I read through the learning-to-rank literature I noted that the data used for training includes thousands of queries. However, in my problem domain I only have 6 use-cases (similar to 6 queries) for which I would like to obtain a ranking function using machine learning. I know the data I have is very, very small. So, my question is: can I apply learning …
Category: Data Science

Learning to rank: how is the label calculated?

I am studying learning to rank and am not sure I understand how the training sample and final label (relevance score) are constructed. Let's assume we sell furniture online. We have logged the customer's query and the products the customer clicked and bought. Example: User A searched for "red sofa", clicked on Pa(r=1), Pb(r=5), Pc(r=10) and bought Pc. User B searched for "red sofa", clicked on Pa(r=1), Pd(r=2), Pe(r=6) and bought Pd. User C searched for "blue chair", clicked on Pu(r=1), Po(r=2), Ps(r=6) and bought Pu. …
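Purely as an illustration of one common heuristic (not necessarily what any particular system does), behaviour can be graded, e.g. purchase = 3, click = 1, shown-but-ignored = 0, and aggregated per (query, product) pair:

    import pandas as pd

    # Toy interaction log loosely modelled on the example above
    log = pd.DataFrame({
        "query":   ["red sofa"] * 6,
        "product": ["Pa", "Pb", "Pc", "Pa", "Pd", "Pe"],
        "clicked": [1, 1, 1, 1, 1, 1],
        "bought":  [0, 0, 1, 0, 1, 0],
    })

    # Grade each interaction, then keep the strongest signal per (query, product)
    log["label"] = log["bought"] * 3 + log["clicked"] * 1
    relevance = log.groupby(["query", "product"])["label"].max().reset_index()
    print(relevance)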
Category: Data Science

Why is there no need to set the test group when using 'rank:pairwise' in xgboost?

I'm new to learning-to-rank. I'm trying to learn from the Learning to Rank example provided by xgboost. I found that the core code in rank.py is as follows:

    train_dmatrix = DMatrix(x_train, y_train)
    valid_dmatrix = DMatrix(x_valid, y_valid)
    test_dmatrix = DMatrix(x_test)

    train_dmatrix.set_group(group_train)
    valid_dmatrix.set_group(group_valid)

    params = {'objective': 'rank:pairwise', 'eta': 0.1, 'gamma': 1.0,
              'min_child_weight': 0.1, 'max_depth': 6}
    xgb_model = xgb.train(params, train_dmatrix, num_boost_round=4,
                          evals=[(valid_dmatrix, 'validation')])
    pred = xgb_model.predict(test_dmatrix)

Group data is used in both the training and validation sets. But test set prediction does not use group …
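For context, a sketch of how the per-row scores are typically turned back into rankings at prediction time: the model scores every test row independently and the caller sorts within each query's group afterwards; group_test below is an assumed list of per-query sizes in the same format as group_train:

    import numpy as np

    def rankings_per_query(pred, group_test):
        # Slice the flat prediction vector into one block per query,
        # then sort each block's documents best-first by score
        rankings, start = [], 0
        for size in group_test:
            scores = pred[start:start + size]
            rankings.append(np.argsort(-scores))
            start += size
        return rankings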
Category: Data Science

Learning to Rank Application

If there's a website/app that sells products and my job is to determine the order/ranking in which the products should be displayed. For example: I click on restaurants and a list of restaurants pops up, and I have to determine in what order the restaurants should be displayed. All the data such as ratings, distance to the customer, profit to us, prices, CTR, total number of views etc. is available. But how should I approach this problem of ranking them in …
Category: Data Science

Two definitions of DCG measure

I wanted to check the definition of the Discounted Cumulative Gain (DCG) measure in the original paper (Jarvelin) and it seems it differs from the one given in the later literature (Wang). Originally, for documents ranked $r = 1, \ldots, p$, the $\text{DCG}_p$ is defined as $$\text{DCG}_p = \sum\limits_{r=1}^{b-1} G_r + \sum\limits_{r=b}^{p}\frac{G_r}{\log_b r},$$ where $G_r$ is the relevance (or gain) of the document at rank $r$. So the measure depends on the logarithm base $b$. For ranks below $b$, i.e. $r<b$, gains …
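A small side-by-side sketch of the two formulations on a toy gain vector (values made up): the original base-b version leaves ranks below b undiscounted, while the later common version discounts every rank by log2(r + 1):

    import numpy as np

    def dcg_jarvelin(gains, b=2):
        # No discount for ranks below b; discount by log_b(r) from rank b onwards
        return sum(g if r < b else g / (np.log(r) / np.log(b))
                   for r, g in enumerate(gains, start=1))

    def dcg_log2(gains):
        # The commonly used variant: every rank discounted by log2(r + 1)
        return sum(g / np.log2(r + 1) for r, g in enumerate(gains, start=1))

    gains = [3, 2, 3, 0, 1, 2]
    print(dcg_jarvelin(gains, b=2), dcg_log2(gains))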
Category: Data Science

Search Query Sample Size Determination for validation set

While designing a search system which searches in N identifiable categories, how many search queries does one need in each category to validate the target metric (DCG) scores accurately (balancing variance and bias)? Does this number depend on N, the corpus size, or both? Please add any publications if possible. I would also like to understand whether effect size and Bayesian effective sample sizes play some role here. Given a set of search queries Q for retrieving documents from …
Category: Data Science
