Computing probabilities in Plackett-Luce model

I am trying to implement a Plackett-Luce model for learning to rank from click data. Specifically, I am following the paper: Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank. The objective function is a reward function similar to the one used in reinforcement learning: $R_d$ is the reward for document $d$, $\pi(k \vert d)$ is the probability of document $d$ being placed at position $k$ for a given query $q$, and $w_k$ is the weight of position …
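A minimal sketch (not taken from the paper) of how Plackett-Luce probabilities can be computed, assuming each document has a real-valued score whose softmax gives its selection weight; the hypothetical `placement_prob` helper estimates $\pi(k \vert d)$ by Monte Carlo sampling with the Gumbel-max trick:

```python
import numpy as np

def log_prob_ranking(scores, ranking):
    """Log-probability of observing `ranking` (a permutation of document
    indices) under a Plackett-Luce model with per-document scores."""
    scores = np.asarray(scores, dtype=float)
    remaining = list(ranking)
    logp = 0.0
    for d in ranking:
        # probability of picking d among the documents still unplaced
        logits = scores[remaining]
        logp += scores[d] - np.log(np.exp(logits).sum())
        remaining.remove(d)
    return logp

def placement_prob(scores, d, k, n_samples=10_000, rng=None):
    """Monte Carlo estimate of pi(k | d): the probability that document d
    lands at position k when rankings are sampled from the model."""
    rng = np.random.default_rng(rng)
    scores = np.asarray(scores, dtype=float)
    hits = 0
    for _ in range(n_samples):
        noisy = scores + rng.gumbel(size=scores.shape)  # Gumbel-max sampling
        ranking = np.argsort(-noisy)                    # sampled ranking
        hits += int(ranking[k] == d)
    return hits / n_samples

scores = np.array([2.0, 1.0, 0.5, 0.0])          # hypothetical document scores
print(np.exp(log_prob_ranking(scores, [0, 1, 2, 3])))
print(placement_prob(scores, d=0, k=0))          # ~ softmax weight of doc 0
```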
Category: Data Science

Differentiable loss function for ranking problem in regression model

In a regression problem, we may need a loss function that measures the relative ranking accuracy between targets $y$ and predicted values $y_{pred}$. Obviously, plain MSE does not capture such ranking relations. A natural choice is the so-called IC (Information Coefficient) $$IC \propto \text{corr}(\text{rank}(y), \text{rank}(y_{pred}))$$ which uses the correlation between the two rankings. However, the rank function is not differentiable, so it cannot be used in a loss function for a regression model that relies on gradient backpropagation to update the parameters. Another choice might be a …
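One way around the non-differentiability is a soft rank built from pairwise sigmoids; the sketch below (PyTorch, temperature `tau` is a free choice, not prescribed by the question) turns that into a differentiable Spearman-style loss:

```python
import torch

def soft_rank(x, tau=0.1):
    """Differentiable approximation of rank(x): for each element, softly
    count how many elements are smaller than it."""
    pairwise = torch.sigmoid((x.unsqueeze(1) - x.unsqueeze(0)) / tau)
    return pairwise.sum(dim=1) + 0.5      # ranks approximately in 1..n

def soft_spearman_loss(y_pred, y_true, tau=0.1):
    """1 - Pearson correlation between soft ranks; minimising it pushes the
    predicted ordering towards the target ordering."""
    r_pred = soft_rank(y_pred, tau)
    r_true = soft_rank(y_true, tau)       # no gradient needed for targets
    r_pred = r_pred - r_pred.mean()
    r_true = r_true - r_true.mean()
    corr = (r_pred * r_true).sum() / (r_pred.norm() * r_true.norm() + 1e-8)
    return 1.0 - corr

y_true = torch.tensor([0.1, 0.4, 0.2, 0.9])
y_pred = torch.tensor([0.3, 0.2, 0.5, 0.7], requires_grad=True)
loss = soft_spearman_loss(y_pred, y_true)
loss.backward()                           # gradients flow through the soft ranks
print(loss.item(), y_pred.grad)
```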
Category: Data Science

What can help decrease outliers' influence on non-tree models?

I have a feature whose values all lie between 0 and 1 except for a few outliers larger than 1. I am trying to collect all the methods that can help decrease the outliers' influence on non-tree models: StandardScaler, a rank transform of the features, an np.log1p(x) transform, MinMaxScaler, winsorization. I wasn't able to come up with any others ... I guess that's all?
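A small sketch of a few of the listed options on a made-up feature: manual winsorization (clipping at a chosen upper percentile), a log1p transform, and a rank transform, all of which shrink the influence of the few values above 1 without touching the bulk of the data:

```python
import numpy as np

x = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 5.2, 9.7])   # hypothetical feature

x_winsor = np.clip(x, None, np.quantile(x, 0.95))    # cap at the 95th percentile
x_log = np.log1p(x)                                   # compress large values
x_rank = np.argsort(np.argsort(x)) + 1                # rank transform (1..n)

print(x_winsor)
print(x_log)
print(x_rank)
```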
Category: Data Science

What is the difference between nDCG and rank correlation methods?

When do we use one or the other? My use case: I want to evaluate a linear space to see how good retrieval results are. I have a set of data X (m x n) and some weights W (m x 1). I want to measure the nearest neighbour retrieval performance on W'X with a ground truth value Y. This is a continuous value, so I can't use simple precision/recall. If I use rank correlation, I will find the correlation …
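A small sketch contrasting the two metrics on made-up data: NDCG rewards getting the top of the list right (position-discounted gains), while Spearman's rank correlation treats every position equally; both accept continuous ground-truth values:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import ndcg_score

y_true = np.array([[3.0, 2.5, 1.0, 0.2, 0.1]])   # hypothetical relevance of 5 items
y_score = np.array([[2.9, 0.3, 1.1, 0.4, 0.0]])  # scores a retrieval model might give

print("NDCG@5   :", ndcg_score(y_true, y_score))
print("Spearman :", spearmanr(y_true[0], y_score[0]).correlation)
```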
Category: Data Science

Methods for ensembling ranked lists?

I was wondering if there's a good way to use ensembling when I have two or more algorithms producing ranked lists. That is, suppose I have the following datasets consisting of ordered lists (higher in the list means more relevant):

Method1_Rankings  Method2_Rankings  GoldStandard_Rankings
item1             item2             item1
item3             item1             item3
item2             item10            item5
...

Is there a way to optimally combine methods 1 and 2 (e.g., give the rankings some weights or similar)? Thank you.
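A minimal sketch of one common approach, a weighted Borda count: each method awards points by position, and the per-method weights (0.5/0.5 here, a made-up choice) could be tuned against the gold-standard ranking:

```python
from collections import defaultdict

method1 = ["item1", "item3", "item2"]          # partial lists from the question
method2 = ["item2", "item1", "item10"]
weights = {"m1": 0.5, "m2": 0.5}               # hypothetical method weights

scores = defaultdict(float)
for w, ranking in ((weights["m1"], method1), (weights["m2"], method2)):
    n = len(ranking)
    for pos, item in enumerate(ranking):
        scores[item] += w * (n - pos)          # higher position -> more points

ensemble = sorted(scores, key=scores.get, reverse=True)
print(ensemble)                                # e.g. ['item1', 'item2', ...]
```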
Category: Data Science

Best way to narrow down a list and rank based on attributes?

I have a mortgage/credit data set that contains a list of customers (600k rows) and has 100 columns, including the customer's general info (address, city, zipcode, etc.), income, FICO scores, number of current mortgages, mortgages in the past, aggregate mortgage amounts, number of bank card trades, etc. The data pertains to customers that are already good candidates to contact for issuing a credit product; however, if one is to narrow the list down to 350K, what would be …
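A hedged sketch of one simple option (column names and weights here are hypothetical): normalise a few columns, combine them into a composite score, and keep the top of the list:

```python
import pandas as pd

df = pd.DataFrame({                      # stand-in for the 600k-row dataset
    "fico": [720, 650, 780, 600],
    "income": [85_000, 42_000, 120_000, 30_000],
    "n_mortgages": [1, 0, 2, 0],
})

# min-max normalise each column so the weights are comparable
norm = (df - df.min()) / (df.max() - df.min())
weights = {"fico": 0.5, "income": 0.3, "n_mortgages": 0.2}   # business-chosen
df["score"] = sum(w * norm[c] for c, w in weights.items())

shortlist = df.nlargest(2, "score")      # with the real data: nlargest(350_000, ...)
print(shortlist)
```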
Category: Data Science

How to determine the "total number of relevant documents" in the calculation of Recall (in Precision and Recall) if it's not known? Can it be estimated?

On Wikipedia there is a practical example of calculating Precision and Recall: when a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3, which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells us how complete the results are. I absolutely don't understand how one can use Precision and Recall in a real-life scenario where the total number …
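The arithmetic from the quoted example, written out: recall needs the total number of relevant documents (returned relevant plus missed relevant), which is exactly the quantity that is usually unknown in practice:

```python
returned = 30
relevant_returned = 20
relevant_missed = 40                       # known only in this toy example

precision = relevant_returned / returned                              # 20/30 = 2/3
recall = relevant_returned / (relevant_returned + relevant_missed)    # 20/60 = 1/3
print(precision, recall)
```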
Category: Data Science

Improve results using user input

I've developed a tool that retrieves the closest expressions from a database based on what the user typed (using word embeddings: a comparison is made between each expression in the database and the user input). The n closest results are retrieved, but the closest expressions are not necessarily the most relevant ones. For example, by typing "hospital machine", the top results will be "dialysis machine", "medical machine", ..., but I'll also find expressions like "building machine" and "office machine". A user will …
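A minimal sketch of the retrieval step as described (cosine similarity between a query embedding and each expression embedding); the toy 2-d vectors stand in for whatever embedding model the tool already uses:

```python
import numpy as np

def top_k(query_vec, expression_vecs, expressions, k=5):
    """Return the k expressions whose embeddings are closest to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = expression_vecs / np.linalg.norm(expression_vecs, axis=1, keepdims=True)
    sims = m @ q                                   # cosine similarities
    order = np.argsort(-sims)[:k]
    return [(expressions[i], float(sims[i])) for i in order]

expressions = ["dialysis machine", "medical machine", "office machine"]
vecs = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9]])   # toy embeddings
print(top_k(np.array([0.85, 0.2]), vecs, expressions, k=2))
```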
Category: Data Science

Ranking algorithm based on a handful of features

I am trying to determine an apt algorithm for a ranking problem that I am working on. I have social media metrics (engagement, sentiment, audience size, etc.) for several brands and am looking for a ranking / classification algorithm to rank them. I am not sure if I have a dependent variable or label class for classical classification algorithms. The data is aggregated by brand, and the algorithm needs to rank the brands based on the metrics. Any ideas would …
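One unsupervised option, sketched here since there is no label: z-score each metric so they share a scale, then rank brands by their average z-score (the equal metric weights and the toy numbers are assumptions):

```python
import numpy as np

brands = ["brand_a", "brand_b", "brand_c"]
metrics = np.array([       # rows = brands; cols = engagement, sentiment, audience
    [1200, 0.6, 50_000],
    [ 300, 0.9, 10_000],
    [2000, 0.4, 80_000],
], dtype=float)

z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
overall = z.mean(axis=1)                    # equal metric weights (a choice)
order = np.argsort(-overall)
print([brands[i] for i in order])
```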
Category: Data Science

Why is NDCG high even for wrongly ranked predictions?

The NDCG (Normalized Discounted Cumulative Gain) metric for ranking is defined as DCG/IDCG, where IDCG is the ideal DCG, and it is said to take values in [0, 1]. However, since the DCG will always be positive for any (positive) predicted scores, this metric will never be 0, and it seems to me that it is very biased towards high values. So much so that, in my experiments, I get an NDCG of ~0.8 out of 1.0 for a (custom-made) prediction …
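A quick check of the behaviour described, using sklearn's implementation on a made-up example: with non-negative relevances, even a completely reversed ranking still gets an NDCG well above 0, because every retrieved item contributes a positive discounted gain:

```python
import numpy as np
from sklearn.metrics import ndcg_score

y_true = np.array([[5, 4, 3, 2, 1, 0]])     # ideal order puts item 0 first
y_rev = np.array([[0, 1, 2, 3, 4, 5]])      # scores that fully reverse the ranking

print(ndcg_score(y_true, y_rev))            # noticeably above 0.5, not near 0
```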
Category: Data Science

Drastic drop in Somers' D? Why?

I needed to find the correlation between the ratings assigned by two coaches to the same group of 40 players. I have tabulated the results as below: the Somers' D is 50%. However, for the case below, the Somers' D is 94.7%. My question is: both scenarios have 2 deviations, so why does the first scenario have a much lower Somers' D than the second?
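A hedged sketch of computing Somers' D directly from the two coaches' ratings (SciPy 1.7+); the ratings below are made up, not the tabulated data from the question, and comparing the concordant/discordant pair counts of the two scenarios is usually what explains a large difference in D:

```python
from scipy.stats import somersd

coach_x = [3, 2, 4, 1, 5, 3, 2, 4]     # hypothetical ratings, not the real data
coach_y = [3, 1, 4, 2, 5, 3, 2, 5]

res = somersd(coach_x, coach_y)        # D of y with respect to x
print(res.statistic)
```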
Category: Data Science

Learning to Rank with Unlabelled Dataset

I have a folder of about 60k PDF documents that I would like to learn to rank based on queries, to surface the most relevant results. The goal is to surface and rank relevant documents, very much like a search engine. I understand that Learning to Rank is a supervised algorithm that requires features generated from query-document pairs. However, the problem is that none of them are labelled. How many queries should I have to even begin training the model?
Category: Data Science

How to ensemble different ranking models?

I have trained two different models, each of which gives a score to each data point. The scores of the two models are not necessarily comparable. The score is used to produce a ranking, and the performance is measured with AUC and the ROC curve. How can I ensemble the different models to obtain a better AUC and ROC curve?
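A minimal sketch of rank-averaging on made-up data: converting each model's scores to ranks makes them comparable, and the averaged ranks can be evaluated with AUC just like the original scores:

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

y = np.array([0, 1, 0, 1, 1, 0])                        # hypothetical binary labels
scores_a = np.array([0.2, 0.9, 0.3, 0.6, 0.7, 0.1])     # model A (0-1 scale)
scores_b = np.array([10, 80, 45, 90, 60, 20])           # model B (different scale)

ensemble = 0.5 * rankdata(scores_a) + 0.5 * rankdata(scores_b)
for name, s in [("A", scores_a), ("B", scores_b), ("rank-avg", ensemble)]:
    print(name, roc_auc_score(y, s))
```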
Category: Data Science

Is it Possible to Use Machine Learning for Ranking Alternatives?

Right now, I'm working on road and street safety analysis. I have a dataset of dangerous points in four regions of a city. Some of the available variables are road lighting status, ITS, latitude, longitude, longitudinal protection status, type of point control, as well as the number of injuries and deaths over the past two years. Also, some of these are dummy variables (road lighting status, ITS, longitudinal protection status, type of point control). I want to know whether I could rank …
Category: Data Science

Feature selection with information gain (KL divergence) and mutual information yields different results

I'm comparing different techniques for feature selection / feature ranking. Two of the techniques under scrutiny are mutual information (MI) and information gain (IG) as used in decision trees, i.e. the Kullback-Leibler divergence. My data (class and features) is all binary. All sources I could find state that MI and IG are basically "two sides of the same coin", i.e. that one can be transformed into the other via mathematical manipulation (for example [source 1, source 2]). Yet, …
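A small numerical check, on synthetic binary data, that information gain H(Y) - H(Y|X) and mutual information I(X; Y) coincide when both are computed from the same empirical counts (both in nats here, since `mutual_info_score` uses the natural log):

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)                       # binary feature
flip = rng.random(1000) < 0.2
y = np.logical_xor(x, flip).astype(int)                 # noisy copy of x as the class

# information gain: H(y) - sum_v p(x=v) * H(y | x=v)
h_y = entropy(np.bincount(y, minlength=2) / len(y))
h_y_given_x = sum(
    np.mean(x == v) * entropy(np.bincount(y[x == v], minlength=2) / np.sum(x == v))
    for v in (0, 1)
)
print("information gain  :", h_y - h_y_given_x)
print("mutual information:", mutual_info_score(x, y))   # same value up to float error
```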
Category: Data Science

NDCG score is greater than 1

I'm solving a problem of ranking classes for each unique id based on the utilization quantity. I have 6 unique classes in the training and test data. My neural net model predicts the utilization corresponding to each class. So if there are 10000 test samples, I have a 10000 x 6 prediction array and a 10000 x 6 true-value array. I want to validate the model performance using the NDCG metric. I followed https://www.kaggle.com/davidgasquez/ndcg-scorer to compute NDCG. In there, the shapes for the parameters are as …
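A hedged alternative to the linked Kaggle scorer (which this sketch does not reproduce): sklearn's `ndcg_score` accepts exactly the (n_samples, n_classes) shapes described, averages over samples, and stays in [0, 1] as long as the true relevances are non-negative; the random arrays below are stand-ins for the real utilization data:

```python
import numpy as np
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(0)
y_true = rng.random((10_000, 6))                        # true utilization per class
y_pred = y_true + 0.3 * rng.normal(size=(10_000, 6))    # noisy predictions

print(ndcg_score(y_true, y_pred))            # mean NDCG over the 10000 samples
print(ndcg_score(y_true, y_pred, k=3))       # NDCG@3 if only the top 3 matter
```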
Category: Data Science

Can we apply multi-criteria decision-making algorithms to incomplete data?

I am currently working on a project where a multi-criteria decision-making algorithm is needed in order to evaluate several alternatives for a given goal. After long research, I decided to use the AHP method for my case study. The problem is that the alternatives taken into account for the given goal contain incomplete data. For example, I am interested in buying a house and I have three alternatives to consider. One criterion for comparing them is the size …
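For context, a minimal sketch of the core AHP step for one criterion: a reciprocal pairwise-comparison matrix for three house alternatives and its principal eigenvector as the priority weights (the comparison values are made up; handling the missing entries is exactly the open question):

```python
import numpy as np

A = np.array([            # A[i, j] = how much better alternative i is than j
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
priorities = principal / principal.sum()    # normalise to sum to 1
print(priorities)                           # relative preference of the three houses
```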
Topic: ranking
Category: Data Science

How to match people in preference ranked survey results?

I'm sending out a survey that I want to use to create pairs. For example, each person indicates whether they want to be a mentor, or a mentee. They then stack rank 10 topics that they're interested in either mentoring on or being mentored on, respectively. My question is, given a list of mentors and mentees, how can I effectively calculate the most similar pairings?
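A hedged sketch of one way to do it, on made-up survey data: score each mentor-mentee pair by the Spearman correlation of their topic rankings, then pick the globally best one-to-one pairing with the Hungarian algorithm:

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.optimize import linear_sum_assignment

# rows = people, columns = the 10 topics, values = rank given (1 = favourite)
rng = np.random.default_rng(0)
mentors = np.array([rng.permutation(10) + 1 for _ in range(4)])
mentees = np.array([rng.permutation(10) + 1 for _ in range(4)])

cost = np.zeros((len(mentors), len(mentees)))
for i, m in enumerate(mentors):
    for j, e in enumerate(mentees):
        cost[i, j] = -spearmanr(m, e).correlation   # negate: assignment minimises cost

rows, cols = linear_sum_assignment(cost)
print(list(zip(rows, cols)))      # mentor index -> matched mentee index
```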
Category: Data Science

Calculating statistical ranks between datasets with unpaired observations

The problem is the following: I have multiple datasets for which I want to calculate a ranking for each. All observations contained in the datasets can be arbitrarily permuted, so they are unpaired, to speak in the words of statisticians. Example datasets are: dataset1 = [0.6487500071525574, 0.6499999761581421, 0.6412500143051147, 0.6662499904632568, 0.6225000023841858, 0.6324999928474426, 0.637499988079071, 0.6287500262260437, 0.6412500143051147, 0.6212499737739563] dataset2 = [0.6075000166893005, 0.6287500262260437, 0.6312500238418579, 0.6162499785423279, 0.6012499928474426, 0.6150000095367432, 0.6387500166893005, 0.6200000047683716, 0.5950000286102295, 0.5849999785423279] dataset3 =[0.6237499713897705, 0.612500011920929, 0.6075000166893005, 0.6162499785423279, 0.6187499761581421, 0.6287500262260437, 0.6200000047683716, 0.6237499713897705, 0.5824999809265137, 0.5787500143051147] I understand …
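A small sketch of one common approach for unpaired samples, using the lists from the question: pool all observations, rank them jointly, and order the datasets by their mean rank (scipy's Kruskal-Wallis test can then check whether the differences are more than noise):

```python
import numpy as np
from scipy.stats import rankdata, kruskal

dataset1 = [0.6487500071525574, 0.6499999761581421, 0.6412500143051147,
            0.6662499904632568, 0.6225000023841858, 0.6324999928474426,
            0.637499988079071, 0.6287500262260437, 0.6412500143051147,
            0.6212499737739563]
dataset2 = [0.6075000166893005, 0.6287500262260437, 0.6312500238418579,
            0.6162499785423279, 0.6012499928474426, 0.6150000095367432,
            0.6387500166893005, 0.6200000047683716, 0.5950000286102295,
            0.5849999785423279]
dataset3 = [0.6237499713897705, 0.612500011920929, 0.6075000166893005,
            0.6162499785423279, 0.6187499761581421, 0.6287500262260437,
            0.6200000047683716, 0.6237499713897705, 0.5824999809265137,
            0.5787500143051147]

groups = [dataset1, dataset2, dataset3]
pooled_ranks = rankdata(np.concatenate(groups))   # rank all observations together
start, mean_ranks = 0, []
for g in groups:
    mean_ranks.append(pooled_ranks[start:start + len(g)].mean())
    start += len(g)

print(mean_ranks)            # dataset1 gets the highest mean rank on this data
print(kruskal(*groups))      # H-test for overall differences between the datasets
```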
Category: Data Science
