How to determine the "total number of relevant documents" when calculating Recall (in Precision and Recall) if it's not known? Can it be estimated?

On Wikipedia there is a practical example of calculating Precision and Recall:

When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3, which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells us how complete the results are.

I absolutely don't understand how one can use Precision and Recall in a real-life scenario if the total number of relevant documents is needed.

For example, in my scenario, I have a set of about 9000 collected documents and I am building a recommender system with several algorithms (tf-idf, Doc2Vec, LDA, ...). It has to recommend the top 20 most similar articles based on one selected article. Since I am not going to manually count the number of relevant articles among 9000 documents for every recommender query, what is a reasonable way to estimate the total number of relevant articles so that I can calculate Recall and then proceed to calculate Average Precision?
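For context, the tf-idf variant of the recommender is essentially this kind of thing (a simplified sketch using scikit-learn, not my exact code):

```python
# Minimal sketch of the tf-idf variant: recommend the k most similar
# articles to a selected article by cosine similarity over tf-idf vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_k_similar(documents, query_index, k=20):
    """documents: list of raw article texts; query_index: the selected article."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(documents)
    sims = cosine_similarity(tfidf[query_index], tfidf).ravel()
    sims[query_index] = -1.0            # exclude the query article itself
    return sims.argsort()[::-1][:k]     # indices of the top-k most similar articles
```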

The only information I found about this problem is in these lecture notes, where they suggest creating a pool of relevant records:

There are several ways of creating a pool of relevant records: one method is to use all the relevant records found from different searches, another is to manually scan several journals to identify a set of relevant papers.

But I'm trying to find more information on this pooling method elsewhere.
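As far as I understand it, pooling means taking the union of the top results returned by several different systems for the same query, judging only those pooled documents, and treating the judged-relevant ones as the set of relevant documents. A rough sketch of that idea, assuming each of my recommenders can be called as a function returning document ids (the names here are placeholders):

```python
# Sketch of pooling: union the top-k results of several recommenders for the
# same query article, judge only that pool, and use the judged-relevant
# subset as the "relevant documents" set when computing recall.

def build_pool(query_id, recommenders, k=20):
    """recommenders: dict of name -> function(query_id, k) returning doc ids."""
    pool = set()
    for name, recommend in recommenders.items():
        pool.update(recommend(query_id, k))   # top-k from each system
    return pool

def judged_relevant(pool, judge):
    """judge: function(doc_id) -> True/False, e.g. a manual assessment."""
    return {doc_id for doc_id in pool if judge(doc_id)}
```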

Common sense tells me that this could be a valid approach: take, say, 50 random documents, manually count the number of relevant documents in that random sample, and estimate the total number of relevant documents from that. Can this be a valid approach? I imagine I could do this for a few recommendation results (although it would be a bit time-consuming) or have some test users do the judging.
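Roughly, the estimate I have in mind would be something like the following sketch (the judge function stands for a manual relevance decision and is hypothetical):

```python
import random

def estimate_total_relevant(all_doc_ids, judge, sample_size=50, seed=0):
    """Estimate the number of relevant documents in the whole collection
    from a hand-judged random sample.
    judge: function(doc_id) -> True/False (manual relevance decision)."""
    random.seed(seed)
    sample = random.sample(all_doc_ids, sample_size)
    relevant_in_sample = sum(judge(doc_id) for doc_id in sample)
    fraction_relevant = relevant_in_sample / sample_size
    return round(fraction_relevant * len(all_doc_ids))
```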

Topics: learning-to-rank, ranking, evaluation, information-retrieval, recommender-system

Category: Data Science


I think the answer to my question is the "at k" ("@k") variants of the above-mentioned metrics: precision@k, recall@k, etc. I need to set the threshold to, say, the top 20 (k = 20) recommendations and then evaluate precision and recall on those (by hand myself, or by test users who decide whether each recommendation is relevant or irrelevant). I found good practical examples at queirozf.com for anyone interested in the same problem.

For example:

Recall@8 = true_positives@8 / (true_positives@8 + false_negatives@8)
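A minimal sketch of how these can be computed, assuming I have the ordered list of recommended article ids and a hand-judged set of relevant ids for that query (the ids below are made up):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are judged relevant."""
    top_k = recommended[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all judged-relevant documents that appear in the top k."""
    top_k = recommended[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

# Example: 8 recommendations, 5 judged-relevant documents overall
recommended = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d6"]
relevant = {"d1", "d3", "d4", "d5", "d10"}
print(precision_at_k(recommended, relevant, 8))  # 3 hits / 8 = 0.375
print(recall_at_k(recommended, relevant, 8))     # 3 hits / 5 = 0.6
```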
