How to calculate NDCG in a recommendation system

This is a question about NDCG, which is a recommendation evaluation metric.

I am using the following as evaluation metrics for recommendations:

$$DCG = r_1 + \sum\limits_{i=2}^{N}\frac{r_i}{\log_2 i}$$

$$nDCG = \frac{DCG}{DCG_{perfect}}$$
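As a minimal sketch of the two formulas exactly as written above (the function names `dcg_from_formula` and `ndcg_from_formula` are made up for illustration, and the relevance values are assumed to be listed in ranked order), the arithmetic could look like this. Note that the code further down uses a slightly different variant, with a log2(rank + 1) discount and optional exponential gains.

```python
import math


def dcg_from_formula(relevances):
    """DCG = r_1 + sum_{i=2}^{N} r_i / log2(i), with ranks starting at 1."""
    if not relevances:
        return 0.0
    total = float(relevances[0])  # r_1 is not discounted
    for i, r in enumerate(relevances[1:], start=2):
        total += r / math.log2(i)
    return total


def ndcg_from_formula(relevances):
    """DCG of the given order divided by DCG of the perfect (sorted) order."""
    perfect = dcg_from_formula(sorted(relevances, reverse=True))
    # Returning 0.0 when all relevances are 0 is an assumption, not part of the formula
    return dcg_from_formula(relevances) / perfect if perfect > 0 else 0.0


print(dcg_from_formula([3, 2, 1]))   # 3 + 2/log2(2) + 1/log2(3)
print(ndcg_from_formula([1, 2, 3]))  # below 1 because the order is reversed
```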

The code is as follows:

import numpy as np


def dcg_score(y_true, y_score, k=20, gains="exponential"):
    """Discounted cumulative gain (DCG) at rank k
    Parameters
    ----------
    y_true: array-like, shape = [n_samples]
        Ground truth (true relevance labels).
    y_score: array-like, shape = [n_samples]
        Predicted scores.
    k: int
        Rank.
    gains: str
        Whether gains should be "exponential" (default) or "linear".
    Returns
    -------
    DCG@k: float
    """
    # Sort items by predicted score, highest first, and keep the top k labels
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])

    if gains == "exponential":
        gains = 2 ** y_true - 1
    elif gains == "linear":
        gains = y_true
    else:
        raise ValueError("Invalid gains option.")

    # highest rank is 1 and arange starts at 0, so +2 instead of +1
    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gains / discounts)

def ndcg_score(y_true, y_score, k=20, gains="exponential"):
    """Normalized discounted cumulative gain (NDCG) at rank k
    Parameters
    ----------
    y_true: array-like, shape = [n_samples]
        Ground truth (true relevance labels).
    y_score: array-like, shape = [n_samples]
        Predicted scores.
    k: int
        Rank.
    gains: str
        Whether gains should be "exponential" (default) or "linear".
    Returns
    -------
    NDCG@k: float
    """
    # DCG of the ideal ordering (items sorted by their true relevance)
    best = dcg_score(y_true, y_true, k, gains)
    # DCG of the ordering induced by the predicted scores
    actual = dcg_score(y_true, y_score, k, gains)
    return actual / best

Assume k = 5.

In that case, how should NDCG be calculated for items that could not be recommended within the top k?

For example,

y_true = [5,4,3,2,1]

y_score = [0,0,0,0,0] # 0 means we could not recommend within the top 5

In this case,

np.argsort([0, 0, 0, 0])[::-1]
array([3, 2, 1, 0])

So, following the above code,

NDCG@5 = 1.0

This looks strange.

In such a case, should the score be set to 0 and left out of the NDCG calculation?

If you know of any references, simply pointing me to them would be enough.

Thank you.



IMHO,

Fundamentally, DCG is a measure of ranking quality. It assumes that you have computed the utility of each document/item and ranked the items in a certain order.

With this definition in mind, if you have n items with the same utility (0 in your case), computing NDCG to measure the ranking quality within this subset of items (you are only looking at items 5, 4, 3, 2 and 1, none of which is recommended) will yield an NDCG score of 1, since your ranking is perfect if you are only looking at these items.

NDCG is merely a way to quantify the quality of an ordering, i.e., the current order vs. the perfect order (items sorted by their utilities). This is meaningless if you are looking ONLY at items with the same utility score.
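As a rough check of what tied scores mean in practice, the sketch below scores every possible ordering of the five items from the question. The helper name `dcg` is made up, and linear gains with a log2(rank + 1) discount are assumptions. When all predicted scores are equal, the order that gets evaluated is whatever the sort happens to return for ties, so the reported value can fall anywhere between the two extremes printed here.

```python
import itertools

import numpy as np


def dcg(relevances):
    """DCG with linear gains and a log2(rank + 1) discount (ranks start at 1)."""
    relevances = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(len(relevances)) + 2)
    return float(np.sum(relevances / discounts))


y_true = [5, 4, 3, 2, 1]
ideal = dcg(sorted(y_true, reverse=True))

# NDCG for every way the five tied items could end up ordered
scores = [dcg(p) / ideal for p in itertools.permutations(y_true)]
best_case, worst_case = max(scores), min(scores)
print(best_case, worst_case)
```

Only the ordering that matches the true relevance reaches the upper end of that range; every other tie-break gives a lower value.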
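One commonly used convention, shown here only as a sketch and not as the method from the question, is to compute DCG over the list that was actually recommended and give zero gain to any recommended item with no known relevance. Relevant items that never make it into the top k then contribute nothing. The function name `ndcg_at_k`, the item ids, linear gains and the log2(rank + 1) discount are all assumptions made for illustration.

```python
import math


def ndcg_at_k(recommended, relevance, k=5):
    """NDCG@k for a ranked list of item ids.

    recommended: item ids in the order they were recommended.
    relevance:   dict mapping item id -> true relevance; missing ids count as 0.
    """
    # DCG of the recommended list, truncated to the top k
    dcg = sum(relevance.get(item, 0) / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1))
    # Ideal DCG: the k most relevant items in descending order of relevance
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(ideal, start=1))
    # With no relevant items at all the ratio is undefined; 0.0 is an assumption
    return dcg / idcg if idcg > 0 else 0.0


relevance = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
print(ndcg_at_k(["v", "w", "x", "y", "z"], relevance))  # 0.0, no relevant item recommended
print(ndcg_at_k(["a", "b", "c", "d", "e"], relevance))  # 1.0, ideal ranking
```

Under this convention, a user for whom none of the relevant items appear in the top 5 gets NDCG@5 = 0 rather than 1.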

I hope this answers your question.
