Two definitions of DCG measure

Question

WoofDoggy

August 10, 2018, 14:21

I wanted to check the definition of the Discounted Cumulative Gain (DCG) measure in the original paper (Jarvelin), and it seems to differ from the one given in the later literature (Wang). Originally, for documents at ranks $r = 1, \ldots, p$, the $\text{DCG}_p$ is defined as $$\text{DCG}_p = \sum\limits_{r=1}^{b-1} G_r + \sum\limits_{r=b}^{p}\frac{G_r}{\log_b r},$$ where $G_r$ is the relevance (or gain) of the $r$-th document. So the measure depends on the logarithm base $b$. For ranks below $b$, i.e. $r < b$, gains are not penalized. If $b=2$, then we can write: $$\text{DCG}_p = G_1 + \sum\limits_{r=2}^{p}\frac{G_r}{\log_2 r}.$$ This does not look the same as the one given on Wikipedia, where the argument of the logarithm is shifted by $1$: $$\text{DCG}_p = G_1 + \sum\limits_{r=2}^{p}\frac{G_r}{\log_2(r+1)}.$$

Where does this change come from? Why do others use a different metric?

Topic learning-to-rank ranking information-retrieval recommender-system machine-learning

Category Data Science

Sean Owen · Accepted Answer · August 10, 2018, 14:21

I believe you are correct that the paper and Wikipedia disagree. The paper's formula applies no discount for $r \le b$ (ranks below $b$ are summed as-is, and at $r = b$ the divisor is $\log_b b = 1$), which for $b = 2$ means neither of the first two elements is discounted.

The Wikipedia formula discounts the second element onward.
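To make the difference concrete, here is a minimal Python sketch of both conventions. The function names `dcg_paper` and `dcg_shifted` are my own labels for illustration, not names from the paper or from any library, and the paper's variant is written as I read the formula quoted in the question: ranks below $b$ are undiscounted, later ranks are divided by $\log_b r$.

```python
import math


def dcg_paper(gains, b=2):
    """DCG as quoted from the paper (hypothetical helper name).

    Ranks r < b are added undiscounted; ranks r >= b are divided by log_b(r).
    """
    total = 0.0
    for r, g in enumerate(gains, start=1):
        if r < b:
            total += g
        else:
            total += g / math.log(r, b)
    return total


def dcg_shifted(gains):
    """Wikipedia-style DCG (hypothetical helper name).

    Every rank r is divided by log2(r + 1), so rank 1 is divided by 1
    and the discount starts at rank 2.
    """
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


if __name__ == "__main__":
    gains = [3, 2, 3, 0, 1]  # made-up relevance grades, for illustration only
    print("paper   (b=2):", dcg_paper(gains, b=2))
    print("shifted      :", dcg_shifted(gains))
```

With $b = 2$, the two functions already disagree at rank 2: the paper's version divides $G_2$ by $\log_2 2 = 1$, while the shifted version divides it by $\log_2 3$.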

There's an impassioned statement in Talk about why the Wikipedia formula is right: https://en.wikipedia.org/wiki/Talk:Discounted_cumulative_gain

But I can't see why; it offers no reference other than observing that "it seems plainly wrong to not discount from the second element." I'll comment there.
