Estimating a user's satisfaction with a video based on how much of it they watched - normalization

I am trying to estimate how much a user liked a video from how much of the video they watched. Let's say, on a scale of 1 to 10, 1 means that the user didn't like it at all and 10 means they enjoyed it a lot. For instance, if a user watched 8 minutes of a 10-minute video, that corresponds to a score of 8. If they watched 18 minutes of a 20-minute video, that corresponds to a score of 9.
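In code, the raw score I have in mind is just the watched fraction scaled by 10 (the function name is only for illustration):

    def raw_score(watched_seconds, video_length_seconds):
        # Watched fraction scaled by 10; this reproduces the 8 and 9 in the examples above.
        return 10.0 * watched_seconds / video_length_seconds

    print(raw_score(8 * 60, 10 * 60))   # 8.0
    print(raw_score(18 * 60, 20 * 60))  # 9.0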

The problem is that the probability of a short video (say 1 minute) being watched to the end is much higher than that of a long video (say 120 minutes). That doesn't necessarily mean the user liked it more; the video was just short.

I am looking for an equation that takes the length of the video into account when computing the estimated score.

I came up with this:

raw_score - (raw_score / log10(video_length))

raw_score is the estimated score mentioned above (1 to 10), video_length is the length of the video in seconds, and log10 is the base-10 logarithm. However, this results in drastic penalties. For instance, for a 100-second video it reduces a score of 10 to 5, and for anything shorter the penalty is even larger.
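To make the size of the penalty concrete, here is that formula evaluated for a raw score of 10 at a few video lengths in seconds (the function name is just for illustration):

    import math

    def penalized_score(raw_score, video_length):
        # raw_score: 1-10 watch-fraction estimate; video_length in seconds.
        # Note: for videos of 10 seconds or less, log10(video_length) <= 1,
        # so the penalty is at least as large as the raw score itself.
        return raw_score - (raw_score / math.log10(video_length))

    for length in (60, 100, 600, 3600, 7200):
        print(length, round(penalized_score(10, length), 2))
    # 60 -> 4.38, 100 -> 5.0, 600 -> 6.4, 3600 -> 7.19, 7200 -> 7.41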

I am looking for some way to normalize this penalty so that the amount by which a score gets reduced is limited to a specific range, for instance at most 2 points.
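As a sketch of what I mean (not a solution I am committed to; the maximum penalty of 2 and the reference length of 7200 seconds are arbitrary choices), something like this would keep the reduction within a fixed range:

    import math

    MAX_PENALTY = 2.0      # never take away more than 2 points
    REF_LENGTH = 7200.0    # arbitrary length (seconds) at which the penalty reaches 0

    def bounded_penalty_score(raw_score, video_length):
        # Penalty shrinks as the video gets longer, clamped to [0, MAX_PENALTY].
        factor = 1.0 - math.log10(video_length) / math.log10(REF_LENGTH)
        factor = min(max(factor, 0.0), 1.0)
        return raw_score - MAX_PENALTY * factor

    for length in (60, 600, 3600, 7200):
        print(length, round(bounded_penalty_score(10, length), 2))
    # 60 -> 8.92, 600 -> 9.44, 3600 -> 9.84, 7200 -> 10.0

But I am not sure this is the right shape for the penalty, hence my question.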

What is the best way to tackle this problem?

Topic: estimation, normalization, feature-scaling, recommender-system, statistics

Category: Data Science
