Calculate implicit rating from streaming behaviour for Recommendation Engine

I have a dataset of user streaming records for particular videos, like the one below:

u_id | start_stream_time_dt | watch_time_ms | video_category
   1 | 2021-02-01           | 3600          | Live

My goal is to build a recommender system for watch streams.

However, I would like to find the optimal watch-time threshold (or some other approach) that would let me decide whether a user watched a video because they were actually interested in it.

In other words, I'd like to fill in the 1s in the user_item matrix based on the information I have.

Are there any good approaches you can suggest, or any resources I can look up?

Thanks.

Topic matrix-factorisation recommender-system data-cleaning machine-learning

Category Data Science


Create a histogram of watch_time_ms. If you are lucky, you may see a bimodal distribution (i.e. two peaks). The peak at the higher watch times can be interpreted as interested behavior, and the peak at the lower watch times as uninterested behavior. Your threshold could then lie somewhere in the valley between the two peaks.
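A minimal sketch of the valley idea, using only the Python standard library. The helper names `histogram` and `valley_threshold`, the bin count, and the sample watch times are illustrative assumptions, not part of the original data. The sketch takes the two tallest local peaks and returns the centre of the emptiest bin between them. Real histograms tend to be noisier, so some smoothing or a visual check would probably still be needed.

```python
def histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins covering [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


def valley_threshold(counts, edges):
    """Centre of the emptiest bin between the two tallest peaks, or None."""
    n = len(counts)
    # A peak is a non-empty bin higher than its left neighbour
    # and at least as high as its right neighbour.
    peaks = [
        i for i in range(n)
        if counts[i] > 0
        and (i == 0 or counts[i] > counts[i - 1])
        and (i == n - 1 or counts[i] >= counts[i + 1])
    ]
    if len(peaks) < 2:
        return None  # no two peaks at this bin width
    tallest_two = sorted(peaks, key=lambda i: counts[i], reverse=True)[:2]
    left, right = sorted(tallest_two)
    valley = min(range(left + 1, right), key=lambda i: counts[i])
    return (edges[valley] + edges[valley + 1]) / 2


# Made-up watch times in milliseconds, for illustration only.
watch_time_ms = [2000, 3000, 4000, 4500, 6000, 12000,
                 31000, 44000, 46000, 47000, 48000, 52000]
counts, edges = histogram(watch_time_ms, bins=6, lo=0, hi=60000)
threshold = valley_threshold(counts, edges)
print(counts, threshold)  # [5, 1, 0, 1, 4, 1] 25000.0
```

The result depends heavily on the bin width, so it is worth trying a few bin counts before settling on a threshold.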

If your videos vary in length, you may also want to normalize watch_time_ms by dividing it by total_video_length_ms before reviewing the histogram. You then end up with a value between 0 and 1 for each view.
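The normalization could look roughly like the sketch below. Note that `total_video_length_ms` does not appear in the sample data shown in the question, so it is assumed here to be available from somewhere else. The function name `completion_ratio` and the choice to clamp the result to 1.0 (for example, when a user rewatches part of a video) are also assumptions.

```python
def completion_ratio(watch_time_ms, total_video_length_ms):
    """Fraction of the video watched, clamped to the range [0, 1]."""
    if total_video_length_ms <= 0:
        raise ValueError("total_video_length_ms must be positive")
    ratio = watch_time_ms / total_video_length_ms
    # Assumption: rewatching can push the raw ratio above 1, so cap it.
    return max(0.0, min(ratio, 1.0))


# Hypothetical (watch_time_ms, total_video_length_ms) pairs.
views = [(3600, 7200), (9000, 6000), (0, 5000)]
ratios = [completion_ratio(w, t) for w, t in views]
print(ratios)  # [0.5, 1.0, 0.0]
```

For live streams there may be no fixed length to divide by, in which case the raw watch time might be the only usable signal.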

If there is no obvious pattern in the histogram, choose a percentile to divide interested from uninterested views. For example, the top 25 percent of views (those at or above the 75th percentile of watch time) are the 1s, and the remainder are 0s. Make sure you choose the percentile appropriately for the problem domain and objective.
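As a rough illustration of the percentile approach, the sketch below labels the top fraction of views as 1 and the rest as 0. The function name `label_top_fraction`, the default of 0.25, and the sample values are assumptions for illustration. The cutoff is a simple rank-based one, not a particular library's percentile definition.

```python
import math


def label_top_fraction(values, top_fraction=0.25):
    """Return 1 for views in the top `top_fraction` by value, else 0."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if not values:
        return []
    # Number of views to mark as interested (at least one).
    n_positive = max(1, math.floor(len(values) * top_fraction))
    cutoff = sorted(values, reverse=True)[n_positive - 1]
    # Ties at the cutoff are all labelled 1, so slightly more than
    # top_fraction of the views can end up positive.
    return [1 if v >= cutoff else 0 for v in values]


# Made-up normalized watch ratios, one per view.
ratios = [0.05, 0.9, 0.2, 0.6, 0.1, 0.95, 0.3, 0.4]
labels = label_top_fraction(ratios, top_fraction=0.25)
print(labels)  # [0, 1, 0, 0, 0, 1, 0, 0]
```

The resulting labels are what would go into the user_item matrix as 1s and 0s, one entry per (user, video) pair.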
