Evaluation of recommendation systems
I have developed a content-based recommendation system and it is working fine. The input is a set of documents = {d1, d2, d3, ..., dn}, and the output is the top N most similar documents for a given document, e.g. output = {d10, d11, d1, d8, ...}. I eyeballed the results and found them satisfactory. My question is: how do I measure the performance and accuracy of the system?
I did some research and found that recall, precision, and F1-score are used to evaluate recommendation systems that predict user ratings. For this, we need to know the original ratings; the system then predicts the ratings, and we can build the confusion matrix and compute the aforementioned metrics from it. However, in my case I don't predict anything. Instead, I compute the cosine similarity scores, sort them in descending order, and pick the top N.
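To make the ranking step concrete, here is a minimal sketch of what I mean by "sort by cosine similarity and pick the top N". It is not my actual code: it assumes each document is already represented as a numeric vector, and the names cosine, top_n_similar, and the toy vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_n_similar(vectors, query_index, n):
    """Indices of the n documents most similar to vectors[query_index]."""
    query = vectors[query_index]
    # Score every other document against the query document.
    scored = [
        (cosine(query, vec), idx)
        for idx, vec in enumerate(vectors)
        if idx != query_index
    ]
    # Sort by similarity in descending order and keep the first n.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:n]]


# Toy document vectors (hypothetical, for illustration only).
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
print(top_n_similar(vectors, 0, 2))
```

So the output is only a ranked list of document indices; there is no predicted rating to compare against a true one.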
In this use case, how do I evaluate the system?
Thanks