Do I need to read an entire database for a recommendation system?

Let's say I have a database with approximately 100,000 rows. I want to build a content-based recommendation system. Do I really need to read the entire database to calculate similarity? That would be very expensive on a hosted service such as AWS or Azure. Additionally, my data is always changing (new rows are added, old ones removed), so I can't just work from a static file. Is there a more cost-effective way?

Topics: cloud, nlp, recommender-system, databases

Category: Data Science


You can work with a random sample of the whole data set instead of reading all of it. Start with a very small sample (~1,000 rows) to build a first version of the recommendation system, then increase the sample size depending on the quality of the results and the computing resources you can afford. When new data arrives, just repeat the same process on a fresh sample.
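
As a rough sketch of what that could look like in Python (the `items` table, the `description` text column, and the connection string are placeholders, not anything from the question), you can push the sampling into the SQL query itself so only the sampled rows ever leave the database, then compute content similarity on that small subset:

```python
# Sketch under assumed schema: sample ~1,000 rows server-side, then build
# a content-based similarity matrix on the sample only.
import pandas as pd
from sqlalchemy import create_engine
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical connection string; use whatever your hosted database provides.
engine = create_engine("postgresql://user:password@host/dbname")

# Let the database pick the random sample so you never transfer all 100,000 rows.
sample = pd.read_sql(
    "SELECT id, description FROM items ORDER BY random() LIMIT 1000",
    engine,
)

# Content-based similarity on the sampled rows.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(sample["description"].fillna(""))
similarity = cosine_similarity(matrix)

# Example: the 5 items most similar to the first sampled item (excluding itself).
top = similarity[0].argsort()[::-1][1:6]
print(sample.iloc[top][["id", "description"]])
```

When the data changes, you simply rerun the query to draw a fresh sample and rebuild the similarity matrix, so there is no constant file to keep in sync.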
