Dealing with a huge amount of data
I'm writing to get advice about my project.
I want to build a recommender system for shops selling products. Specifically, I want to recommend that shop A stock item X because shop B sells it and shops A and B are very similar.
The problem here is the size of the data: I have around 5 TB of raw data (about 8,000,000,000 lines), so it's very difficult to do anything with data this large.
So my questions are:
- Is it appropriate to use a database like MongoDB (or NoSQL in general) for my data?
- How can I build the utility matrix for recommendation, given 10,000+ shops and 1,000,000+ items? (See the sketch after this list.)
- Can you recommend technologies for this? I've heard about the Neo4j graph database; is it a good fit for storing the relations between shops and items?
- The data is maybe too small to justify Hadoop, and I don't have enough computers for the cluster nodes anyway.
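For the utility matrix question, here is a minimal sketch of what I have in mind, assuming the raw log has already been reduced to (shop, item, quantity) triples as above. The numbers are toy values, and scipy/scikit-learn are just one possible choice, not necessarily what I'd use at full scale.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Toy triples (shop index, item index, quantity); in reality these would
# come from the aggregated table above, with ids mapped to integer indices.
shops = np.array([0, 0, 1, 1, 2])
items = np.array([0, 1, 0, 2, 2])
qty   = np.array([5, 2, 3, 7, 1])

# Sparse utility matrix: rows = shops, columns = items. At 10,000+ shops
# and 1,000,000+ items a dense matrix would have ~10^10 cells, but the
# sparse version only stores the nonzero entries.
U = csr_matrix((qty, (shops, items)), shape=(3, 3))

# Shop-to-shop cosine similarity: the "shop A is similar to shop B" part.
# The result is only n_shops x n_shops, i.e. manageable for ~10k shops.
S = cosine_similarity(U)

# Recommend to shop A the items its most similar shop sells but A doesn't.
a = 0
b = np.argsort(S[a])[-2]                     # most similar shop besides A
candidates = set(U[b].indices) - set(U[a].indices)
print(f"recommend items {sorted(candidates)} to shop {a}")
```

The point is that the 10,000 x 1,000,000 utility matrix is never materialized densely; only the nonzero entries are stored, and the shop-to-shop similarity matrix is only 10,000 x 10,000. Does this approach make sense, or is there a better way at this scale?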
Thanks
Tags: mongodb, python, recommender-system, nosql
Category: Data Science