Dealing with a huge amount of data

I'm writing to get advice about my project.

I want to build a recommender system for shops selling products. Specifically, I want to recommend that shop A stock item X because shop B sells this item and shops A and B are very similar.
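To make the idea concrete, here is a toy sketch of what I mean (the counts are made up, and cosine similarity is just one possible similarity measure):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Toy utility matrix: 3 shops x 4 items; entry [s, i] = units of item i sold by shop s.
utility = csr_matrix(np.array([
    [5, 0, 3, 0],   # shop A
    [4, 1, 3, 2],   # shop B (sells roughly what A sells, plus items 1 and 3)
    [0, 9, 0, 7],   # shop C (very different assortment)
]))

# Shop-to-shop cosine similarity on the item vectors.
sim = cosine_similarity(utility)

# Recommend to shop A the items its most similar shop sells but A does not.
a = 0
most_similar = int(np.argsort(sim[a])[::-1][1])  # index 1 skips shop A itself
candidates = (utility[most_similar].toarray().ravel() > 0) & \
             (utility[a].toarray().ravel() == 0)
print("recommend items:", np.where(candidates)[0])  # -> [1 3]
```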

The "problem" here is the size of the data : i have around 5TB of raw data (about 8 000 000 000 lines) So it's very difficult to do something with huge data like this.

So my questions are:

- Is it relevant to use a database like MongoDB (or NoSQL in general) for my data?

- How can I build a utility matrix for recommendation (10 000+ shops and 1 000 000+ items)? See the sparse-matrix sketch after this list.

- Can you recommend technologies for this? (I've heard about the Neo4j graph database; is it relevant for storing the relations between shops and items?)

- Maybe the data is too small for Hadoop, and I don't have enough computers for the cluster nodes anyway.
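For the utility matrix question above, here is a rough sketch of how it could be stored as a sparse matrix in Python. The (shop_id, item_id, count) triples here are hypothetical; in practice they would come from a group-by aggregation over the raw lines:

```python
import numpy as np
from scipy.sparse import coo_matrix

n_shops, n_items = 10_000, 1_000_000

# Hypothetical aggregated (shop_id, item_id, count) triples; in reality these
# would be produced by aggregating the 8 billion raw lines.
shop_ids = np.array([0, 0, 1, 9_999])
item_ids = np.array([42, 7, 42, 999_999])
counts   = np.array([5.0, 3.0, 4.0, 7.0])

# Sparse storage: a dense 10 000 x 1 000 000 float64 matrix would need ~80 GB,
# while the sparse form stores only the nonzero entries.
utility = coo_matrix((counts, (shop_ids, item_ids)),
                     shape=(n_shops, n_items)).tocsr()

print(utility.shape, utility.nnz)  # (10000, 1000000) 4
```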

Thanks

Topic mongodb python recommender-system nosql

Category Data Science
