"Hadoop" formats for user database: online advertising
I was wondering if someone could point me to suitable database formats for building up a user database.
Basically, I am collecting logs of impression data, and I want to compile a user database from them.
For each user it would record which sites the user visits, country, gender and other categorisations, with the aim of a) running searches, e.g. "give me all users visiting games sites from France", and b) machine learning, e.g. clustering users by the sites they visit.
So I am interested in storing information about hundreds of millions of users,
probably with indexes on user, site and geo-location.
The idea is that this data would be continually updated (e.g. a nightly update to the user database with newly visited sites).
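To make the requirement concrete, here is a minimal in-memory Python sketch of the access pattern I have in mind: one record per user, lookups by country and site category, and a batch merge of new impressions. All names here (`UserProfile`, `UserStore`, and the impression fields `user_id`, `country`, `site_category`) are hypothetical, and this only illustrates the queries and the nightly update, not how a real system should store data at this scale.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """One record per user (hypothetical layout)."""
    user_id: str
    country: str = ""
    gender: str = ""
    # site category -> number of impressions seen for this user
    categories: dict = field(default_factory=dict)


class UserStore:
    """Toy store with secondary indexes on country and site category."""

    def __init__(self):
        self.users = {}                      # user_id -> UserProfile
        self.by_country = defaultdict(set)   # country -> {user_id}
        self.by_category = defaultdict(set)  # category -> {user_id}

    def apply_impressions(self, impressions):
        """Merge a batch of impression rows, e.g. one night's logs."""
        for imp in impressions:
            uid = imp["user_id"]
            profile = self.users.setdefault(uid, UserProfile(uid))

            country = imp.get("country")
            if country and country != profile.country:
                # Keep the country index consistent if the value changes.
                if profile.country:
                    self.by_country[profile.country].discard(uid)
                profile.country = country
                self.by_country[country].add(uid)

            category = imp.get("site_category")
            if category:
                profile.categories[category] = profile.categories.get(category, 0) + 1
                self.by_category[category].add(uid)

    def search(self, country, category):
        """Users from `country` who visited sites in `category`."""
        return self.by_country.get(country, set()) & self.by_category.get(category, set())


if __name__ == "__main__":
    store = UserStore()
    store.apply_impressions([
        {"user_id": "u1", "country": "FR", "site_category": "games"},
        {"user_id": "u2", "country": "FR", "site_category": "news"},
        {"user_id": "u3", "country": "DE", "site_category": "games"},
    ])
    print(sorted(store.search("FR", "games")))  # ['u1']
```

The per-user `categories` counts are also the kind of feature vector I would feed into clustering.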
What are suitable database systems for this? Can someone suggest suitable reading material? I was imagining HBase might be suitable.
Topic hbase
Category Data Science