Extract company names/job titles from free text

I have a complete Hadoop platform with HDFS, MapReduce, Hive, Pig, HBase, etc., plus Python, R, and Java. All data sets are large. Data set A, which describes the jobs of people working in a company, is composed of the following fields: Id Person: a unique alphanumeric identifier per person. Start Date: the date the person entered the position, in ISO format. End Date: the date the person left the position, in ISO format. If the end date is not given, the position is the current one …
Category: Data Science
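
As a rough illustration of the record layout described in the excerpt above, the three fields it names could be modelled as follows. This is a minimal sketch only: the class name `JobRecord`, the helper `parse_record`, the field names, and the sample values are all assumptions made for illustration, and the remaining fields of data set A are cut off in the excerpt and are not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class JobRecord:
    """One row of data set A (hypothetical field names)."""
    person_id: str            # unique alphanumeric identifier per person
    start_date: date          # ISO date the person entered the position
    end_date: Optional[date]  # None means the position is still current

    @property
    def is_current(self) -> bool:
        # A missing end date marks the person's current position.
        return self.end_date is None


def parse_record(person_id: str, start: str, end: str = "") -> JobRecord:
    """Build a JobRecord from ISO date strings; an empty end date means 'current'."""
    return JobRecord(
        person_id=person_id,
        start_date=date.fromisoformat(start),
        end_date=date.fromisoformat(end) if end else None,
    )


# Hypothetical sample values, for illustration only.
past = parse_record("A1B2", "2012-03-01", "2014-06-30")
present = parse_record("A1B2", "2014-07-01")
```

Representing a missing end date as `None` keeps the "current position" rule explicit instead of relying on a sentinel date.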

"Hadoop" formats for user database: online advertising

I was wondering if someone could point me to suitable database formats for building a user database. Basically, I am collecting logs of impression data, and I want to compile a user database recording which sites each user visits, plus country, gender, and other categorisations, with the aim of a) doing searches, e.g. give me all users visiting games sites from France, and b) machine learning, e.g. clustering users by the sites they visit. So I am interested in storing info about 100's of millions …
Topic: hbase
Category: Data Science
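
To make the search use case in (a) concrete, here is a small in-memory sketch of folding impression rows into per-user profiles and then filtering them. It is not a storage-format recommendation and says nothing about how this would scale to hundreds of millions of users: the row layout `(user_id, country, site_category)`, the function names, and the sample data are all assumptions made for illustration.

```python
# Hypothetical impression log rows: (user_id, country, site_category)
impressions = [
    ("u1", "FR", "games"),
    ("u1", "FR", "news"),
    ("u2", "DE", "games"),
    ("u3", "FR", "sports"),
]


def build_profiles(rows):
    """Fold raw impression rows into one profile per user."""
    profiles = {}
    for user_id, country, category in rows:
        # The first row seen for a user fixes the country in this sketch.
        profile = profiles.setdefault(
            user_id, {"country": country, "categories": set()}
        )
        profile["categories"].add(category)
    return profiles


def find_users(profiles, country, category):
    """Return sorted ids of users from `country` who visited `category` sites."""
    return sorted(
        user_id
        for user_id, profile in profiles.items()
        if profile["country"] == country and category in profile["categories"]
    )


profiles = build_profiles(impressions)
french_gamers = find_users(profiles, "FR", "games")
```

The same per-user set of visited categories could also serve as the feature input for the clustering in (b).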
