Compressing profiles with a large number of dimensions
I 'think' this is a related question, but I'm not sure how to apply it.
I'm trying to build out a very crude recommendation system using Amazon ML, Facebook likes, and historical actions.
So let's say we have a number of users within a system that promotes products in several categories. To better predict which categories of items to present to a user, we will consider their past interactions with specific items, as well as the past interactions of other users who share a similar profile. The profile consists basically of the user's Facebook likes and some demographic info.
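To make the "similar users" idea concrete, here is a minimal sketch of user-based collaborative filtering over past category interactions. It assumes interaction counts per category are available as plain dicts; the user names, categories, counts, and the helpers `cosine` and `score_categories` are all hypothetical. Each other user is compared to the target with cosine similarity, and the k most similar users' category counts are averaged, weighted by that similarity.

```python
import math

# Hypothetical data: how many times each user interacted with each category.
interactions = {
    "alice": {"books": 5, "music": 1},
    "bob":   {"books": 4, "music": 2, "games": 3},
    "carol": {"games": 6, "music": 1},
}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_categories(target, interactions, k=2):
    """Score categories for `target` from the k most similar users,
    weighting each neighbour's counts by its similarity (hypothetical helper)."""
    sims = sorted(
        ((cosine(interactions[target], vec), user)
         for user, vec in interactions.items() if user != target),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in sims)
    scores = {}
    for sim, user in sims:
        for cat, count in interactions[user].items():
            scores[cat] = scores.get(cat, 0.0) + sim * count
    if total:
        # Normalise so the scores are a similarity-weighted average.
        scores = {cat: s / total for cat, s in scores.items()}
    # Highest-scoring categories first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

print(score_categories("alice", interactions))
```

This only works once a user has some interaction history, which is exactly the gap the Facebook likes are meant to fill for new users.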
I'm unsure of how to distill the Facebook likes data in a way that lets me make meaningful comparisons between users.
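One simple way to compare users without ever building a vector over every page in the system is to treat each user's likes as a set of page IDs and use Jaccard similarity (shared likes divided by combined likes). This is a swapped-in technique offered as a sketch, not something from the question; the page IDs and the helpers `jaccard` and `most_similar` are hypothetical. Comparing two sets costs roughly the size of those users' like lists, not the total number of pages.

```python
# Hypothetical page IDs; in practice these would be the IDs of liked pages.
likes = {
    "alice": {"page_nyt", "page_nasa", "page_chess"},
    "bob":   {"page_nasa", "page_chess", "page_marvel"},
    "dave":  {"page_marvel", "page_nba"},
}

def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(user, likes, k=1):
    """Return the k (similarity, other_user) pairs with the highest Jaccard score."""
    others = [(jaccard(likes[user], pages), other)
              for other, pages in likes.items() if other != user]
    return sorted(others, reverse=True)[:k]

print(most_similar("alice", likes))
```

Jaccard treats every shared like as equally informative, so a page liked by almost everyone counts as much as a niche one; weighting pages differently is one way to address that.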
I'm sure it's obvious, but I'm completely new to machine learning and data science in general. I'm currently limited to the capabilities of Amazon ML. Let me know if the question needs more clarification; constructive criticism is appreciated.
Edit:
As @liangjy pointed out, the general solution for the recommendation system will be to use collaborative filtering. This is most useful when there is sufficient data to link users based on their individual actions. Because we have little or no data on new users, we are trying to use additional data (Facebook likes in this case) to help create that initial link. Where I'm stuck is the vast number of profiles/pages any one user may have liked. We could be comparing one user's likes against tens of thousands of possibilities presented by the rest of the users. What is the best way to turn this disparate data into something manageable?
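A common way to compress a very wide user-by-page likes matrix is truncated SVD, which keeps only k latent dimensions per user; a new user with no history can then be "folded in" using just their likes and matched to existing users in that small space. The sketch below assumes a tiny dense 0/1 matrix and uses NumPy; the matrix, the choice `k = 2`, and the helpers `embed_new_user` and `nearest_users` are hypothetical.

```python
import numpy as np

# Hypothetical user x page matrix: 1 if the user liked the page, else 0.
# Rows are existing users, columns are pages. Real data would be far wider
# (tens of thousands of pages) and mostly zeros.
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

k = 2  # number of latent dimensions to keep (an assumption to tune)

# Full SVD, then keep only the top-k right singular vectors (a truncated SVD).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T           # pages -> latent dims, shape (n_pages, k)
user_emb = X @ V_k       # existing users, equivalent to U[:, :k] * s[:k]

def embed_new_user(liked_cols, n_pages):
    """Fold a new user's likes into the same k-dimensional space."""
    x = np.zeros(n_pages)
    x[liked_cols] = 1.0
    return x @ V_k

def nearest_users(vec, emb):
    """Indices of existing users, ordered by cosine similarity to vec."""
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(vec)
    sims = (emb @ vec) / np.where(norms == 0, 1, norms)
    return np.argsort(-sims)

new_user = embed_new_user([0, 1], X.shape[1])  # likes pages 0 and 1 only
print(nearest_users(new_user, user_emb))
```

For real data, a sparse matrix with scikit-learn's `TruncatedSVD` avoids materialising the full matrix. The k resulting columns are ordinary numeric features, so they could presumably be precomputed outside Amazon ML and supplied as input columns.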
I've considered taking a subset of users and pulling the top n sites (1,000?) from each demographic group (age/gender), then comparing all other users to this base set and creating sub-groups based on their relationships to those sites. However, something about this approach feels like it would be skewed. I'm not sure what, but I'm pretty sure I won't get the results I'm looking for.
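One possible source of that skew: a per-group top-n list tends to be dominated by pages that nearly everyone likes, which say little about an individual's taste, and the cut-off n is arbitrary. A common alternative (a swapped-in technique, not the approach described above) is to prune pages by how many users liked them overall and then down-weight popular pages with inverse document frequency (IDF). The data, the threshold `min_users=2`, and the helper `prune_and_weight` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical like sets keyed by user.
likes = {
    "u1": {"page_a", "page_b", "page_rare1"},
    "u2": {"page_a", "page_b", "page_c"},
    "u3": {"page_a", "page_c"},
    "u4": {"page_a", "page_d", "page_rare2"},
}

def prune_and_weight(likes, min_users=2):
    """Drop pages liked by fewer than min_users users, then weight each
    remaining page by IDF = log(n_users / users_who_liked_it)."""
    n_users = len(likes)
    df = Counter(page for pages in likes.values() for page in pages)
    kept = {page for page, count in df.items() if count >= min_users}
    idf = {page: math.log(n_users / df[page]) for page in kept}
    # Each user becomes a sparse dict of page -> weight.
    return {
        user: {page: idf[page] for page in pages if page in kept}
        for user, pages in likes.items()
    }

for user, vec in prune_and_weight(likes).items():
    print(user, vec)
```

Here `page_a`, liked by every user, gets weight 0, and pages liked by only one user are dropped. The weighted dicts can then be compared with cosine similarity or fed into a dimensionality-reduction step.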