Data augmentation for recommendation systems

I have a user-item matrix that I use to train a denoising autoencoder to predict the top-k items to recommend to each user. The idea is to corrupt the matrix by erasing a percentage p of the items that each user bought and to train the autoencoder to reconstruct the uncorrupted matrix. Following the implementation of this paper, I am currently erasing 20% of the bought items. I was wondering if it is legit to augment the dataset by first …
Category: Data Science
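
The corruption step described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the paper the question refers to: the function name `corrupt_interactions`, the dense NumPy representation, the fixed seed, and the rounding rule for how many items to drop are all assumptions made for the example.

```python
import numpy as np


def corrupt_interactions(matrix, p=0.2, seed=0):
    """Return a copy of `matrix` with a fraction `p` of each user's
    bought items (non-zero entries in that user's row) set to zero.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    corrupted = matrix.copy()
    for user in range(matrix.shape[0]):
        # Column indices of the items this user bought.
        bought = np.flatnonzero(matrix[user])
        # Number of items to erase for this user (assumed rounding rule).
        n_drop = int(round(p * bought.size))
        if n_drop > 0:
            dropped = rng.choice(bought, size=n_drop, replace=False)
            corrupted[user, dropped] = 0
    return corrupted


# Toy user-item matrix: user 0 bought 5 items, user 1 bought 10.
ratings = np.array(
    [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
     [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
    dtype=np.float32,
)
noisy = corrupt_interactions(ratings, p=0.2)
```

The autoencoder would then take `noisy` as input and `ratings` as the reconstruction target. Calling the function with different seeds yields different corrupted views of the same matrix.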

Noisification of categorical data proportions for privacy-preservation

Imagine I'm conducting an ongoing poll asking people's favourite animal out of a list of animals, [cat, dog, penguin, chimpanzee, ...] etc. I want to provide an interface that lets people query this poll data to see the relative popularity of each animal by different demographics. For example, querying the general population might reveal that the plurality of respondents (36%) prefer penguins, but querying the 18-25 age-bracket might reveal that the plurality of respondents in that cohort (41%) prefer cats. It's desirable to …
Category: Data Science
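
One common way to noisify category proportions like those in the excerpt above is to add Laplace noise to the raw counts and then renormalize. The excerpt is truncated, so this is not necessarily the approach the question goes on to discuss. The sketch assumes each respondent contributes to exactly one category (so one person changes one count by at most 1), and the function name `noisy_proportions`, the `epsilon` parameter, and the example counts are hypothetical.

```python
import numpy as np


def noisy_proportions(counts, epsilon=1.0, seed=0):
    """Add Laplace noise to per-category counts and return proportions.

    Hypothetical helper for illustration only. The noise scale is
    1 / epsilon, assuming one respondent affects one count by at most 1.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Noise can push small counts below zero; clip before normalizing.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return np.full(counts.shape, 1.0 / counts.size)
    return noisy / total


# Made-up counts for [cat, dog, penguin, chimpanzee] in one cohort.
cohort_counts = [41, 20, 30, 9]
shares = noisy_proportions(cohort_counts, epsilon=0.5)
```

Smaller cohorts are distorted more by the same amount of noise, and answering many overlapping queries against an ongoing poll weakens the protection that any single noisy answer provides.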

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.