Data augmentation for recommendation systems

I have a user-item matrix that I use to train a denoising autoencoder to predict the top-k items to recommend to each user.
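
To make the top-k step concrete, here is a minimal sketch in Python with NumPy. It assumes the autoencoder outputs a dense score matrix with the same shape as the binary user-item matrix, and that items a user has already bought should not be recommended again. The function name top_k_recommendations is hypothetical.

```python
import numpy as np


def top_k_recommendations(scores: np.ndarray, interactions: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: for each user (row), return the indices of the k
    highest-scoring items that the user has not already bought."""
    # Work on a float copy so the caller's reconstruction is left untouched.
    masked = scores.astype(float).copy()
    # Push already-bought items (nonzero input entries) to the bottom of the ranking.
    masked[interactions > 0] = -np.inf
    # Sort items by descending score and keep the first k columns.
    return np.argsort(-masked, axis=1)[:, :k]
```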

The idea is to corrupt the matrix by erasing a percentage p of the items that each user bought, and to train the autoencoder to reconstruct the uncorrupted matrix.
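
A minimal sketch of that corruption step, assuming a binary NumPy user-item matrix and reading "a percentage p" as: exactly round(p × n) of a user's n bought items are set to zero, chosen at random. The referenced paper may instead drop each item independently with probability p; the function name corrupt is hypothetical.

```python
import numpy as np


def corrupt(interactions: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: erase (set to 0) a fraction p of each user's bought items."""
    noisy = interactions.copy()  # never modify the clean matrix in place
    for u in range(noisy.shape[0]):
        bought = np.flatnonzero(noisy[u])      # indices of items user u bought
        n_drop = int(round(p * bought.size))   # how many of them to erase
        if n_drop > 0:
            drop = rng.choice(bought, size=n_drop, replace=False)
            noisy[u, drop] = 0
    return noisy
```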

Following the implementation of this paper, I am currently erasing 20% of the bought items.

I was wondering whether it is legitimate to augment the dataset as follows: first erase p = 20% of the bought items to create one noisy matrix, then erase, for instance, p = 40% to create a second one, concatenate the two noisy matrices, and train the autoencoder to reconstruct the corresponding stack of two copies of the uncorrupted matrix.
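
The proposed augmentation could be sketched as follows, again assuming a binary NumPy matrix; corrupt and build_augmented are hypothetical helpers, not part of any library. The key point is that each noisy row stays paired with its own clean row, so the targets are simply the clean matrix repeated once per noise level.

```python
import numpy as np


def corrupt(interactions, p, rng):
    # Hypothetical helper: zero out round(p * n) of each user's n bought items.
    noisy = interactions.copy()
    for u in range(noisy.shape[0]):
        bought = np.flatnonzero(noisy[u])
        n_drop = int(round(p * bought.size))
        if n_drop > 0:
            noisy[u, rng.choice(bought, size=n_drop, replace=False)] = 0
    return noisy


def build_augmented(interactions, ps, rng):
    # One corrupted copy per noise level p, stacked along the user axis.
    inputs = np.vstack([corrupt(interactions, p, rng) for p in ps])
    # Targets: the clean matrix repeated once per noise level, row-aligned with inputs.
    targets = np.vstack([interactions] * len(ps))
    return inputs, targets


rng = np.random.default_rng(42)
X = (rng.random((100, 50)) < 0.1).astype(np.int8)  # toy binary user-item matrix
inputs, targets = build_augmented(X, ps=[0.2, 0.4], rng=rng)
# inputs.shape == targets.shape == (200, 50)
```

If training uses mini-batches, it is probably worth shuffling inputs and targets with the same permutation so that each batch mixes both noise levels.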

Is it reasonable or is it just an invitation for overfitting?

Topic noisification data-augmentation overfitting autoencoder recommender-system

