Optimization of a simple M x N dataset
I have a dataset consisting of M questionnaires and N students. Each student replied to some of the questionnaires, but not necessarily all of them.
I would like to improve the dataset by removing some questionnaires and/or some students, so that the remaining data has as few holes as possible. To be clear, a hole is a (student, questionnaire) pair for which the student did not reply to that questionnaire.
Let's say the number of holes in the dataset is H. We want H to be as low as possible while keeping M and N as high as possible.
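To make the setup concrete, here is a minimal sketch, assuming the replies are stored as a boolean NumPy array with one row per student and one column per questionnaire (True means the student replied). The names `answered`, `count_holes` and `greedy_prune` are made up for illustration, and the greedy rule (repeatedly drop the student or questionnaire with the largest fraction of holes) is only a naive baseline I can think of, not a method known to be optimal.

```python
import numpy as np

def count_holes(answered):
    """H: number of (student, questionnaire) cells with no reply."""
    return int((~answered).sum())

def greedy_prune(answered, max_holes=0):
    """Naive baseline: drop the row or column with the highest hole
    fraction until at most `max_holes` holes remain.
    Returns the indices of the kept students and kept questionnaires."""
    rows = np.arange(answered.shape[0])   # surviving student indices
    cols = np.arange(answered.shape[1])   # surviving questionnaire indices
    sub = answered.copy()
    while sub.size and count_holes(sub) > max_holes:
        row_frac = (~sub).mean(axis=1)    # hole fraction per student
        col_frac = (~sub).mean(axis=0)    # hole fraction per questionnaire
        r, c = row_frac.argmax(), col_frac.argmax()
        if row_frac[r] >= col_frac[c]:    # ties: drop the student
            sub = np.delete(sub, r, axis=0)
            rows = np.delete(rows, r)
        else:
            sub = np.delete(sub, c, axis=1)
            cols = np.delete(cols, c)
    return rows, cols

# Toy data: 4 students x 4 questionnaires (assumed layout).
answered = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
], dtype=bool)

H = count_holes(answered)
rows, cols = greedy_prune(answered, max_holes=0)
print(H, rows.tolist(), cols.tolist())
```

The `max_holes` parameter is one possible way to express the trade-off: H is capped at a chosen value, and the pruning stops as soon as the cap is met, so that M and N stay as large as this heuristic allows.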
How would one go about solving such an optimization problem?
Topic missing-data optimization dataset
Category Data Science