How do I plan data acquisition to improve accuracy on a hold-out test set?
I have been asked to build a model that reaches 95% accuracy on a classification problem. I have training data and a hold-out data set. I also have the opportunity to request additional data for a particular class, with whatever characteristics I specify, in order to reach that target.
What method should I use to plan the data acquisition, which will be carried out by another team? I am currently at 86% accuracy on the hold-out set. I use LightGBM for model development, and I would consider parameter tuning and an ensemble with XGBoost and TabNet. Feature engineering is also in play. Still, I think I need better data to achieve higher accuracy.
Also note that it is a multi-class classification problem.
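To make the question concrete, here is a minimal sketch of the kind of per-class breakdown I could compute on the hold-out set to decide which class to request first. The function names and the toy labels are hypothetical. In practice the inputs would be the hold-out labels and the LightGBM predictions. Ranking by error count is only one possible heuristic, not an established acquisition method.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class_label: recall} for every class present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


def acquisition_priority(y_true, y_pred):
    """Rank classes by the number of hold-out errors they contribute.

    Classes with the most misclassified hold-out rows come first;
    ties are broken by lower recall.
    """
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    recall = per_class_recall(y_true, y_pred)
    return sorted(recall, key=lambda c: (-errors[c], recall[c]))


# Toy hold-out labels and predictions (made up for illustration only).
y_true = ["a", "a", "a", "b", "b", "c", "c", "c", "c", "c"]
y_pred = ["a", "a", "b", "b", "b", "a", "a", "c", "c", "b"]

recall = per_class_recall(y_true, y_pred)
priority = acquisition_priority(y_true, y_pred)
for label in priority:
    print(label, round(recall[label], 3))
```

A ranking like this would only tell me which class loses the most hold-out accuracy. It does not say which characteristics the requested samples should have, which is the part I am unsure how to plan.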
Topic active-learning classification dataset
Category Data Science