Solutions for Labelling Training Data for Binary Classification Problems
I have a huge dataset on which I am trying to use an 80-20 split (the holdout method) to train and test my model. However, the dataset I have been given has 6M rows, and they are not labelled. The objective is to train, validate and test the model before using it for real-time predictions on live data traffic.
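For reference, the 80-20 holdout split described above, with a validation set carved out of the training portion, might look roughly like this. This is a minimal standard-library sketch; the function name `holdout_split` and the 10% validation fraction are illustrative assumptions, not part of the original setup.

```python
import random

def holdout_split(rows, test_frac=0.2, val_frac=0.1, seed=42):
    """Shuffle rows and split them into train / validation / test lists.

    test_frac is taken from the whole dataset (the "20" in 80-20);
    val_frac (an assumed value) is taken from the remaining training portion.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    idx = list(range(len(rows)))
    rng.shuffle(idx)

    n_test = int(len(idx) * test_frac)
    test_idx = idx[:n_test]
    rest = idx[n_test:]

    n_val = int(len(rest) * val_frac)
    val_idx = rest[:n_val]
    train_idx = rest[n_val:]

    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in val_idx],
        [rows[i] for i in test_idx],
    )
```

With 1,000 rows this yields 720 training, 80 validation and 200 test rows. Note that every row in all three sets still needs a label before the model can be trained or scored, which is exactly the bottleneck at 6M rows.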
The expected result is a prediction such as "It's not corrupted" with 97% accuracy; the implementation details (the output of a Jupyter notebook, etc.) are not what I am asking about.
My question is: are there any alternatives to manually labelling such a big dataset?
By manually labelling, I mean a human (or a group of humans) going through all 6M rows(!). Also, the input strings do not all have the same kind of content, so it is hard to simply push them through a script or CSV and automate the labelling. But I am trying to understand whether manual labelling is the ONLY way.
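To make the "push it through a script" idea concrete, here is a minimal sketch of a rule-based labeller that abstains on any row it cannot decide, so the abstention rate shows how many rows would still need a human. The label values and both rules are hypothetical assumptions for illustration only; the question does not say what "corrupted" means for this data.

```python
import re

# Hypothetical label encoding (assumption): 1 = corrupted, 0 = not corrupted.
CORRUPTED, NOT_CORRUPTED, ABSTAIN = 1, 0, -1

def rule_label(text):
    """Apply hand-written (hypothetical) rules; return ABSTAIN if none fires."""
    if "\ufffd" in text:                    # Unicode replacement char: assumed decoding damage
        return CORRUPTED
    if re.fullmatch(r"[\x20-\x7e]+", text): # only printable ASCII: assumed clean
        return NOT_CORRUPTED
    return ABSTAIN                          # heterogeneous row the rules cannot decide

def coverage(rows):
    """Return (fraction of rows the rules labelled, list of labels)."""
    labels = [rule_label(r) for r in rows]
    labelled = sum(1 for lab in labels if lab != ABSTAIN)
    return (labelled / len(rows) if rows else 0.0), labels
```

For example, `coverage(["hello world", "caf\ufffd au lait", "naïve café", ""])` labels the first two rows and abstains on the last two, giving a coverage of 0.5. The abstaining rows are the ones that would still go to a human, and rule-assigned labels can be wrong, so a sample of them would need spot-checking.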
Tags: labelling, semi-supervised-learning, classification