What are the benefits of combining semi-supervised and supervised learning methods?

I've been looking into semi-supervised learning, specifically label propagation and label spreading. In tutorials and some papers I've seen it mentioned that the results of label propagation are often then used to build a supervised model. It's not clear to me why this is necessary, or whether it is beneficial. What is the purpose of building another model from the results of label propagation after you have already obtained labels for your unlabelled data? Couldn't you just use label propagation to predict labels for any new data you encounter in the future? I assume this has something to do with label propagation being a transductive algorithm, but I've seen that the algorithm can be extended to an inductive one. Is that correct? Furthermore, if you're building a model using labels that are themselves predictions, isn't this likely to introduce a lot of bias into that model?

Topic supervised-learning semi-supervised-learning classification

Category Data Science


I can't answer all your questions because I don't know that much about this, but maybe I can help with the basic semi-supervised learning approach:

  • The labels obtained for the unlabelled instances (label propagation) are typically predicted by a supervised model trained on a very small set of labelled instances.
  • This model is likely to overfit and therefore to make many errors. This is why (1) its predictions on the unlabelled instances are unreliable, and (2) it cannot be used directly to label other instances.
  • Instead, the semi-supervised learning process typically runs many iterations, often using different subsets of instances (and/or various other techniques), in order to estimate which predictions are most likely to be correct. This way the model is progressively refined towards making more reliable predictions.
  • There's usually some kind of convergence criterion which indicates when the iterative process can be stopped. The final model can then be used to label the unlabelled instances as reliably as possible.
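
The iterative process described above can be sketched roughly as follows. This is only a minimal illustration of a self-training (pseudo-labelling) loop, not graph-based label propagation itself, and everything in it is an assumption made for the example: the toy nearest-centroid classifier, the confidence score, the 0.8 threshold, the tiny hand-written dataset, and the function names (`fit_centroids`, `predict_with_confidence`, `self_train`) are all hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Toy 'supervised model': one centroid per class."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab])
            for lab, (sx, sy) in sums.items()}


def predict_with_confidence(centroids, point):
    """Return (label, confidence). The confidence (an assumed heuristic)
    compares the distances to the two nearest centroids: about 0.5 means
    ambiguous, close to 1.0 means clearly nearer to one class."""
    dists = sorted((math.dist(point, c), lab) for lab, c in centroids.items())
    (d1, lab1), (d2, _) = dists[0], dists[1]
    conf = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return lab1, conf


def self_train(labelled, unlabelled, threshold=0.8, max_iter=20):
    """Refit repeatedly, keeping only confident pseudo-labels each round."""
    points = [p for p, _ in labelled]
    labels = [lab for _, lab in labelled]
    pool = list(unlabelled)
    for _ in range(max_iter):
        centroids = fit_centroids(points, labels)
        remaining, added = [], 0
        for p in pool:
            lab, conf = predict_with_confidence(centroids, p)
            if conf >= threshold:
                # Confident prediction: treat it as a label from now on.
                points.append(p)
                labels.append(lab)
                added += 1
            else:
                remaining.append(p)
        pool = remaining
        if added == 0:
            # Convergence criterion: no new confident pseudo-labels.
            break
    return fit_centroids(points, labels), pool


# Very small labelled set (one instance per class) plus unlabelled points.
labelled = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
unlabelled = [(0.2, 0.1), (-0.4, 0.3), (0.9, -0.5), (1.5, 1.2),
              (5.1, 4.8), (4.4, 5.3), (5.6, 5.5), (3.8, 4.1), (2.5, 2.6)]

model, leftover = self_train(labelled, unlabelled)
print("final centroids:", model)
print("still unlabelled:", leftover)
```

Points that never reach the confidence threshold, such as the one lying roughly halfway between the two classes, stay unlabelled instead of being forced into a class. That is one simple way to limit how much error from the pseudo-labels feeds back into the final model.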
