Probability for label correctness in semi-supervised learning

I am aware of the existence of semi-supervised learning approaches, such as the Ladder Network, where only a subset of the data is labeled. Are there any methods or papers which consider correctness probabilities for the labels of that training data subset? That is, some labels may be correct with 100% probability, while others may have only 70% or 45% probability of being correct. Any links to papers or work in this direction are highly appreciated.
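One simple way to act on such correctness probabilities (a hedged sketch, not taken from any specific paper) is to pass them to the classifier as per-example sample weights, so that confidently labelled examples influence training more than doubtful ones. The `label_confidence` array below is a hypothetical stand-in for whatever correctness probabilities you have:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# Hypothetical per-label correctness probabilities (1.0 = certainly correct).
label_confidence = np.random.default_rng(0).uniform(0.45, 1.0, size=len(y))

# Weight each training example by how much we trust its label.
clf = LogisticRegression()
clf.fit(X, y, sample_weight=label_confidence)
print("Training accuracy:", round(clf.score(X, y), 2))
```

This treats a 45%-confident label the same as "count this example less than half as much", which is a crude but common heuristic; more principled approaches model the label-noise process explicitly.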

Topic labels unsupervised-learning supervised-learning semi-supervised-learning

Category Data Science


I do not know of any papers on this either; I would greatly appreciate it if someone could link some.

In my case, I always evaluate by hiding the labels of data whose true labels I already know: I use a "traditional" 67%–33% train–test split, let the algorithm label the held-out portion, and check how that labelling performs on various metrics (accuracy, log loss, etc.).
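A minimal sketch of that evaluation, assuming scikit-learn's `SelfTrainingClassifier` as a stand-in for whatever semi-supervised model you use: hide the labels of the 33% test split (marked `-1`), train on everything, then score the model's labels for the hidden portion against the held-back ground truth:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Mark the test portion as unlabelled (-1) and train on all the data.
X_all = np.concatenate([X_train, X_test])
y_semi = np.concatenate([y_train, np.full(len(y_test), -1)])

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X_all, y_semi)

# Score the model's labelling of the hidden portion against ground truth.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)
print("Accuracy:", round(accuracy_score(y_test, pred), 3))
print("Log loss:", round(log_loss(y_test, proba), 3))
```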

Moreover, there are various categories of semi-supervised learning. For instance, if you use active learning (a pool-based approach, which incrementally queries one or a few samples at a time), you can observe how performance changes as labels are added.
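The pool-based loop can be sketched as follows (an illustrative uncertainty-sampling variant, with the "oracle" simulated by simply revealing the hidden true label): at each step the model queries the pool example it is least certain about, retrains, and you can track accuracy per increment:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=1)
rng = np.random.default_rng(1)

# Start with a small labelled seed set; the rest forms the unlabelled pool.
labelled = list(rng.choice(len(X), size=10, replace=False))
pool = [i for i in range(len(X)) if i not in labelled]

clf = LogisticRegression()
for step in range(20):
    clf.fit(X[labelled], y[labelled])
    # Uncertainty sampling: query the example whose predicted
    # probability is closest to 0.5 (the decision boundary).
    proba = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labelled.append(query)   # the oracle reveals this label
    pool.remove(query)

print("Accuracy after 20 queries:", round(clf.score(X, y), 3))
```

In practice you would plot accuracy after every query to see how quickly the labelling budget pays off.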

Finally, you can also use cross-validation in the same sense for hyperparameter tuning of your semi-supervised learning algorithm. All in all, in my view semi-supervised learning still has a lot to offer, and your question may be rather domain- and data-specific; visualising and clustering the data may give you new insights.
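For the cross-validation point, here is a small sketch under a simplifying assumption: since standard CV scoring needs ground-truth labels, it is run here on the labelled subset only, tuning a hyperparameter of the base estimator (the regularisation strength `C`) as a stand-in for whichever hyperparameter your semi-supervised algorithm exposes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Pretend this is the labelled subset of your data.
X, y = make_classification(n_samples=150, random_state=2)

# Try a few regularisation strengths and keep the best-scoring one.
best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    score = cross_val_score(LogisticRegression(C=C), X, y, cv=5).mean()
    if score > best_score:
        best_C, best_score = C, score

print("Best C:", best_C, "mean CV accuracy:", round(best_score, 3))
```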
