I trained a Decision Tree Classifier with 50,000 samples. I also have a set of 10,000 unlabeled samples, so I decided to use a self-training algorithm. Is it normal that after retraining the model with these 10,000 unlabeled samples, the accuracy didn't change and the confusion matrix has the same values? I expected some change (better or worse predictions). Thank you in advance.
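For reference, a minimal sketch of that setup (toy data; scikit-learn's `SelfTrainingClassifier` with a decision tree — all numbers are illustrative). Unlabeled points are marked with `-1` in `y`, and `labeled_iter_` shows whether any pseudo-labels were actually added; if the base estimator never clears the confidence `threshold`, zero pseudo-labels are added and the retrained model is identical, which would produce an unchanged confusion matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=600, random_state=0)
y_semi = y.copy()
y_semi[500:] = -1  # pretend the last 100 samples are unlabeled

clf = SelfTrainingClassifier(DecisionTreeClassifier(random_state=0),
                             threshold=0.75)
clf.fit(X, y_semi)

# labeled_iter_ is 0 for originally labeled samples, >0 for samples
# pseudo-labeled during self-training, and -1 for never-labeled ones
n_pseudo = int((clf.labeled_iter_ > 0).sum())
print("pseudo-labeled samples:", n_pseudo)
```

Checking `n_pseudo` (or running with `verbose=True`) tells you whether self-training did anything at all on your data.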
So far, I have stumbled upon much advice and many papers on PU Learning and unary classification. TL;DR: Does anyone have suggestions for a specific algorithm or implementation for labeled data of only one class plus unlabeled data that can come from either class? I'm also unsure what proportion of Class A to Class B exists within the unlabeled data. The simplest answer I have found has been the one-class SVM (Binary semi-supervised classification with positive only and unlabeled data set), …
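A minimal sketch of that one-class SVM baseline, assuming scikit-learn's `OneClassSVM` (the data, cluster positions, and `nu` value here are made up for illustration): fit on the labeled positives alone, then `predict()` flags each unlabeled point as +1 (resembles the positives) or -1 (likely the other class).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(200, 2))         # labeled Class A
X_unlabeled = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),  # hidden Class A
                         rng.normal(6.0, 1.0, size=(50, 2))]) # hidden Class B

# nu bounds the fraction of positives treated as outliers; it is a
# hyperparameter to tune, not the (unknown) class proportion
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)
pred = ocsvm.predict(X_unlabeled)  # +1 = looks like Class A, -1 = not
print("flagged as Class A:", int((pred == 1).sum()))
```

Since the true Class A/B proportion in the unlabeled set is unknown, `nu` would have to be tuned indirectly (e.g. via precision on the held-out positives).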
I have been doing anomaly detection recently. One of the methods is to use an autoencoder (AE) to learn the pattern of normal samples, then flag a sample as abnormal if it doesn't match that pattern. I train the AE without labels, but we need labels to determine which samples are normal or abnormal. I am wondering what kind of training this is: supervised, semi-supervised, or unsupervised learning?
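A toy sketch of the recipe described above, using scikit-learn's `MLPRegressor` as a stand-in autoencoder (`X → X`); the data and threshold are made up for illustration. Training uses only normal samples and no labels; the labels enter solely when choosing the error threshold and evaluating, which is why this setup is commonly called semi-supervised (or "one-class") anomaly detection:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 8))  # normal training samples
X_anom = rng.normal(5.0, 1.0, size=(20, 8))     # anomalies, for evaluation only

# tiny autoencoder: 8 -> 3 -> 8, trained to reconstruct its own input
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000,
                  random_state=0).fit(X_normal, X_normal)

def recon_error(X):
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

# threshold picked from normal data only, e.g. the 95th percentile
thr = np.percentile(recon_error(X_normal), 95)
print("anomalies flagged:", int((recon_error(X_anom) > thr).sum()))
```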
What are the differences between meta-learning, semi-supervised learning, self-supervised learning, active learning, federated learning, and few-shot learning, both in definition and in application? What are their pros and cons?
What are the differences between zero-shot, one-shot, and few-shot learning? How do they differ in usage and application, and in which fields are they applied? How do their pros and cons compare?
Label propagation provided by scikit-learn only allows two options for constructing the affinity matrix: 1) RBF and 2) kNN. The former results in a complete graph where the weight on each edge is the RBF function of the distance between samples, whereas the latter results in a sparse graph where each sample is only connected to its k nearest neighbors and all weights are one. I want to combine these two options: only connect each sample to its k nearest neighbors …
I've been looking into semi-supervised learning more, specifically label propagation and label spreading. When reading through tutorials and some papers, I've seen it mentioned that often the results of label propagation are then used to build a supervised model. It's not clear to me why this is necessary, or that it is beneficial. What is the purpose of building another model with the results of label propagation after you have already obtained the labels for your unknown data? Couldn't …
I am currently exploring anomaly detection methods for my work, and I have gone through Local Outlier Factor and Isolation Forest, both unsupervised methods. Now, the thing is, there might be a chance that I do not want a point that is far away to be considered an outlier, so I would need some sort of supervised or semi-supervised method for outlier detection. So what I am thinking is: 1. Label a bunch of points as outliers …
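One hedged sketch of the "label some points first" idea: fit `LocalOutlierFactor` in novelty mode on the points you have labeled as normal, so only deviations from *that labeled set* are flagged — a far-away point is not an outlier as long as it resembles one of the labeled normal regions. The clusters and query points below are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# two normal regions, one of them deliberately far away
X_labeled_normal = np.vstack([rng.normal(0.0, 1.0, size=(150, 2)),
                              rng.normal(10.0, 1.0, size=(150, 2))])
X_new = np.array([[0.0, 0.5],    # inside the first region
                  [10.2, 9.8],   # inside the far (but labeled-normal) region
                  [5.0, 5.0]])   # between the regions

lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_labeled_normal)
print(lof.predict(X_new))  # +1 = inlier, -1 = outlier
```

Note the distant cluster is accepted because it was labeled normal, while the in-between point is rejected — which is the behavior a purely unsupervised, distance-driven detector would not give you.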
I have a question about forecasting anomalies. I would be very grateful if you could refer me to some papers that deal with this kind of problem or give me some hints on how to start. I have some products that go into a bigger machine, and some forces act on these products for about 5 minutes. Afterwards, some of these products are not normal; they are anomalies. I want to predict an anomaly before …
I have the following semi-supervised problem: I have a graph of persons and their relations, and some of those persons have a predefined risk classification; I want to classify the risk of the other nodes. I know risk is somewhat arbitrary, which is why I'm open to any ideas. For example, suppose I have a person with classification critical (10) and I want to find the risk classification of their neighborhood. I thought of doing something like: for every node, for every fixed …
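A minimal sketch of one version of that neighborhood idea (the graph and scores below are made up): hold the known risk scores fixed and iteratively set every unlabeled node to the mean of its neighbors' current scores — a tiny harmonic-function / label-propagation scheme over the person graph.

```python
# toy graph: a chain of persons A - B - C - D - E
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
known = {"A": 10.0, "E": 0.0}  # predefined risk classifications

# build an undirected adjacency list
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

# initialize unknown nodes at mid-scale, then relax to convergence
risk = {n: known.get(n, 5.0) for n in neighbors}
for _ in range(200):
    for n in neighbors:
        if n not in known:
            risk[n] = sum(risk[m] for m in neighbors[n]) / len(neighbors[n])

print({n: round(r, 2) for n, r in sorted(risk.items())})
```

On the chain this converges to risk decreasing linearly from the critical node toward the safe one, which matches the "risk decays with graph distance" intuition.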
I'm new to deep learning and I wish to implement a semi-supervised algorithm for video summarization. I am using the "Lamem" dataset, and I have frames from the video along with the importance score of each frame as the ground truth. What semi-supervised algorithm should I use? It should take the frames as input and then predict the importance scores on a test dataset. As a newbie to this field, can anyone guide me on the procedure to follow? Maybe …
For classifying text into three classes (question, complaint, and compliment), where each sample can have multiple labels (e.g., question and complaint, or question and compliment): is it better to have one model for all three targets, or two models, the first for (question or not) and the second for (complaint, compliment, or neither)? Which approach is better when the data are partly labeled, partly unlabeled, and unbalanced?
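For the single-model option, one common shape is to treat the three targets as independent binary outputs of one multi-label classifier. A toy sketch with scikit-learn (all texts and labels below are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

texts = ["why is my order late?",
         "great service, but why no tracking number?",
         "terrible support experience",
         "love the new app"]
# label columns: [question, complaint, compliment]; rows may have several 1s
Y = [[1, 0, 0],
     [1, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]

clf = make_pipeline(CountVectorizer(),
                    MultiOutputClassifier(LogisticRegression()))
clf.fit(texts, Y)
print(clf.predict(["why is support so terrible?"]))
```

The two-model alternative would replace `MultiOutputClassifier` with one binary model for "question or not" and a separate multiclass model for complaint/compliment/neither; the multi-output formulation keeps a single pipeline, which also simplifies wrapping it in a semi-supervised scheme later.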
Say that we have a set of treatment plans (the options) available to a patient. Treatment plans can be invasive surgery, no surgery, less-invasive surgery, etc. We have a dataset where a treatment plan was chosen for each patient, and we also have their outcome (survived / did not survive). What is the best way to go about grading/ranking/choosing an optimal treatment plan so that we retain optimal survival rates? To me this sounds like a recommendation algorithm, but the way it seems most recommendation …
I am practicing semi-supervised learning, at the moment experimenting with sklearn.semi_supervised.SelfTrainingClassifier. I found a dataset for multiclass classification (tweet sentiment classification into 5 sentiment categories) and randomly removed 90% of the targets. Since it is textual data, preprocessing is needed: I applied CountVectorizer() and created a sklearn.pipeline.Pipeline with the vectorizer and the self-training classifier instance. For the base estimator of the self-training classifier I used RandomForestClassifier. My problem is that when running the script below, no training happens. The argument verbose …
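A hedged sketch of that pipeline with toy data (the texts, labels, and threshold below are invented). Two things worth checking in a setup like this: the unlabeled targets must be encoded as `-1` (not NaN or empty strings), and with the default `threshold=0.75` a random forest on sparse text features often never reaches that confidence, in which case zero pseudo-labels are added each iteration and it looks like no training happens:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

texts = ["i love this", "great phone", "awful battery", "terrible app",
         "ok i guess", "really love it", "hate the screen", "fine device"]
y = np.array([2, 2, 0, 0, -1, -1, -1, -1])  # -1 marks unlabeled samples

pipe = Pipeline([
    ("vec", CountVectorizer()),
    ("st", SelfTrainingClassifier(RandomForestClassifier(random_state=0),
                                  threshold=0.7, verbose=True)),
])
pipe.fit(texts, y)
print(pipe.predict(["love love love"]))
```

With `verbose=True`, each iteration reports how many pseudo-labels were added; if that count is always zero, lowering `threshold` (or using a better-calibrated base estimator) is the usual fix.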
I am aware of the existence of semi-supervised learning approaches, such as the Ladder Network, where only a subset of the data is labeled. Are there any methods or papers that consider correctness probabilities for the labels of that training subset? That is, some labels may be correct with 100% probability, while others may only be correct with 70% or 45% probability. Any links to papers or work in this direction are highly appreciated.
I've read lots of papers on query strategies like BADGE, SCALAR, and BatchBALD, but they all seem to be for situations where there is a single label per image (is this a cat, dog, or horse?). For tasks like vehicle camera images (using models like YOLO), a single image may have multiple labels across multiple classes. In this case, what techniques are available for addressing semi-supervised learning?
Looking at various materials on PNU semi-supervised learning, they all seem to be based around binary classification, as the name implies. How easy is it to apply these methods to classification with multiple labels? So, rather than just "Is this a cat?", we have "Is this a cat?", "Is this a dog?", "Is this a horse?", etc. Or are other approaches better in this situation? I saw this question, but it didn't really help my understanding.
I am trying to understand the pretraining and fine-tuning of wav2vec 2.0, the new algorithm Facebook AI uses for speech-to-text on low-resource languages. I didn't really get how the model does the pretraining part; I read the paper https://arxiv.org/abs/2006.11477 but ended up not grasping the notion of pretraining in this context. The question is: HOW do we do the pretraining? Note: I'm a beginner in ML; so far I've done some projects with NLP, I have …
GPT-1 mentions both semi-supervised learning and unsupervised pre-training, but they seem the same to me. Moreover, "Semi-supervised Sequence Learning" by Dai and Le also looks more like self-supervised learning. So what are the key differences between them?
I have a question regarding the label propagation and label spreading semi-supervised algorithms. I am working on building a look-alike model to identify marketing personas. Using supervised learning algorithms is getting quite complicated, as it takes a lot of time to run. Using unsupervised learning is quite complicated as well, since we need to specify "k", and I am unsure how to automate that choice for new datasets. I need a middle-ground machine learning algorithm suggestion that has lower …