Approaches to fit a theoretical model on a physical device

Happy to join this community. Thank you in advance for your kind help! :) Intro: I have a physical device characterized by its internal parameters, of which I know the nominal values. I also have a theoretical model of the device, which differs from the physical device because fabrication tolerances change the internal parameters. I would like to extract the internal parameters of the device by fitting the model to it. The device also has additional inputs that alter its …
Category: Data Science

What are the differences between meta, semi-supervised, self-supervised, active, federated, and few-shot learning?

What are the differences between meta learning, semi-supervised learning, self-supervised learning, active learning, federated learning, and few-shot learning, both in definition and in application? What are the pros and cons of each?
Category: Data Science

How to plan data acquisition focused on improving accuracy on a hold-out test set?

I have the task of coming up with a model with 95% accuracy for a classification problem. I have training data and a hold-out data set. I have the opportunity to request data of a particular class with desired characteristics to achieve this objective. What method should I use to plan the data acquisition through another team? I am currently at 86% accuracy. I use LightGBM for model development. I would consider parameter tuning and ensembling with XGBoost and TabNet. …
Category: Data Science

Difference between active learning and optimal experimental design?

During my research in the active learning field, I came across a similar concept, optimal experimental design (OED) for machine learning, which is based on finding new data points to run experiments on in order to improve the performance of the model. This made me wonder whether OED is a subfield of active learning or something completely different. Any information will be useful and appreciated. Thank you.
Category: Data Science

Active Learning: Query datapoints from output space

I am trying to apply active learning to a model to improve its performance. However, the oracle cannot label samples based on the input-space features; instead, it uses the output sample (label) to run the experiment and obtain the data point. I am wondering whether it would be possible to make the query strategy ask for points by label and then query the corresponding input values.
Category: Data Science

When should we apply active learning in testing?

Case 1: I apply active learning to query small chunks of samples gradually for labeling, and my model is trained during this process. After a certain number of iterations, I have a training dataset and a model with a certain performance. Case 2: I re-train the model from scratch on the training dataset from case 1. Question 1: Do you think the performance of the model will be the same in both cases, and why? Question 2: …
Category: Data Science

How to construct a test set for an active learning project?

With active learning I hope to keep the annotation effort to a minimum while still building a good classifier. My starting point is that I have about 20k images which can belong to ten different classes, and no labeled images at the moment. After each active learning iteration, I hope to get the labels of e.g. 100 images. If it matters, the data is unfortunately very likely imbalanced, which means that five classes are probably very rare. So …
Category: Data Science

Self-attention model trained with active learning stops learning after a few iterations

I'm doing some active learning with uncertainty sampling on a self-attention model implemented in PyTorch. The algorithm works as follows (steps 3-7 are repeated for 14 iterations):
1. Take 10% of the data as training set, L
2. Train the model on L
3. Either rank the remaining samples U by a certain informativeness measure and pick a batch B of the top n samples, or randomly pick a batch B
4. Add B to L
5. Remove B from …
Category: Data Science
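The loop described in the excerpt above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's PyTorch code: `train_fn` and `score_fn` are hypothetical callables standing in for model training and the informativeness measure, and because the excerpt is cut off after step 5, removing the batch from the unlabeled pool and retraining after each batch are assumptions.

```python
import random


def active_learning_loop(samples, train_fn, score_fn, batch_size,
                         iterations, use_random=False, seed=0):
    """Pool-based loop following steps 1-5 of the excerpt.

    train_fn(list_of_samples) -> model and score_fn(model, sample) -> float
    are hypothetical callables supplied by the caller.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)

    # Step 1: take 10% of the data as the initial training set L.
    n_init = max(1, len(samples) // 10)
    labeled, unlabeled = indices[:n_init], indices[n_init:]

    # Step 2: train the model on L.
    model = train_fn([samples[i] for i in labeled])

    for _ in range(iterations):
        if not unlabeled:
            break
        k = min(batch_size, len(unlabeled))
        # Step 3: pick a batch B, either randomly or by informativeness rank.
        if use_random:
            batch = rng.sample(unlabeled, k)
        else:
            ranked = sorted(unlabeled,
                            key=lambda i: score_fn(model, samples[i]),
                            reverse=True)
            batch = ranked[:k]
        # Step 4: add B to L.
        labeled.extend(batch)
        # Step 5 (assumed): remove B from the unlabeled pool.
        chosen = set(batch)
        unlabeled = [i for i in unlabeled if i not in chosen]
        # Assumed continuation: retrain on the enlarged L.
        model = train_fn([samples[i] for i in labeled])

    return model, labeled, unlabeled
```

Passing `use_random=True` gives the random baseline mentioned in step 3, so both strategies can be compared with the same initial split by reusing the same `seed`.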

Active learning with mixture model cluster assignments - am I injecting bias here?

Suppose I have a dataset of people's phone numbers and heights, and I'm interested in learning the parameters $p_{girl}$, $p_{boy}=1-p_{girl}$, $\mu_{boy}$, $\mu_{girl}$, and overall $\sigma$ governing the distribution of people's heights. I don't have labels for boys or girls yet, but if I really want to, I can call the phone number and ask if the person is a boy or girl. Procedure: Fit a Gaussian mixture model to heights via EM. Assign the greater of the $\mu$s to be …
Category: Data Science

Python package for machine-learning aided data labelling

In a lot of cases, unlabelled data needs to be transformed into labelled data. The best solution is to use (multiple) human classifiers. However, going through all the data by hand (e.g. in text mining or image processing) is often a daunting task. Is there software that can combine human classifiers and machine-learning techniques in real time? I am especially interested in Python packages. To illustrate, classifying images from video streams is very repetitive. After 100 images (from different streams) a machine-learning …
Category: Data Science

Identify outliers for annotation in text data

I read the book "Human-in-the-Loop Machine Learning" by Robert (Munro) Monarch about Active Learning. I don't understand the following approach to get a diverse set of items for humans to label:
1. Take each item in the unlabeled data and count the average number of word matches it has with items already in the training data
2. Rank the items by their average match
3. Sample the item with the lowest average number of matches
4. Add that item to the ‘labeled’ data and …
Category: Data Science
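The word-match approach quoted in the excerpt above might look something like the following sketch. It is not the book's code: the function names are hypothetical, whitespace tokenization and counting distinct shared words are simplifying assumptions, and since the excerpt is cut off after the fourth step, repeating the procedure until `n_samples` items are picked is also an assumption.

```python
def average_word_matches(item, labeled_items):
    """Average number of distinct words `item` shares with each labeled item."""
    words = set(item.lower().split())
    if not labeled_items:
        return 0.0
    matches = [len(words & set(other.lower().split()))
               for other in labeled_items]
    return sum(matches) / len(matches)


def sample_least_matching(unlabeled, labeled, n_samples):
    """Greedily pick the items that overlap least with the labeled data."""
    unlabeled = list(unlabeled)
    labeled = list(labeled)
    picked = []
    for _ in range(min(n_samples, len(unlabeled))):
        # Rank by average match and take the lowest; ties keep the earliest item.
        best = min(unlabeled,
                   key=lambda item: average_word_matches(item, labeled))
        unlabeled.remove(best)
        # Treat the pick as 'labeled' so the next pick differs from it as well.
        labeled.append(best)
        picked.append(best)
    return picked


labeled = ["the cat sat on the mat"]
unlabeled = [
    "the cat ate the fish",
    "stock markets fell sharply",
    "a dog sat on a mat",
]
picked = sample_least_matching(unlabeled, labeled, n_samples=2)
```

Adding each picked item to the labeled list before the next round is what pushes the selection toward a diverse set: the second pick is scored against the first pick too, not just against the original training data.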

Model selection in active learning

I am dabbling in active learning and was wondering how to combine it with the search for the best network architecture. In my understanding, active learning uses a heuristic for selecting the best instances to label in order to learn as quickly as possible. However, the way these instances are chosen depends on the model itself. Is there a way to handle this model dependency? It seems to me that the model's architecture is dependent on the train size, …
Category: Data Science

What is the difference between active learning and reinforcement learning?

From Wikipedia: Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or some other information source) to label new data points with the desired outputs. Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. How to distinguish them? What are the exact differences?
Category: Data Science

Recommendation system with active learning

I have data where companies ask users to score a bunch of questions but some items may be randomly chosen while others are personalized. Users score higher in personalized questions on average. I have a user ID, question ID, corresponding score of the question by the user, and whether the question is random or personalized. I want to build a recommendation system that incorporates the feature of a question being random or personalized. I assume that for a personalized item …
Category: Data Science

Is active learning able to detect challenging cases?

Let's say we have a set of data points that need to be labelled for a classification task. In pool-based active learning, if we go with an uncertainty measure, is the AL approach able to detect challenging cases? By challenging cases I mean samples that receive a high prediction score for $\hat{y}$ (e.g. >90%) but for which, most probably, $\neg\hat{y}$ is the correct prediction. The rationale behind my question is: does adding more samples to the training set always improve the …
Category: Data Science
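To make the question above concrete, here is a minimal sketch of least-confidence uncertainty sampling, one common uncertainty measure (the excerpt does not say which measure is meant, so this choice is an assumption, and the probabilities are made-up illustration values). It shows where a sample with a >90% prediction score ends up in the query ranking, whether or not that prediction is correct.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)


def rank_by_uncertainty(pool_probs):
    """Return pool indices ordered from most to least uncertain."""
    return sorted(range(len(pool_probs)),
                  key=lambda i: least_confidence(pool_probs[i]),
                  reverse=True)


# Hypothetical predicted probabilities for four unlabeled samples (binary task).
pool_probs = [
    [0.55, 0.45],  # close to the decision boundary
    [0.93, 0.07],  # confident; ranked last even if the prediction is wrong
    [0.70, 0.30],
    [0.51, 0.49],  # most uncertain
]
ranking = rank_by_uncertainty(pool_probs)
```

Because the score depends only on the model's own confidence, a confidently wrong sample is queried last under this measure; the ranking has no access to the true label.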

BIO tagging software

I would like to label character data with BIO tags as part of an active learning process on unlabelled data. I am assuming there are open source GUI tools available which I can use to make this easier - i.e. present the string to be labeled and some way of tagging characters from a predefined set of tags (and probably allow new tags to be added). I have not been able to find anything though - ideally cross-platform (Linux and …
Category: Data Science
