I'm currently working on a binary classification problem. My training dataset is rather small, with only 1000 elements. (I don't know if it is relevant: my problem is similar to the "spam filtering" problem, where an item can also be "likely" to be spam, but I simplified it to a black-or-white issue and use the probability given by the models to assign a likelihood score.) Among those 1000 elements, 70% are from class 1 …
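A minimal sketch of the probability-as-likelihood-score setup, with synthetic data standing in for the asker's 1000 imbalanced examples (the feature matrix, class weights, and calibration choice are all assumptions, not part of the original question):

```python
# Fit a classifier on a small, 70/30-imbalanced dataset and use calibrated
# predicted probabilities as a "how spam-like is this?" score instead of a
# hard black-or-white label.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

base = LogisticRegression(class_weight="balanced", max_iter=1000)
clf = CalibratedClassifierCV(base, method="sigmoid", cv=5)  # calibration matters on small data
clf.fit(X_train, y_train)

likelihood = clf.predict_proba(X_test)[:, 1]  # graded score in [0, 1]
```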
I was wondering about the differences between "multi-task learning" and "domain generalization". It seems to me that both are types of inductive transfer learning, but I'm not sure how they differ.
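A hedged structural sketch of the distinction, not a formal definition (layer sizes are arbitrary): multi-task learning shares a trunk across *different tasks* that are all seen at training time, while domain generalization trains *one task* on pooled data from several domains, hoping to work on a domain never seen in training.

```python
import torch.nn as nn

shared = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # shared encoder/trunk

# Multi-task learning: one trunk, one head per training task.
head_task_a = nn.Linear(32, 2)   # e.g. a 2-class task
head_task_b = nn.Linear(32, 5)   # e.g. a 5-class task

# Domain generalization: one trunk and ONE task head, trained on pooled
# data from domains 1..k and evaluated on an unseen domain.
head_single_task = nn.Linear(32, 2)
```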
I would like to train text embeddings using texts from two different domains (podcast summaries and movie summaries). The embeddings should capture similarities in the topics the texts talk about, but ignore as much as possible the style the texts were written in. The embeddings I currently train using the universal multilingual sentence encoder clearly separate the domains, which puts considerable distance between two documents that are strongly similar in topic but written in different styles. I tried …
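A minimal sketch of one common remedy for this, domain-adversarial training (Ganin et al., 2016), not necessarily what the asker tried: a domain classifier behind a gradient-reversal layer pushes the encoder to produce embeddings that do *not* separate podcast from movie summaries. Dimensions and the random inputs are placeholders for precomputed sentence vectors.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
domain_head = nn.Linear(128, 2)  # predicts podcast vs. movie

x = torch.randn(32, 512)               # stand-in for input sentence vectors
domains = torch.randint(0, 2, (32,))   # which domain each text came from

z = encoder(x)
logits = domain_head(GradReverse.apply(z, 1.0))
adv_loss = nn.functional.cross_entropy(logits, domains)
adv_loss.backward()  # reversed gradients teach the encoder to hide style
```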
Would it make sense to first train a model on images from multiple domains, and then do "fine-tuning" on one specific domain to improve its performance on it? For instance, one could train an object detector on car camera footage recorded in NYC, Paris and Beijing, then continue training on Paris only. For a model that will be deployed in Paris only, should we favor diversity or specificity? And does this training method have a name?
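A minimal sketch of the pretrain-then-specialize recipe being described (the detector architecture and hyperparameters are illustrative assumptions): train on all three cities first, then continue with a smaller learning rate on Paris only, optionally freezing the backbone so the diverse pretraining isn't forgotten.

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
# ... assume `model` has already been trained on NYC + Paris + Beijing ...

# Option A: fine-tune everything on Paris with a small learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# Option B: freeze the backbone and adapt only the detection heads.
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)
```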
Given two domains $D_1$ and $D_2$, the symmetric difference hypothesis divergence ($H \Delta H$) is used as a measure of how much two domains differ from each other. Let the hypotheses (segmentation networks, in my case) trained on the two domains be $h_1$ and $h_2$ respectively. Then (according to this work by Loog et al): $d_{H\Delta H} = 2 \sup_{h_1,h_2 \in H} |\mathrm{Pr}_s[h_1\neq h_2] - \mathrm{Pr}_t[h_1\neq h_2]|$ where $\mathrm{Pr}_s[h_1\neq h_2] = \int_{X} \mathbb{1}[h_1(x)\neq h_2(x)]\, p_s(x)\, dx$. Since we do not have access to the …
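Since the supremum over pairs of hypotheses is intractable, a standard empirical stand-in (Ben-David et al., 2007) is the proxy $\mathcal{A}$-distance: train a classifier to tell source samples from target samples and turn its error into a divergence estimate. A minimal sketch with synthetic feature arrays standing in for the segmentation features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(500, 64))   # source-domain features
Xt = rng.normal(0.5, 1.0, size=(500, 64))   # target-domain features

X = np.vstack([Xs, Xt])
d = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]  # domain labels

# Cross-validated accuracy of a source-vs-target classifier.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, d, cv=5).mean()
err = 1.0 - acc
proxy_a_distance = 2.0 * (1.0 - 2.0 * err)  # near 2 => domains easily separable
print(proxy_a_distance)
```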
Suppose I have a (training) set of $n$ observations $\{(Y_i^{(U)},X_i^{(U)})\}_{i=1}^n$ of age $X_i^{(U)}$ and height $Y_i^{(U)}$ from people in the USA. Now suppose I also have a (test) set of $m$ observations $\{X_i^{(J)}\}_{i=1}^m$ of age $X_i^{(J)}$ only, from people in Japan, where people are shorter on average. I want to predict the heights of people in Japan in the test set using transfer learning from the USA dataset. Suppose for simplicity the USA data is well-fit by the standard simple …
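A minimal covariate-shift sketch of one standard option (not necessarily the asker's intended one, and it only corrects for the differing *age* distributions, not the systematic height difference, which is a shift in $P(Y\mid X)$): reweight the USA training points by an estimated density ratio $\hat{p}_J(x)/\hat{p}_U(x)$ and fit weighted least squares. All data below is synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x_us = rng.uniform(5, 70, 1000)                              # ages, USA training set
y_us = 80 + 4.5 * np.minimum(x_us, 18) + rng.normal(0, 5, 1000)  # toy heights (cm)
x_jp = rng.uniform(5, 50, 300)                               # ages only, Japan test set

# Importance weights: how much more likely each USA age is under Japan's
# age distribution, estimated with kernel density estimates.
w = gaussian_kde(x_jp)(x_us) / gaussian_kde(x_us)(x_us)

model = LinearRegression().fit(x_us.reshape(-1, 1), y_us, sample_weight=w)
y_jp_hat = model.predict(x_jp.reshape(-1, 1))
```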
I would like to find the shared latent space between two sets of features. I have source- and target-domain features already extracted from images: four sets of feature vectors, for normal and abnormal examples in the source and target domains. I would like to train on the normal source and target features and predict on the abnormal sets. How do I do that? I have this idea that if I create a shared space between the two domains and give it to a classifier, …
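A minimal sketch of one way to realize that idea: CCA projects source and target features into maximally correlated shared coordinates, and a one-class model trained on the normal projections then scores the abnormal sets. The arrays are synthetic stand-ins, and CCA requires paired source/target rows, which is an assumption here.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
src_normal = rng.normal(size=(200, 64))                        # normal source features
tgt_normal = src_normal @ rng.normal(size=(64, 48)) \
             + 0.1 * rng.normal(size=(200, 48))                # correlated target features

cca = CCA(n_components=10).fit(src_normal, tgt_normal)
zs, zt = cca.transform(src_normal, tgt_normal)                 # shared-space coordinates

# One-class detector trained only on normal data; abnormal source/target
# features would be projected the same way and scored here.
detector = OneClassSVM(nu=0.05).fit(np.vstack([zs, zt]))
scores = detector.decision_function(zs)                        # negative => abnormal
```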
Is it possible to use a neural network (or another approach) to classify images based on trained data, and at the same time, if new image classes are introduced in the test set, have it classify those unseen images (open-set data) into new classes (i.e., telling me which new class this unseen data belongs to) on which no training was done?
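A minimal open-set sketch of one standard recipe (the threshold, model outputs, and embeddings below are hypothetical placeholders): reject test items whose maximum softmax probability falls below a threshold as "unknown", then cluster the rejected items into candidate new classes.

```python
import numpy as np
from sklearn.cluster import KMeans

def open_set_predict(probs, tau=0.7):
    """probs: (n, k) softmax outputs over the k known classes."""
    preds = probs.argmax(axis=1)
    known = probs.max(axis=1) >= tau     # confident enough to trust?
    preds[~known] = -1                   # -1 marks "none of the trained classes"
    return preds, known

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=100)   # stand-in for softmax outputs
feats = rng.normal(size=(100, 32))            # stand-in image embeddings

preds, known = open_set_predict(probs)
if (~known).sum() >= 2:
    # Group the rejected items into provisional new classes.
    new_labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats[~known])
```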
I have trained a model on a training set which is not that big (around 120 true positives overall, and of course lots of negative examples). What I am trying to do is improve the results by increasing the data size. I tried two approaches: I added data from a different domain and concatenated it with the existing data, which increased the F-score from 0.13 to 0.14. I added the same extra data instances, but this time with …
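A minimal sketch of one common way to mix in out-of-domain data, the feature-augmentation trick of "frustratingly easy domain adaptation" (Daumé III, 2007), offered as an example rather than what the asker's second approach was: each feature vector is expanded into [general, in-domain, out-of-domain] blocks so the model can learn both shared and domain-specific weights. The arrays are hypothetical stand-ins for the two datasets.

```python
import numpy as np

def augment(X, is_in_domain):
    """Expand features into [general, in-domain, out-of-domain] blocks."""
    zeros = np.zeros_like(X)
    return np.hstack([X, X, zeros]) if is_in_domain else np.hstack([X, zeros, X])

X_in = np.random.rand(120, 20)    # original (small) dataset
X_out = np.random.rand(500, 20)   # extra data from the other domain
X_all = np.vstack([augment(X_in, True), augment(X_out, False)])
# X_all now feeds any standard classifier alongside the stacked labels.
```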
I am working on a binary classification problem. My data contains 100K samples from two different sources. When I perform training and testing on data from the first source I can achieve classification accuracy up to 98%, and when I perform training and testing on data from the second source, I can achieve up to 99%. The problem is that when I mix both of them, the classification accuracy goes down to 89%. Any idea how to perform the training to …
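A minimal sketch of one option (the data, model choice, and column layout are all assumptions): append a source-indicator column so a single model can learn source-conditional decision rules instead of averaging the two sources together.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
X1, y1 = rng.normal(size=(500, 10)), rng.integers(0, 2, 500)   # source 1
X2, y2 = rng.normal(size=(500, 10)), rng.integers(0, 2, 500)   # source 2

X = np.vstack([X1, X2])
src = np.r_[np.zeros(len(X1)), np.ones(len(X2))]               # which source each row is from
X = np.column_stack([X, src])                                  # indicator as an extra feature
y = np.r_[y1, y2]

clf = HistGradientBoostingClassifier().fit(X, y)
```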
I have read in the literature that in some cases the training set is not representative of a real-world dataset. However, I cannot seem to find a proper term describing this phenomenon; what is it called? Edit: So far I have settled on the term domain adaptation, briefly described as a field in machine learning which aims to learn from a certain data distribution in order to predict data coming from a different (but related) target …
I understand that BatchNorm (Batch Normalization) centers the data input to the layer to (mean, std) = (0, 1) and then potentially scales it (with $\gamma$) and offsets it (with $\beta$). BatchNorm follows this formula (from arXiv:1502.03167): $\hat{x}_i = \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2 + \epsilon}}, \quad y_i = \gamma \hat{x}_i + \beta$. However, when it comes to 'adaptive BatchNorm', I don't understand what the difference is. What is adaptive BatchNorm doing differently? It is described in arXiv:1603.04779.
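A minimal sketch of the adaptive-BatchNorm (AdaBN) idea from arXiv:1603.04779: keep the learned $\gamma$/$\beta$, but re-estimate the BN running mean and variance from *target-domain* data by forwarding it in train mode without any weight updates. The tiny model and random batches are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(),
                      nn.Linear(32, 2))

# Reset BN statistics, then re-estimate them on target-domain data.
for m in model.modules():
    if isinstance(m, nn.BatchNorm1d):
        m.reset_running_stats()

model.train()
with torch.no_grad():                       # no gradient step: only BN stats move
    for _ in range(50):
        target_batch = torch.randn(64, 16)  # stand-in for target-domain batches
        model(target_batch)

model.eval()                                # inference now uses target statistics
```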
I'm attempting to build a model/suite of models to predict a binary target. The exact details of the models aren't important, but suffice it to say that I've tried half a dozen different types of models, with comparable results from all of them. Looking at the predictions on various subsets of the training data, it appears that a certain subset of features is important for around 30% of the data, while a different subset is important for the remaining …
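A minimal sketch of one way to exploit that structure, a simple mixture-of-experts stand-in (the clustering gate, expert count, and data are all assumptions): cluster the rows first, then fit a separate classifier per cluster so each expert can lean on its own feature subset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 20)), rng.integers(0, 2, 1000)

gate = KMeans(n_clusters=2, n_init=10).fit(X)          # routes rows to experts
experts = {c: LogisticRegression(max_iter=1000).fit(X[gate.labels_ == c],
                                                    y[gate.labels_ == c])
           for c in range(2)}

def predict_proba(x_new):
    """x_new: 2D array of one or more rows; routed to the matching expert."""
    c = gate.predict(x_new)[0]
    return experts[c].predict_proba(x_new)[:, 1]
```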