How to manage sampling bias between training data and real-world data?

I'm currently working on a binary classification problem. My training dataset is rather small, with only 1000 elements. (I don't know if it is relevant: my problem is similar to the "spam filtering" problem, where an item can also be "likely" to be categorized as spam, but I simplified it into a black-or-white issue and use the probability given by the models to assign a likelihood score.) Among those 1000 elements, 70% are from class 1 …
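If the mismatch is mainly in the class priors (label shift), a common correction is to rescale the model's predicted probabilities by the ratio of deployment to training priors. A minimal sketch, where `train_prior` and `real_prior` are hypothetical estimates you would substitute with your own:

```python
import numpy as np

# Hypothetical priors: 70% class 1 in training, but an assumed
# 20% class 1 in the real-world data (replace with your estimates).
train_prior = np.array([0.30, 0.70])  # P_train(class 0), P_train(class 1)
real_prior = np.array([0.80, 0.20])   # P_real(class 0),  P_real(class 1)

def adjust_for_prior_shift(proba, train_prior, real_prior):
    """Rescale predicted probabilities when only the class priors
    differ between training and deployment (label shift)."""
    adjusted = proba * (real_prior / train_prior)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Usage with any fitted classifier exposing predict_proba:
# proba = clf.predict_proba(X_test)
# likelihood_scores = adjust_for_prior_shift(proba, train_prior, real_prior)[:, 1]
```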
Category: Data Science

Train consistent embeddings using text from different domains

I would like to train text embeddings using texts from two different domains (podcast summaries and movie summaries). The embeddings should capture similarities in the topics the texts talk about, but ignore as much as possible the style the texts were written in. The embeddings I currently train using the universal multilingual sentence encoder clearly separate the two domains, which puts considerable distance between two documents that have strong topic similarity but were written in different styles. I tried …
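One minimal baseline for reducing this split is to subtract each domain's mean embedding, which removes the constant "style offset" that separates the two clusters. A sketch under that assumption (`podcast_emb` and `movie_emb` are placeholders for the encoder output):

```python
import numpy as np

def center_by_domain(emb, domain_ids):
    """Subtract each domain's mean vector so that shared 'topic'
    directions dominate cross-domain cosine similarity."""
    emb = emb.copy()
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        emb[mask] -= emb[mask].mean(axis=0)
    # Re-normalize for cosine similarity.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

# emb = np.vstack([podcast_emb, movie_emb])
# domains = np.array([0] * len(podcast_emb) + [1] * len(movie_emb))
# emb_centered = center_by_domain(emb, domains)
```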
Category: Data Science

Train on multi-domains, then fine-tune on specific domain

Would it make sense to first train a model on images from multiple domains, and then do "fine-tuning" on one specific domain to improve its performance on it? For instance, one could train an object detector on car-camera footage recorded in NYC, Paris, and Beijing, then continue training on Paris only. For a model that will be deployed in Paris only, should we favor diversity or specificity? And does this training method have a name?
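This is usually just called pre-training followed by fine-tuning (or domain-specific adaptation). A minimal PyTorch-style sketch of the two phases, assuming hypothetical loaders `multi_city_loader` and `paris_loader` and a `model` already defined:

```python
import torch

def train(model, loader, epochs, lr):
    """One generic supervised training loop used for both phases."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Phase 1: pre-train on the diverse multi-city data.
# train(model, multi_city_loader, epochs=20, lr=1e-2)
# Phase 2: fine-tune on Paris only, with a smaller learning rate
# so the model specializes without forgetting everything it learned.
# train(model, paris_loader, epochs=5, lr=1e-3)
```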
Category: Data Science

Computing symmetric difference hypothesis divergence $H \Delta H$ for two domains using a segmentation network

Given two domains $D_1$ and $D_2$, the symmetric difference hypothesis divergence ($H \Delta H$) is used as a measure of how much the two domains differ from each other. Let the hypotheses (segmentation networks, in my case) trained on the two domains be $h_1$ and $h_2$, respectively. Then (according to this work by Loog et al.): $d_{H\Delta H} = 2 \sup_{h_1,h_2 \in H} \left|\mathrm{Pr}_s[h_1\neq h_2] - \mathrm{Pr}_t[h_1\neq h_2]\right|$ where $\mathrm{Pr}_s[h_1\neq h_2] = \int_{X}[h_1\neq h_2]\,p_s(x)\,dx$. Since we do not have access to the …
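Since the supremum over all hypothesis pairs is intractable, a common empirical stand-in is the proxy A-distance: train a classifier to distinguish the two domains and convert its test error into a divergence estimate, $\hat{d}_A = 2(1 - 2\epsilon)$. A sketch, assuming feature arrays `feats_s` and `feats_t` extracted from the segmentation network:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(feats_s, feats_t):
    """Estimate domain divergence as 2 * (1 - 2 * err), where err is
    the test error of a classifier separating source from target."""
    X = np.vstack([feats_s, feats_t])
    y = np.concatenate([np.zeros(len(feats_s)), np.ones(len(feats_t))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)
    return 2.0 * (1.0 - 2.0 * err)

# feats_s, feats_t: (n, d) feature vectors from domains D_1 and D_2.
# print(proxy_a_distance(feats_s, feats_t))
```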
Category: Data Science

How can I use transfer learning to predict height given age in Japan, using a model developed with USA data?

Suppose I have a (training) set of $n$ observations $\{(Y_i^{(U)},X_i^{(U)})\}_{i=1}^n$ of age $X_i^{(U)}$ and height $Y_i^{(U)}$ from people in the USA. Now suppose I also have a (test) set of $m$ observations $\{X_i^{(J)}\}_{i=1}^m$ of age $X_i^{(J)}$ only, from people in Japan, where people are shorter on average. I want to predict the heights of people in Japan in the test set using transfer learning from the USA dataset. Suppose for simplicity the USA data is well-fit by the standard simple …
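One simple illustration of the idea: fit the regression on the USA data, then correct for the location shift between populations. The mean offset below is a made-up placeholder, not a real statistic, and assumes you have some external estimate of the average height difference:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_with_shift(X_us, y_us, X_japan, mean_offset_cm=-8.0):
    """Fit on the source (USA) domain, then apply an assumed intercept
    shift for the target (Japan) population. The offset here is a
    hypothetical value for illustration only."""
    model = LinearRegression().fit(np.asarray(X_us).reshape(-1, 1), y_us)
    return model.predict(np.asarray(X_japan).reshape(-1, 1)) + mean_offset_cm
```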
Category: Data Science

Latent space for cross domain numerical features

I would like to find the shared latent space between two sets of features. I have source- and target-domain features already extracted from images: four sets of feature vectors, for normal and abnormal samples in the source and target domains. I would like to train on the normal source and target features and predict on the abnormal sets. How do I do that? I have this idea that if I create a shared space between the two domains and give it to a classifier, …
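One straightforward way to build such a shared space is to train a single autoencoder on the pooled normal features from both domains and use the bottleneck as the latent representation. A PyTorch sketch under that assumption (dimensions are placeholders):

```python
import torch
import torch.nn as nn

feat_dim, latent_dim = 512, 64  # placeholder dimensions

# A single encoder/decoder shared by both domains forces source and
# target features into one latent space.
encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, feat_dim))

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):
    """x: a batch mixing normal source and normal target features."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()
    return loss.item()

# At test time, encode the abnormal sets with `encoder` and feed the
# latent codes to the downstream classifier.
```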
Category: Data Science

Close set and open set classification at the same time

Is it possible to use a neural network (or another approach) to classify images based on the trained classes and, at the same time, when new image classes appear in the test set, assign those unseen images (open-set data) to new classes on which no training was done (i.e., tell me which new class this unseen data belongs to)?
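A common baseline combines both behaviours: accept the closed-set prediction when the classifier is confident, reject otherwise, and cluster the rejected samples to discover candidate new classes. A sketch assuming a model exposing `predict_proba`; the threshold and cluster count are illustrative assumptions, not tuned values:

```python
import numpy as np
from sklearn.cluster import KMeans

def open_set_predict(proba, features, threshold=0.9, n_new_classes=3):
    """Closed-set prediction when the max softmax probability is high;
    otherwise mark as unknown and cluster unknowns into candidate
    new classes (an assumed, untuned threshold and cluster count)."""
    conf = proba.max(axis=1)
    labels = proba.argmax(axis=1).astype(object)
    unknown = conf < threshold
    if unknown.sum() >= n_new_classes:
        clusters = KMeans(n_clusters=n_new_classes, n_init=10).fit_predict(
            features[unknown])
        labels[unknown] = [f"new_class_{c}" for c in clusters]
    return labels
```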
Category: Data Science

Why does increasing the training set size not improve the results?

I have trained a model on a training set which is not that big (overall around 120 positive examples and, of course, lots of negative examples). What I am trying to do is improve the results by increasing the data size. I tried two approaches. I added data from a different domain and concatenated it with the existing data; this increased the F-score from 0.13 to 0.14. I added the same extra data instances, but this time with …
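When extra data comes from a different domain, one common option is to keep it but down-weight it so that in-domain examples dominate the loss. A sketch using scikit-learn's `sample_weight` (the 0.2 weight is an arbitrary assumption to be tuned on a validation set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_with_downweighted_extra(X_in, y_in, X_out, y_out, out_weight=0.2):
    """Train on in-domain plus out-of-domain data, giving the
    out-of-domain instances a smaller sample weight (0.2 is an
    illustrative value; tune it on held-out in-domain data)."""
    X = np.vstack([X_in, X_out])
    y = np.concatenate([y_in, y_out])
    w = np.concatenate([np.ones(len(y_in)), np.full(len(y_out), out_weight)])
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    return clf.fit(X, y, sample_weight=w)
```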
Category: Data Science

Training data from different sources

I am working on a binary classification problem. My data contains 100K samples from two different sources. When I perform training and testing on data from the first source I can achieve classification accuracy of up to 98%, and when I perform training and testing on data from the second source I can achieve up to 99%. The problem is that when I mix both of them, the classification accuracy drops to 89%. Any idea how to perform the training to …
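One quick baseline is to add the source as an explicit feature so the model can learn source-specific decision rules instead of averaging them away (the alternative is one model per source, routed at prediction time). A sketch of the indicator-feature variant, assuming 2-D feature arrays `X1`, `X2` and labels `y1`, `y2` for the two sources:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_with_source_indicator(X1, y1, X2, y2):
    """Append a binary source-id column before training so the model
    can condition its splits on the data source."""
    X = np.vstack([
        np.hstack([X1, np.zeros((len(X1), 1))]),  # source 1 -> 0
        np.hstack([X2, np.ones((len(X2), 1))]),   # source 2 -> 1
    ])
    y = np.concatenate([y1, y2])
    return RandomForestClassifier(n_estimators=200).fit(X, y)

# At prediction time, append the matching source id to each sample.
```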
Category: Data Science

Discrepancy between training set and real-world data set: domain adaptation?

I have read in the literature that in some cases the training set is not representative of a real-world dataset. However, I cannot seem to find a proper term describing this phenomenon; what is the proper term for this problem? Edit: So far I have settled on the term domain adaptation, briefly described as a field of machine learning that aims to learn from a certain data distribution in order to predict data coming from a different (but related) target …
Category: Data Science

What is the difference between BatchNorm and Adaptive BatchNorm (AdaBN)?

I understand that BatchNorm (Batch Normalization) normalizes the data input to the layer to (mean, std) = (0, 1) and potentially scales it (with $\gamma$) and offsets it (with $\beta$). BatchNorm follows this formula (retrieved from arXiv 1502.03167): $\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad y_i = \gamma \hat{x}_i + \beta$. However, when it comes to 'adaptive BatchNorm', I don't understand what the difference is. What is adaptive BatchNorm doing differently? It is described as replacing the BN statistics with ones computed on the target domain (retrieved from arXiv 1603.04779).
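In short, AdaBN keeps the learned $\gamma$ and $\beta$ but re-estimates the BatchNorm mean/variance statistics on target-domain data, with no gradient updates. A PyTorch sketch of that recalibration step, assuming a trained `model` and a hypothetical `target_loader`:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_batchnorm(model, target_loader):
    """AdaBN-style adaptation: re-estimate BatchNorm running mean/var
    on target-domain data while leaving all learned weights
    (including gamma and beta) untouched."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()  # forget source-domain statistics
            m.momentum = None        # use a cumulative moving average
    model.train()  # train mode so BN layers update running stats
    for x, _ in target_loader:
        model(x)   # forward passes only; no gradient updates
    model.eval()
    return model
```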
Category: Data Science

Dealing with an apparently inseparable dataset

I'm attempting to build a model (or suite of models) to predict a binary target. The exact details of the models aren't important; suffice it to say that I've tried half a dozen different types of models, with comparable results from all of them. Looking at the predictions on various subsets of the training data, it appears that a certain subset of features is important for around 30% of the data, while a different subset is important for the remaining …
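When different feature subsets matter for different regions of the data, one baseline worth trying is to cluster first and fit one model per cluster, a crude mixture-of-experts. A sketch under that assumption, for a binary target:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

class ClusteredClassifier:
    """Crude mixture-of-experts: partition samples with k-means,
    then fit an independent binary classifier per partition."""
    def __init__(self, n_clusters=2):
        self.km = KMeans(n_clusters=n_clusters, n_init=10)
        self.models = {}

    def fit(self, X, y):
        clusters = self.km.fit_predict(X)
        for c in np.unique(clusters):
            m = clusters == c
            self.models[c] = LogisticRegression(max_iter=1000).fit(X[m], y[m])
        return self

    def predict_proba(self, X):
        clusters = self.km.predict(X)
        out = np.zeros((len(X), 2))
        for c, model in self.models.items():
            m = clusters == c
            if m.any():
                out[m] = model.predict_proba(X[m])
        return out
```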
Category: Data Science
