Estimating class prevalence in unlabelled data after predicting labels with a binary classifier

I'm looking to estimate the prevalence of 1's (i.e. the rate of positive labels) in a very large dataset that I have. However, I am hoping to report this percentage as a 95% credible interval rather than a point estimate, taking model uncertainty into account. These are the steps I'm hoping to perform: Train a binary classifier on labelled training data. Use a labelled test set to estimate the specificity and sensitivity of …
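The classic adjustment here is the Rogan-Gladen correction, and propagating the test-set uncertainty through it with Monte Carlo gives a credible interval. A minimal sketch, with made-up counts standing in for your data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test-set counts (replace with your own):
tp, fn = 90, 10     # positives in the labelled test set
tn, fp = 940, 60    # negatives in the labelled test set

n_unlabelled = 1_000_000   # size of the large unlabelled set
n_pred_pos = 180_000       # positives predicted by the classifier

n_draws = 100_000
# Beta posteriors for sensitivity and specificity (uniform priors)
se = rng.beta(tp + 1, fn + 1, n_draws)
sp = rng.beta(tn + 1, fp + 1, n_draws)
# Posterior for the observed (apparent) positive rate
p_obs = rng.beta(n_pred_pos + 1, n_unlabelled - n_pred_pos + 1, n_draws)

# Rogan-Gladen correction, clipped to the valid range
prev = np.clip((p_obs + sp - 1) / (se + sp - 1), 0, 1)
lo, hi = np.percentile(prev, [2.5, 97.5])
print(f"95% credible interval for prevalence: [{lo:.4f}, {hi:.4f}]")
```

This treats sensitivity and specificity as independent Beta-distributed quantities given the test-set confusion matrix; a full Bayesian model (e.g. in PyMC) would couple everything in one joint posterior.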
Category: Data Science

Computing probabilities in the Plackett-Luce model

I am trying to implement a Plackett-Luce model for learning to rank from click data. Specifically, I am following the paper: Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank. The objective function is a reward function similar to the one used in reinforcement learning: $R_d$ is the reward for document $d$, $\pi(k \vert d)$ is the probability of document $d$ being placed at position $k$ for a given query $q$, and $w_k$ is the weight of position …
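The excerpt cuts off before the full objective, but the Plackett-Luce building block itself is the probability of a ranking as a product of softmax choices over the items that remain at each position. A minimal sketch (the scores and the helper name are mine, not from the paper):

```python
import numpy as np

def plackett_luce_log_prob(scores, ranking):
    """Log-probability of observing `ranking` (a permutation of item
    indices, best first) under a Plackett-Luce model with the given
    per-item scores."""
    s = np.asarray(scores, dtype=float)[list(ranking)]
    log_p = 0.0
    for k in range(len(s)):
        # The item at position k is a softmax choice among the items
        # not yet placed (positions k, k+1, ...).
        log_p += s[k] - np.logaddexp.reduce(s[k:])
    return log_p

scores = [2.0, 0.5, 1.0]   # hypothetical document scores
print(np.exp(plackett_luce_log_prob(scores, [0, 2, 1])))
```

Working in log space with `logaddexp` avoids the underflow you would get multiplying many small softmax factors for long rankings.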
Category: Data Science

What are the requirements for a word list to be used for Bayesian inference?

Intro: I need an input file of 5-letter English words to train my Bayesian model to infer the stochastic dependency between positions. For instance, is the probability of a letter at position 5 dependent on the letter at position 1, and so on? At the end of the day, I want to train this Bayesian network in order to be able to solve the Wordle game. What is Wordle? It’s a game where you guess 5 …
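As for requirements in practice: a useful list is lowercase, deduplicated, restricted to one alphabet, and large enough that per-position and pairwise counts are not mostly zero. A small sketch of turning such a file into the positional counts a Bayesian network would be estimated from (the filename is hypothetical):

```python
from collections import Counter

# Hypothetical word list: one lowercase 5-letter English word per line
with open("words5.txt") as f:
    words = [w.strip().lower() for w in f if len(w.strip()) == 5]

# Marginal letter distribution at each of the 5 positions
pos_counts = [Counter(w[i] for w in words) for i in range(5)]
pos_probs = [{c: n / len(words) for c, n in pc.items()} for pc in pos_counts]

# Joint counts of (position 1, position 5): the raw material for testing
# whether the letter at position 5 depends on the letter at position 1
joint_15 = Counter((w[0], w[4]) for w in words)
```

Comparing `joint_15` against the product of the two marginals (e.g. via a chi-squared test or mutual information) is one direct way to probe the dependency you describe.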
Category: Data Science

Custom regularisation for logistic regression

My understanding of L2 regularisation: the weights of the model are assumed to have a prior Gaussian distribution centered around 0. The MAP estimate over the data then adds an extra penalty to the cost function. My problem statement: I am making a reasonable assumption (based on domain knowledge) that my features are independent, which means I can use the weights of the features to infer their importance in influencing Y. From domain knowledge, I want to assume priors about the ratio of …
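The question is truncated, but the general recipe carries over: a Gaussian prior $N(m_i, s_i^2)$ on weight $w_i$ turns the MAP objective into the negative log-likelihood plus $\sum_i (w_i - m_i)^2 / (2 s_i^2)$, with plain L2 as the special case $m_i = 0$. A sketch with toy data (all numbers are placeholders; a prior on a *ratio* of weights would need a custom penalty of the same shape):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_posterior(w, X, y, prior_mean, prior_std):
    """MAP objective for logistic regression with a Gaussian prior
    N(prior_mean_i, prior_std_i^2) on each weight."""
    z = X @ w
    # Numerically stable Bernoulli negative log-likelihood
    nll = np.sum(np.logaddexp(0, z) - y * z)
    penalty = np.sum((w - prior_mean) ** 2 / (2 * prior_std ** 2))
    return nll + penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # toy features
y = (expit(X @ np.array([1.0, -2.0, 0.5])) > rng.random(200)).astype(float)

prior_mean = np.array([0.5, -1.0, 0.0])   # domain-knowledge centres
prior_std = np.array([1.0, 1.0, 10.0])    # wide std = weakly informative
w_map = minimize(neg_log_posterior, np.zeros(3),
                 args=(X, y, prior_mean, prior_std)).x
```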
Category: Data Science

Incorrect example of applying Bayes theorem

I have been reading the book "The Data Science Design Manual" (by Steven S. Skiena) and I came across an example of applying Bayes' theorem that confused me and made me suspect it might be wrong. The example is the following: $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$ Suppose A is the event that person x is actually a terrorist, and B is the result of a feature-based classifier that decides if x looks like a terrorist. …
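The example itself is cut off here, but the usual point of such examples is the base-rate effect, which a quick plug-in of made-up numbers (mine, not Skiena's) makes concrete. Take a prior $P(A) = 10^{-6}$, a detection rate $P(B|A) = 0.99$, and a false-positive rate $P(B|\neg A) = 0.01$:

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\neg A)P(\neg A)} = \frac{0.99 \times 10^{-6}}{0.99 \times 10^{-6} + 0.01 \times (1 - 10^{-6})} \approx 9.9 \times 10^{-5} $$

So even an apparently excellent classifier yields a posterior of roughly 1 in 10,000, because the tiny prior dominates; an example that skips the $P(B)$ expansion or ignores the base rate would indeed be wrong.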
Category: Data Science

Why and how does variational inference underestimate variance?

I referred to the Quora link here as well, but could not understand it clearly. Can anyone help me understand, with some theory or mathematical calculation, why and how variational inference underestimates the variance of the true posterior distribution? [EDIT]: Adding my understanding of the Quora answer, based on a visualization. The red line is p(x). The green line is q(x), the approximating distribution. The blue line is the KL divergence. When q(x) is less than p(x), the KL divergence …
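One way to see it concretely: fit a single Gaussian $q$ to a bimodal $p$ by minimising the reverse KL, $\mathrm{KL}(q \Vert p) = \mathbb{E}_q[\log q(x) - \log p(x)]$. Because $q$ pays an enormous price wherever it places mass where $p$ has almost none, it locks onto one mode and reports that mode's width rather than the spread of the whole distribution. A small numerical sketch of this mode-seeking behaviour:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# True distribution p: a well-separated bimodal Gaussian mixture
def log_p(x):
    return np.log(0.5 * norm.pdf(x, -2, 0.7) + 0.5 * norm.pdf(x, 2, 0.7))

# Fixed base noise so the Monte Carlo objective is deterministic
eps = np.random.default_rng(0).normal(size=20_000)

def kl_q_p(params):
    """Monte Carlo estimate of KL(q || p) for q = N(mu, sigma^2)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    x = mu + sigma * eps                  # reparameterised samples from q
    return np.mean(norm.logpdf(x, mu, sigma) - log_p(x))

mu, log_sigma = minimize(kl_q_p, [0.5, 0.0], method="nelder-mead").x
print("fitted sigma:", np.exp(log_sigma))   # ~0.7: the width of one mode
# The true std of p is sqrt(0.7**2 + 2**2) ~ 2.12, so q underestimates it
```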
Category: Data Science

Confusion regarding which distribution Monte Carlo considers for sampling

Considering Bayesian posterior inference, which distribution does Monte Carlo sampling draw samples from: the posterior or the prior? The posterior is intractable because the denominator (the evidence) is an integral over infinitely many theta values. So, if Monte Carlo samples from the posterior distribution, I am confused as to how the posterior distribution is known, given that it is intractable. Could someone please explain what I am missing? If Monte Carlo samples from the prior distribution, how do the samples approximate the posterior distribution?
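The short answer is that MCMC-style Monte Carlo targets the posterior but never needs it normalised: the intractable evidence is a constant that cancels in the acceptance ratio. A minimal Metropolis-Hastings sketch on a toy problem:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(1.5, 1.0, size=50)   # toy observations

# Unnormalised log-posterior = log prior + log likelihood.
# The evidence p(data) is a constant and cancels in the ratio below.
def log_post(theta):
    return norm.logpdf(theta, 0, 10) + norm.logpdf(data, theta, 1).sum()

samples, theta = [], 0.0
for _ in range(20_000):
    prop = theta + rng.normal(0, 0.5)            # random-walk proposal
    # Accept with probability min(1, post(prop) / post(theta)):
    # only a *ratio* of posteriors is needed, so normalisation is moot
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

print(np.mean(samples[5_000:]))   # ~ posterior mean of theta
```

Sampling from the prior is the other option (simple Monte Carlo / importance sampling), where prior draws are reweighted by their likelihoods to approximate posterior expectations; it works but degrades badly when prior and posterior are far apart.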
Category: Data Science

Poisson model with overdispersion

I'm working with a dataset $X$ (of length $N$) of count data. I developed a statistical model which can be improved, so I'm asking for any suggestions: different likelihoods, different prior selection, a different approach, anything. My model: I'm trying to get the parameters of the likelihood of the data, so that I can get a posterior predictive density function, credible intervals and so on. Let's say I want to model the generative process of the …
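The excerpt is truncated, but for overdispersed counts the standard move is to replace the Poisson likelihood with a negative binomial (equivalently, a gamma-Poisson mixture), which adds an explicit dispersion parameter. A sketch in PyMC3 (assuming that is your tooling; the priors and toy data are placeholders):

```python
import numpy as np
import pymc3 as pm

# Toy overdispersed counts standing in for your dataset X
X = np.random.default_rng(0).negative_binomial(5, 0.3, size=200)

with pm.Model() as model:
    mu = pm.Exponential("mu", 1 / X.mean())   # mean of the counts
    alpha = pm.Exponential("alpha", 1.0)      # dispersion: var = mu + mu^2/alpha
    pm.NegativeBinomial("obs", mu=mu, alpha=alpha, observed=X)
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(trace)   # posterior predictive draws
```

Comparing the posterior predictive variance against the empirical variance of $X$ is a quick check of whether the extra dispersion parameter has absorbed the overdispersion.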
Category: Data Science

What are the tradeoffs between Bayesian Deep Learning and Deep Gaussian Processes?

I understand the differences between Deep Gaussian Processes (DGPs) and Bayesian Deep Learning (BDL): DGPs are essentially feed-forward neural networks where each node is a Gaussian process, while BDL places a prior belief on the parameters of a normal (potentially convolutional) neural network. But what are the trade-offs and relationships between these two models?
Category: Data Science

How to update the posterior belief when we are observing a stream of correlated data from a fixed but unknown data source

I want to build a probabilistic model that aims to infer the true value of an unknown categorical variable, $y \in \{1,2,\ldots, K\}$. We have a dataset $(X, y)$ with $X \in \mathbb{R}^d$ and $y \in \{1,2,\ldots, K\}$, and we can train a classifier that takes $d$-dimensional data $X$ and estimates the output $y$. Now, suppose the $X$s are correlated and all come from a fixed $y$. That is, we observe $X^1, X^2, \ldots, X^T, \ldots$ over time and we know that $y$ is fixed for all of …
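The question is cut off, but the naive baseline it is presumably comparing against looks like this: convert each classifier output $p(y \mid X^t)$ into a likelihood via $p(X^t \mid y) \propto p(y \mid X^t)/p(y)$ and accumulate in log space. This assumes the $X^t$ are conditionally independent given $y$, which the stated correlation violates, so the belief will over-concentrate; it is the baseline a correlation-aware model needs to beat. A sketch with a toy stream:

```python
import numpy as np

K = 4
prior = np.full(K, 1 / K)

# Toy stream of classifier outputs p(y | X^t); in reality these come
# from your trained classifier applied to X^1, X^2, ...
stream = [np.array([0.6, 0.2, 0.1, 0.1])] * 5

log_belief = np.log(prior)
for p_t in stream:
    # Bayes step: p(X^t | y) is proportional to p(y | X^t) / p(y).
    # Conditional independence is assumed here; correlated X^t will
    # make this belief overconfident.
    log_belief += np.log(p_t) - np.log(prior)
    log_belief -= np.logaddexp.reduce(log_belief)   # renormalise

print(np.exp(log_belief))   # posterior belief over the K classes
```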
Category: Data Science

Combining multiple probabilities from a classifier. Propagating probabilities

Let's say I have trained a classifier that classifies images of animals into 10 different classes. And let's say that I have 20 different images of a particular animal, and because I know the photographer, I know with certainty that all 20 images are of the same animal. So I use my classifier to predict what animal it is and get 20 predictions, one for each image. The model predicts all the images to be a dog …
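A common starting point, sketched below: under a uniform class prior and the assumption that the 20 images are conditionally independent given the animal, the fused posterior is proportional to the product of the per-image probabilities, i.e. the sum of log-probabilities. If the images are correlated (same pose, lighting, background), this over-counts the evidence, and simply averaging the probability vectors is a blunter but more robust fallback:

```python
import numpy as np

# Stand-in for the classifier's 20 per-image probability vectors
# over 10 classes (rows sum to 1)
probs = np.random.default_rng(0).dirichlet(np.ones(10), size=20)

# Naive Bayes fusion: independent images, uniform class prior
log_post = np.sum(np.log(probs), axis=0)
log_post -= np.logaddexp.reduce(log_post)   # normalise in log space
fused = np.exp(log_post)

# Robust alternative when the images are likely correlated
avg = probs.mean(axis=0)

print(fused.argmax(), avg.argmax())
```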
Category: Data Science

Really confused by the characteristics of Naive Bayes classifiers

Naive Bayes classifiers have the following characteristics: They are robust to isolated noise points, because such points are averaged out when estimating conditional probabilities from data. They can handle missing values by ignoring the example during model building and classification. They are robust to irrelevant attributes: if X_i is an irrelevant attribute, then P(X_i|Y) becomes almost uniformly distributed, and the class-conditional probability for X_i has no impact on the overall computation of the posterior probability. I barely understand anything …
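The third point is the easiest to verify numerically: if P(X_i|Y) is the same for every class, that factor scales every class's score equally and cancels when the posterior is normalised. A tiny illustration with made-up numbers:

```python
import numpy as np

# Two classes, one informative attribute X1 and one irrelevant
# attribute X2 whose class-conditional distribution is uniform
p_y = np.array([0.5, 0.5])
p_x1_given_y = np.array([0.9, 0.2])   # P(X1=1 | Y=0), P(X1=1 | Y=1)
p_x2_given_y = np.array([0.5, 0.5])   # irrelevant: same for both classes

# Posterior over Y for the observation X1=1, X2=1,
# computed with and without the irrelevant attribute
joint_with = p_y * p_x1_given_y * p_x2_given_y
joint_without = p_y * p_x1_given_y
print(joint_with / joint_with.sum())        # identical posteriors:
print(joint_without / joint_without.sum())  # X2 cancels in normalisation
```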
Category: Data Science

Model transfer with limited-to-no label information

I have a problem I hope to get some help with here. Say I have a type of product A whose measurements are X_A and whose outcome property is y_A; y_A is a continuous variable. Then I can build a predictive model from X_A and y_A. Now I have a product B. It's similar to product A but not exactly the same, like an orange to a grapefruit. For product B, I have plenty of X_B measurements, but very …
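One simple baseline for this setting (my sketch, not a standard named method): train on the abundant product-A data, then learn a lightweight correction from the few labelled B points, using the A-model's prediction as a single strong feature. With toy data standing in for X_A, y_A, X_B:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Product A: abundant labelled data
X_A = rng.normal(size=(500, 4))
y_A = X_A @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=500)
# Product B: similar but shifted relationship, only 10 labels
X_B = rng.normal(size=(200, 4))
y_B = X_B @ np.array([1.2, 1.8, 0.0, -1.0]) + 0.5
idx = rng.choice(200, size=10, replace=False)   # the few labelled B points

model_A = RandomForestRegressor(random_state=0).fit(X_A, y_A)
# Light linear correction fitted on the labelled B subset
z = model_A.predict(X_B[idx]).reshape(-1, 1)
corr = Ridge(alpha=1.0).fit(z, y_B[idx])
y_B_hat = corr.predict(model_A.predict(X_B).reshape(-1, 1))
```

With zero B labels this reduces to using model_A as-is; with more labels, fine-tuning or domain-adaptation methods become worth the added complexity.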
Category: Data Science

How to test if a curve is well described by an ellipse?

I have a set of data points in 2D, and I am trying to come up with some sort of statistical test to determine if the points fall along an ellipse. My idea so far is to fit an ellipse to the points, take the mean squared error, and use this as an indicator. However, this requires me to set some threshold for what counts as a good MSE (so an MSE above this threshold indicates that the points do not fall along …
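As a starting point, an ellipse (like any conic) can be fitted by linear least squares on $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$, with $B^2 - 4AC < 0$ confirming the fitted conic is elliptical; the algebraic residuals then give a goodness measure, which is exactly where your thresholding problem appears. A sketch on synthetic data:

```python
import numpy as np

def fit_conic(x, y):
    """Least-squares conic fit A x^2 + B xy + C y^2 + D x + E y + F = 0,
    taken as the smallest right singular vector of the design matrix."""
    D = np.column_stack([x*x, x*y, y*y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]   # unit-norm coefficient vector

rng = np.random.default_rng(0)
t = rng.uniform(0, 2*np.pi, 200)
x = 3*np.cos(t) + rng.normal(0, 0.05, 200)   # noisy ellipse, a=3, b=1
y = np.sin(t) + rng.normal(0, 0.05, 200)

theta = fit_conic(x, y)
A, B, C = theta[:3]
is_ellipse = B**2 - 4*A*C < 0                # conic-type discriminant
resid = np.column_stack([x*x, x*y, y*y, x, y, np.ones_like(x)]) @ theta
print(is_ellipse, np.sqrt(np.mean(resid**2)))   # algebraic RMS residual
```

Note the residual is algebraic, not geometric distance. One way around an arbitrary threshold is to simulate points on the fitted ellipse with your assumed noise level and compare the observed residual to that simulated distribution, which gives the test a p-value flavour.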
Category: Data Science

Confused on Naive Bayes classifier

In the last part of Andrew Ng's lectures on Gaussian Discriminant Analysis and the Naive Bayes classifier, I am confused as to how Andrew Ng derived $2^n - 1$ features for the Naive Bayes classifier. First off, what does he mean by features in the context he was describing? I initially thought that the features were characteristics of our random vector, $x$. I know that the total number of possibilities for $x$ is $2^n$, but I do not understand how he was …
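I can't quote the lecture, but the counting argument it almost certainly refers to is about *parameters* of the class-conditional distribution, not input features. Modelling $P(x \mid y)$ for $x \in \{0,1\}^n$ with no independence assumption needs one probability per configuration of $x$, i.e. $2^n$ numbers constrained to sum to 1, hence $2^n - 1$ free parameters per class. The naive conditional-independence assumption collapses this to $n$ per class:

$$ P(x \mid y) = \prod_{i=1}^{n} P(x_i \mid y): \qquad \underbrace{2^n - 1}_{\text{full joint}} \;\longrightarrow\; \underbrace{n}_{\text{Naive Bayes}} \text{ free parameters per class.} $$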
Category: Data Science

PyMC3: how to efficiently regress on many variables?

I am sorry ahead of time if this seems like a basic question, but I had difficulty finding resources online addressing this. In PyMC3, when building a basic model of a few variables, it is easy to define each one on its own, like alpha = pm.Normal('alpha', mu=0, sd=1), and combine them manually. However, what are the standard approaches when one is dealing with dozens or hundreds of variables, each needing a prior? I see that the shape argument is helpful in defining …
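The usual answer is exactly the shape argument: give the coefficients one vector-valued prior and build the linear predictor with a dot product, rather than defining hundreds of scalar variables. A minimal PyMC3 sketch with toy data:

```python
import numpy as np
import pymc3 as pm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))                 # 100 predictors
true_beta = rng.normal(size=100)
y = X @ true_beta + rng.normal(0, 1, size=500)

with pm.Model() as model:
    # One vector-valued prior instead of 100 separate pm.Normal calls
    beta = pm.Normal("beta", mu=0, sd=1, shape=X.shape[1])
    alpha = pm.Normal("alpha", mu=0, sd=1)
    sigma = pm.HalfNormal("sigma", sd=1)
    mu = alpha + pm.math.dot(X, beta)           # vectorised linear predictor
    pm.Normal("y", mu=mu, sd=sigma, observed=y)
    trace = pm.sample(1000, tune=1000)
```

Besides being shorter, the vectorised form is much faster: the model graph has a handful of nodes instead of hundreds.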
Category: Data Science

When to model a problem using Bayes' theorem?

I have a labeled training dataset where each observation has a sentence, either in English or in French, as its predictor, and its label (target value) is whether the sentence is English or French. The test set again includes sentences in English or French, but without labels. A friend of mine suggested that we should model this problem using Bayes' theorem, since we have some prior information (the labeled observations in the training set). I agree …
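For what it's worth, the standard Bayes-theorem-based model for this task is a naive Bayes classifier over character n-grams; the "prior" in the theorem is the class prior $P(\text{English})$ estimated from the labels, not the labelled set itself. A toy sketch (the four training sentences are mine):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_x = ["the cat sat on the mat", "le chat est sur le tapis",
           "I like green apples", "j'aime les pommes vertes"]
train_y = ["en", "fr", "en", "fr"]

# Character n-grams are a robust signal for language identification
clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    MultinomialNB(),
)
clf.fit(train_x, train_y)
print(clf.predict(["où est la bibliothèque"]))   # expect ['fr']
```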
Category: Data Science

How to test a dev set on time series data via forecasting

I'm implementing three Bayesian Deep Learning models (links below) for my master's. I'm supposed to test them on civil engineering time series data. My models should take a time-series covariate vector ($X_t = \{x^1_t, x^2_t, \ldots\}$) as input and predict single values $y_t$. I will use past values of $X$ (e.g. $X_{t-1}$, $X_{t-2}$) for each $y_t$, but the models won't be fed past values of $y_t$, because these will not be available in a real situation. The …
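For the dev-set question specifically, the usual pattern is walk-forward (expanding-window) validation, so every evaluation point lies strictly after the data the model was fitted on. A sketch with scikit-learn's TimeSeriesSplit and stand-in data (the Ridge model is just a placeholder for your BDL models):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                 # stand-in covariates X_t
y = X[:, 0] + rng.normal(0, 0.1, size=300)    # stand-in target y_t

# Walk-forward splits: each dev fold comes strictly after its
# training window, so no future information leaks backwards
for train_idx, dev_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])   # placeholder model
    print(model.score(X[dev_idx], y[dev_idx]))
```

Lagged values of $X$ can simply be appended as extra columns before splitting, and the same split structure works for scoring predictive intervals from the Bayesian models rather than point predictions.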
Category: Data Science

Update of mean and variance of weights

I'm trying to understand the Bayes by Backprop algorithm from the paper Weight Uncertainty in Neural Networks; the idea is to make a NN in which each weight has its own probability distribution. I get the theory, but I don't understand how to update the mean and variance in the learning part. I found some PyTorch code which simply does:

```python
class BayesianLinear(nn.Module):
    def __init__(self, in_features, out_features):
        (...)
        # Weight parameters
        self.weight_mu = nn.Parameter(
            torch.Tensor(out_features, in_features).uniform_(-0.2, 0.2))
        self.weight_rho = nn.Parameter(
            torch.Tensor(out_features, in_features).uniform_(-5, -4))
```
…
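The piece the excerpt stops short of: weight_mu and weight_rho are ordinary nn.Parameters, updated by plain backprop through a *sampled* weight. The forward pass draws w = mu + sigma * eps (the reparameterisation trick), with sigma = log(1 + exp(rho)) keeping the standard deviation positive while rho stays unconstrained. A minimal sketch of that forward pass (bias terms and the ELBO's KL term omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight_mu = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-0.2, 0.2))
        self.weight_rho = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-5, -4))

    def forward(self, x):
        # sigma = log(1 + exp(rho)) is positive for any real rho,
        # so rho can be optimised freely by SGD/Adam
        sigma = F.softplus(self.weight_rho)
        # Reparameterisation trick: the randomness lives in eps, so
        # gradients flow to mu and rho through the sampled weight
        eps = torch.randn_like(sigma)
        weight = self.weight_mu + sigma * eps
        return F.linear(x, weight)
```

The optimiser then updates weight_mu and weight_rho like any other parameters; in the full algorithm, the KL term of the ELBO contributes additional gradients to both.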
Category: Data Science
