Suppose I have a set of examples $X = (x_1, x_2, \ldots, x_n)$ with continuous numeric targets $Y = (y_1, y_2, \ldots, y_n)$. While it is standard to use regression models to make point predictions of $y_i$ as $f(x_{i}) = \hat{y}_i$, I am interested in predicting a density function for $y_{i}$. What I want is analogous to the use of probabilities in classification instead of hard predictions (e.g. predict vs predict_proba in Scikit-learn), but for continuous regression problems. Specifically, a different density function (e.g. in the …
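One hedged sketch of what a per-example predictive density could look like, assuming a Gaussian form: fit one model for the conditional mean and another for the conditional variance (via squared residuals), and return a frozen scipy distribution per example, analogous to predict_proba. The helper names `fit_gaussian_density` and `predictive_density` and the toy data are made up for illustration.

```python
# A minimal sketch, assuming a Gaussian predictive density per example.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingRegressor

def fit_gaussian_density(X, y):
    # One model for the conditional mean, one for the conditional variance
    mean_model = GradientBoostingRegressor().fit(X, y)
    resid_sq = (y - mean_model.predict(X)) ** 2
    var_model = GradientBoostingRegressor().fit(X, resid_sq)
    return mean_model, var_model

def predictive_density(mean_model, var_model, x_new):
    mu = mean_model.predict(x_new)
    sigma = np.sqrt(np.clip(var_model.predict(x_new), 1e-12, None))
    # Return one frozen density object per example
    return [norm(loc=m, scale=s) for m, s in zip(mu, sigma)]

# Example usage on toy data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3 + 0.1 * np.abs(X[:, 0]))
mm, vm = fit_gaussian_density(X, y)
densities = predictive_density(mm, vm, X[:5])
print([d.pdf(0.0) for d in densities])
```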
I have a dataset which contains two overlapping distributions/classes of points. I have been trying to sample from just one of these distributions/classes using the scikit-learn KernelDensity class, but I am finding this does not work well in overlapping regions. Is there a way to do this sort of KDE sampling that also takes into account/avoids areas where these two distributions overlap? Ideally I would like to sample more often in non-overlapping areas or, when this is not …
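A minimal sketch of one possible workaround, assuming both classes are available to fit separate KDEs: sample from the target class's KDE and reject draws that are more plausible under the other class, which concentrates the accepted samples in non-overlapping regions. The toy data and bandwidth are assumptions.

```python
# A minimal sketch: rejection sampling to avoid the overlap region.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(500, 2))   # assumed toy data for class A
class_b = rng.normal(1.5, 1.0, size=(500, 2))   # assumed toy data for class B

kde_a = KernelDensity(bandwidth=0.3).fit(class_a)
kde_b = KernelDensity(bandwidth=0.3).fit(class_b)

samples = []
while len(samples) < 200:
    draw = kde_a.sample(500)
    # Keep only draws that are more plausible under class A than class B
    keep = kde_a.score_samples(draw) > kde_b.score_samples(draw)
    samples.extend(draw[keep])
samples = np.array(samples)[:200]
print(samples.shape)
```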
I have some points in pytorch and I would like to sample from a distribution that resembles these points. I noticed that the seaborn kde plots seem to draw out/define a distribution graphically and I was wondering if there was a way to do something similar for sampling purposes. In other words I would like to feed my points into a function that uses them to define/approximate a distribution from which I can sample more points. Is this a feasible …
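This is feasible: seaborn's KDE plot is essentially a Gaussian kernel density estimate, and sampling from such an estimate reduces to picking a stored point uniformly at random and adding Gaussian noise scaled by the bandwidth. A minimal PyTorch sketch, with the bandwidth value as an assumed placeholder.

```python
# A minimal sketch of sampling from a Gaussian KDE built from a set of points.
import torch

def kde_sample(points: torch.Tensor, n_samples: int, bandwidth: float) -> torch.Tensor:
    # Pick source points uniformly, then perturb them with Gaussian noise
    idx = torch.randint(0, points.shape[0], (n_samples,))
    noise = torch.randn(n_samples, points.shape[1]) * bandwidth
    return points[idx] + noise

points = torch.randn(1000, 2)                    # toy data standing in for "my points"
new_points = kde_sample(points, 500, bandwidth=0.2)
print(new_points.shape)
```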
For sklearn.neighbors.KernelDensity, the score(X) method, according to the sklearn KDE documentation, says: "Compute the log-likelihood of each sample under the model." For the 'gaussian' kernel, I have implemented hyper-parameter tuning for the 'bandwidth' parameter using Bayesian optimization as follows:

```python
# The input data for which 'bandwidth' needs to be tuned-
data  # (2880, 64)

def kde_hyperopt_eval(bandwidth):
    params = {}
    params['bandwidth'] = bandwidth

    # Initialize a KDE model-
    kde_model = KernelDensity(
        kernel='gaussian',
        bandwidth=params['bandwidth']
    )

    # Train KDE model on …
```
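A minimal sketch of one possible objective for the optimizer, assuming the goal is to maximize held-out log-likelihood; the random placeholder data and the train/validation split are assumptions, and score() here returns the total log-likelihood of the validation samples.

```python
# A minimal sketch of a bandwidth objective scored on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

data = np.random.default_rng(0).normal(size=(2880, 64))   # placeholder for the real data
train, valid = train_test_split(data, test_size=0.2, random_state=42)

def kde_hyperopt_eval(bandwidth):
    kde_model = KernelDensity(kernel='gaussian', bandwidth=bandwidth)
    kde_model.fit(train)
    # score() returns the total log-likelihood of the held-out samples;
    # a Bayesian optimizer can maximize this value directly.
    return kde_model.score(valid)

print(kde_hyperopt_eval(0.5))
```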
For Python 3.9, sklearn version 0.24.2 and numpy version 1.20.3, I am using a Kernel Density Estimation (KDE) generative model. The goal is to generate new data from a given input dataset. The steps to achieve this are:

1. Scale the input data to the range [-1, 1] using MinMaxScaler
2. Train the KDE model on the scaled input data
3. Use the trained KDE model to generate new sample/synthetic (scaled) data
4. Use the trained scaler from step 1 to get the data back in the original scale …
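A minimal sketch of those four steps, with the toy input data, bandwidth, and sample count as assumed placeholders.

```python
# A minimal sketch of the KDE generative pipeline described above.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KernelDensity

X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(1000, 3))  # toy input data

# 1. Scale input data to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# 2. Train the KDE model on the scaled data
kde = KernelDensity(kernel='gaussian', bandwidth=0.1).fit(X_scaled)

# 3. Generate new (scaled) synthetic samples
X_new_scaled = kde.sample(500, random_state=42)

# 4. Invert the scaling to return to the original scale
X_new = scaler.inverse_transform(X_new_scaled)
print(X_new.mean(axis=0), X.mean(axis=0))
```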
I have timeseries data for 1 week. The data contains readings from a device for certain hours of the day. There are about 8-10 readings per day at different timestamps. The timestamps recorded for each day are not necessarily the same. For example, day 1 has timestamps [08:10, 11:50, 13:40, 16:30] and day 2 has timestamps [09:00, 10:50, 14:30, 17:00]. Now I need to generate a new sample of data for an extended period of time (say 2 months). …
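A minimal sketch of one possible approach, assuming the time-of-day of the readings can be modelled with a 1-D KDE and resampled for each new day; the toy reading times, bandwidth, and per-day count are assumptions, and the reading values themselves could be resampled in the same way.

```python
# A minimal sketch: resample reading timestamps for new days from a KDE.
import numpy as np
import pandas as pd
from sklearn.neighbors import KernelDensity

# Toy stand-in for one week of observed reading times (minutes since midnight)
observed_minutes = np.array([490, 710, 820, 990, 540, 650, 870, 1020], dtype=float)

kde = KernelDensity(kernel='gaussian', bandwidth=30.0).fit(observed_minutes.reshape(-1, 1))

new_days = pd.date_range('2023-01-01', periods=60, freq='D')   # roughly 2 months
rows = []
for day in new_days:
    n_readings = np.random.randint(8, 11)                      # 8-10 readings per day
    minutes = np.clip(kde.sample(n_readings).ravel(), 0, 1439)
    for m in np.sort(minutes):
        rows.append(day + pd.Timedelta(minutes=float(m)))
synthetic_timestamps = pd.DatetimeIndex(rows)
print(synthetic_timestamps[:5])
```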
I have a binary dependent variable $t$ and categorical features. We can even simplify to binary features, since I can one-hot encode the categorical variables. In practice the one-hot encoding induces collinearity in the binary features, so for simplicity let's pretend we only have $D$ binary features. The purpose is to estimate the probability of $t=1$. In principle, I can use logistic regression. But, given the categorical nature of the input data, the features actually define a table of $2^D$ cells. …
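For illustration, a small sketch comparing the two estimators this describes, i.e. a per-cell empirical frequency over the $2^D$ cells versus logistic regression; the toy data and $D = 3$ are assumptions.

```python
# A minimal sketch: cell-frequency estimate of P(t=1) vs logistic regression.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
D = 3
X = rng.integers(0, 2, size=(2000, D))
p_true = 1 / (1 + np.exp(-(X @ np.array([1.0, -0.5, 0.8]) - 0.3)))
t = rng.binomial(1, p_true)

# Cell-based estimate: group by the 2^D feature combinations and take mean of t
df = pd.DataFrame(X, columns=[f'x{i}' for i in range(D)])
df['t'] = t
cell_estimates = df.groupby([f'x{i}' for i in range(D)])['t'].mean()

# Logistic regression estimate
logreg = LogisticRegression().fit(X, t)
print(cell_estimates)
print(logreg.predict_proba(X[:5])[:, 1])
```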
I'm trying to implement a Naive Bayes classifier which uses either a hypercubic Parzen window or KNN to estimate a density function. The data I'm using is Fashion MNIST. The steps I take are: first I zero-center the vectorized data and divide it by its column-wise variance, and then I feed this to a PCA and get a 9-dimensional vector. As for the Bayes decisions, for each class of the dataset, I get its samples and train …
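A minimal sketch of what a hypercubic Parzen-window density estimate could look like for one class: a query point counts the training points falling inside a hypercube of side $h$ centred on it, giving $(k/n)/h^d$. The window width and toy 9-D data are assumptions.

```python
# A minimal sketch of a hypercubic Parzen-window density estimate in d dimensions.
import numpy as np

def parzen_hypercube_density(query, train, h):
    # A training point is inside the hypercube if every coordinate differs by at most h/2
    inside = np.all(np.abs(train[None, :, :] - query[:, None, :]) <= h / 2, axis=2)
    k = inside.sum(axis=1)                      # points falling in each query's hypercube
    n, d = train.shape
    return k / (n * h ** d)                     # (k/n) / V with V = h^d

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 9))               # e.g. 9-D PCA features for one class
query = rng.normal(size=(5, 9))
print(parzen_hypercube_density(query, train, h=2.0))
```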
I am writing some Python code to fit 2D Gaussians to fluorescent emitters on a dark background to determine the subpixel-resolution (x, y) position of the fluorescent emitter. The crude, pixel-resolution (x, y) locations of the pixels are stored in a list xy. The height of the Gaussian represents the predicted pixel intensity at that location. Each 2D Gaussian has 5 parameters, and my end goal is to find the optimal value of those 5 parameters for each peak using …
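A minimal sketch of fitting one such 2D Gaussian with 5 parameters (amplitude, x0, y0, sigma, offset) to a small patch via scipy.optimize.curve_fit; the toy patch and initial guesses are assumptions.

```python
# A minimal sketch: sub-pixel localisation by fitting a 5-parameter 2D Gaussian.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(xy, amplitude, x0, y0, sigma, offset):
    x, y = xy
    return offset + amplitude * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

# Toy patch with a known emitter at (10.3, 12.7) plus noise
yy, xx = np.mgrid[0:25, 0:25]
patch = gaussian_2d((xx, yy), 100.0, 10.3, 12.7, 1.5, 5.0) \
        + np.random.default_rng(0).normal(0, 1, (25, 25))

p0 = (patch.max(), 10, 13, 2.0, patch.min())    # crude pixel-resolution initial guess
popt, _ = curve_fit(gaussian_2d, (xx.ravel(), yy.ravel()), patch.ravel(), p0=p0)
print(popt)   # sub-pixel (x0, y0) are popt[1], popt[2]
```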
So I have a regression problem with a bunch of features X and labels that are amounts (price in $). How can I convert it to a classification problem? I have read about converting the label from continuous to categorical, possibly thresholding at some points, for instance 0-50 (class 0), 51-100 (class 1), and so on up to maybe 500 (here thresholding in steps of 50). This approach is intuitively ineffective, e.g. if no data lie in 51-100. Is there any way to mitigate this problem, or if there …
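One hedged way to mitigate empty bins is quantile-based binning, so every class receives roughly the same number of examples. A small sketch, with the toy prices and number of bins as assumptions.

```python
# A minimal sketch: fixed-width bins vs quantile bins for a continuous price target.
import numpy as np
import pandas as pd

prices = pd.Series(np.random.default_rng(0).lognormal(mean=4.5, sigma=0.6, size=1000))

# Fixed-width bins (0-50, 51-100, ...): some classes may end up empty
fixed_classes = pd.cut(prices, bins=range(0, 501, 50), labels=False)

# Quantile bins: every class gets roughly the same number of examples
quantile_classes, bin_edges = pd.qcut(prices, q=10, labels=False, retbins=True)
print(quantile_classes.value_counts().sort_index())
print(bin_edges)
```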
I am currently testing some approaches for density estimation, and I think the basic approach of histograms may not be the best option for me; KDE is certainly a good alternative. A while ago I found a very interesting tutorial by Jake VanderPlas which explains KDE in a nice way. In his tutorial, Jake optimized KDE bandwidth selection using a grid search maximizing the log-likelihood of the density estimate given some samples, but that is built into sklearn and …
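For reference, a minimal sketch of that built-in route: GridSearchCV over the bandwidth of KernelDensity, scored by cross-validated log-likelihood, as in the VanderPlas tutorial. The toy data and bandwidth grid are assumptions.

```python
# A minimal sketch of grid-search bandwidth selection with sklearn.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

samples = np.random.default_rng(0).normal(size=(500, 1))

grid = GridSearchCV(
    KernelDensity(kernel='gaussian'),
    {'bandwidth': np.logspace(-1, 1, 20)},
    cv=5,                       # 5-fold cross-validated log-likelihood
)
grid.fit(samples)
print(grid.best_params_['bandwidth'])
```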
For a problem which I am working on at the moment, I'm interested in learning how the mean and variance of some response variable y changes with two independent variables x1 and x2 - i.e. for each coordinate in (x1, x2)-space I wish to have an estimate for $\mu_y$ and $\sigma_y$ in order to be able to approximately standardise new observations as they arrive. I have enough domain knowledge to expect both the mean and variance of y to vary …
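A minimal sketch of one possible local estimator, assuming the k nearest observations in (x1, x2)-space are representative: estimate $\mu_y$ and $\sigma_y$ from those neighbours and standardise new points with the local values. The neighbourhood size and toy data are assumptions.

```python
# A minimal sketch: local mean/std of y over (x1, x2) from nearest neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 2))                        # (x1, x2)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 1])  # mean and variance both vary

nn = NearestNeighbors(n_neighbors=100).fit(X)

def local_mean_std(points):
    _, idx = nn.kneighbors(points)
    neighbours = y[idx]                                        # (n_points, k) neighbour targets
    return neighbours.mean(axis=1), neighbours.std(axis=1)

new_X = np.array([[2.0, 1.0], [8.0, 9.0]])
new_y = np.array([1.1, 0.5])
mu, sigma = local_mean_std(new_X)
z = (new_y - mu) / sigma                                       # approximate standardisation
print(mu, sigma, z)
```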
All, I have a classification problem where I am predicting the likelihood of a client defaulting on a loan. I plotted the predicted probabilities from my model against the label ('1' for default, '0' for non-default). It is cut off here, but the y-axis is the density. Am I right to reason that this shows an exponential distribution, or that the fact that the class 1 curve has a fat tail shows that default is an extreme / unexpected …
I am working on a problem where I have a dataset with four attributes $(X, Y, T, K)$. I'd like to test whether $P(X, Y, T)\,P(K) = P(X, Y, T, K)$, that is, whether $(X, Y, T)$ is independent of $K$. I have two questions: Is it possible to use kernel density estimation to fit $P(X, Y, T)$, $P(X, Y, T, K)$ and $P(K)$ respectively and test the independence? If I did so, would the output be …
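On the first question, a minimal sketch of what the KDE-based comparison could look like: fit KDEs for the joint, the $(X, Y, T)$ marginal, and the $K$ marginal, then estimate the average log-ratio over the data, which is an estimate of the mutual information between $(X, Y, T)$ and $K$ (values near zero are consistent with independence). The bandwidths and toy data are assumptions, and this gives a point estimate rather than a calibrated test; a permutation test on $K$ could supply a null distribution.

```python
# A minimal sketch: KDE-based estimate of E[log p(x,y,t,k) - log p(x,y,t) - log p(k)].
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
xyt = rng.normal(size=(1000, 3))                  # columns standing in for X, Y, T
k = rng.normal(size=(1000, 1))                    # K, independent in this toy example
data = np.hstack([xyt, k])

kde_joint = KernelDensity(bandwidth=0.5).fit(data)
kde_xyt = KernelDensity(bandwidth=0.5).fit(xyt)
kde_k = KernelDensity(bandwidth=0.5).fit(k)

mi_estimate = np.mean(
    kde_joint.score_samples(data) - kde_xyt.score_samples(xyt) - kde_k.score_samples(k)
)
print(mi_estimate)   # close to 0 suggests P(X,Y,T,K) ≈ P(X,Y,T)P(K)
```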
I am confused about a Parzen window question. Suppose we have two training data points located at 0.5 and 0.7, and we use 0.3 as the rectangular window width. How do we estimate the probability density? According to the definition, the probability density is $\frac{k/n}{V}$, where $k$ is the number of patterns inside the window, $n$ is the total number of points, and $V$ is the volume of the window region. Therefore, is the density for this question $(1/2)/1$? Then what if we use a triangle …
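A worked reading of this setup, under the assumption that the window is centred on the query point $x$ and that we evaluate the estimate at an assumed point such as $x = 0.6$:

$$\hat{p}(x) = \frac{k(x)/n}{V} = \frac{k(x)}{n\,h}, \qquad h = 0.3,\quad n = 2,$$

where $k(x)$ counts the training points within $h/2 = 0.15$ of $x$. At $x = 0.6$ both points (0.5 and 0.7) lie inside the window, so $\hat{p}(0.6) = \frac{2/2}{0.3} \approx 3.33$; at $x = 0.3$ no points lie inside and $\hat{p}(0.3) = 0$. The estimate therefore depends on where it is evaluated, which is why a single number like $(1/2)/1$ only makes sense once the query point and the window volume $V = h$ are fixed.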