Suppose I have a set of examples $X = (x_1, x_2, \ldots, x_n)$ with continuous numeric targets $Y = (y_1, y_2, \ldots, y_n)$. While it is standard to use regression models to make point predictions of $y_i$ as $f(x_{i}) = \hat{y}_i$, I am interested in predicting a density function for $y_{i}$. What I want is analogous to the use of probabilities in classification instead of hard predictions (e.g. predict vs predict_proba in Scikit-learn), but for continuous regression problems. Specifically, a different density function (e.g. in the …
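One hedged sketch of what a per-example predictive density could look like, assuming a Gaussian form: fit one model for the conditional mean and another for the conditional variance (via squared residuals), and return a frozen scipy distribution per example, analogous to predict_proba. The helper names `fit_gaussian_density` and `predictive_density` and the toy data are made up for illustration.

```python
# A minimal sketch, assuming a Gaussian predictive density per example.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingRegressor

def fit_gaussian_density(X, y):
    # One model for the conditional mean, one for the conditional variance
    mean_model = GradientBoostingRegressor().fit(X, y)
    resid_sq = (y - mean_model.predict(X)) ** 2
    var_model = GradientBoostingRegressor().fit(X, resid_sq)
    return mean_model, var_model

def predictive_density(mean_model, var_model, x_new):
    mu = mean_model.predict(x_new)
    sigma = np.sqrt(np.clip(var_model.predict(x_new), 1e-12, None))
    # Return one frozen density object per example
    return [norm(loc=m, scale=s) for m, s in zip(mu, sigma)]

# Example usage on toy data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3 + 0.1 * np.abs(X[:, 0]))
mm, vm = fit_gaussian_density(X, y)
densities = predictive_density(mm, vm, X[:5])
print([d.pdf(0.0) for d in densities])
```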
I have a dataset which contains two overlapping distributions/classes of points. I have been trying to sample from just one of these distributions/classes using the scikit-learn KernelDensity class, but I am finding this does not work well in overlapping regions. Is there a way to do this sort of KDE sampling that also takes into account/avoids areas where these two distributions overlap? Ideally I would like to sample more often in non-overlapping areas or, when this is not …
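A minimal sketch of one possible workaround, assuming both classes are available to fit separate KDEs: sample from the target class's KDE and reject draws that are more plausible under the other class, which concentrates the accepted samples in non-overlapping regions. The toy data and bandwidth are assumptions.

```python
# A minimal sketch: rejection sampling to avoid the overlap region.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(500, 2))   # assumed toy data for class A
class_b = rng.normal(1.5, 1.0, size=(500, 2))   # assumed toy data for class B

kde_a = KernelDensity(bandwidth=0.3).fit(class_a)
kde_b = KernelDensity(bandwidth=0.3).fit(class_b)

samples = []
while len(samples) < 200:
    draw = kde_a.sample(500)
    # Keep only draws that are more plausible under class A than class B
    keep = kde_a.score_samples(draw) > kde_b.score_samples(draw)
    samples.extend(draw[keep])
samples = np.array(samples)[:200]
print(samples.shape)
```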
I have some points in pytorch and I would like to sample from a distribution that resembles these points. I noticed that the seaborn kde plots seem to draw out/define a distribution graphically and I was wondering if there was a way to do something similar for sampling purposes. In other words I would like to feed my points into a function that uses them to define/approximate a distribution from which I can sample more points. Is this a feasible …
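This is feasible: seaborn's KDE plot is essentially a Gaussian kernel density estimate, and sampling from such an estimate reduces to picking a stored point uniformly at random and adding Gaussian noise scaled by the bandwidth. A minimal PyTorch sketch, with the bandwidth value as an assumed placeholder.

```python
# A minimal sketch of sampling from a Gaussian KDE built from a set of points.
import torch

def kde_sample(points: torch.Tensor, n_samples: int, bandwidth: float) -> torch.Tensor:
    # Pick source points uniformly, then perturb them with Gaussian noise
    idx = torch.randint(0, points.shape[0], (n_samples,))
    noise = torch.randn(n_samples, points.shape[1]) * bandwidth
    return points[idx] + noise

points = torch.randn(1000, 2)                    # toy data standing in for "my points"
new_points = kde_sample(points, 500, bandwidth=0.2)
print(new_points.shape)
```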
For sklearn.neighbors.KernelDensity, the score(X) method, according to the sklearn KDE documentation, says: "Compute the log-likelihood of each sample under the model." For the 'gaussian' kernel, I have implemented hyper-parameter tuning for the 'bandwidth' parameter using Bayesian optimization as follows:

```python
# The input data for which 'bandwidth' needs to be tuned-
data  # (2880, 64)

def kde_hyperopt_eval(bandwidth):
    params = {}
    params['bandwidth'] = bandwidth

    # Initialize a KDE model-
    kde_model = KernelDensity(
        kernel='gaussian',
        bandwidth=params['bandwidth']
    )

    # Train KDE model on …
```
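A minimal sketch of one possible objective for the optimizer, assuming the goal is to maximize held-out log-likelihood; the random placeholder data and the train/validation split are assumptions, and score() here returns the total log-likelihood of the validation samples.

```python
# A minimal sketch of a bandwidth objective scored on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

data = np.random.default_rng(0).normal(size=(2880, 64))   # placeholder for the real data
train, valid = train_test_split(data, test_size=0.2, random_state=42)

def kde_hyperopt_eval(bandwidth):
    kde_model = KernelDensity(kernel='gaussian', bandwidth=bandwidth)
    kde_model.fit(train)
    # score() returns the total log-likelihood of the held-out samples;
    # a Bayesian optimizer can maximize this value directly.
    return kde_model.score(valid)

print(kde_hyperopt_eval(0.5))
```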
For Python 3.9, sklearn version 0.24.2 and numpy version 1.20.3, I am using a Kernel Density Estimation (KDE) generative model. The goal is to generate new data from a given input dataset. The steps to achieve this are:

1. Scale the input data to the range [-1, 1] using MinMaxScaler
2. Train the KDE model on the scaled input data
3. Use the trained KDE model to generate new sample/synthetic (scaled) data
4. Use the trained scaler from step 1 to get the data back in the original scale …
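A minimal sketch of those four steps, with the toy input data, bandwidth, and sample count as assumed placeholders.

```python
# A minimal sketch of the KDE generative pipeline described above.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KernelDensity

X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(1000, 3))  # toy input data

# 1. Scale input data to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# 2. Train the KDE model on the scaled data
kde = KernelDensity(kernel='gaussian', bandwidth=0.1).fit(X_scaled)

# 3. Generate new (scaled) synthetic samples
X_new_scaled = kde.sample(500, random_state=42)

# 4. Invert the scaling to return to the original scale
X_new = scaler.inverse_transform(X_new_scaled)
print(X_new.mean(axis=0), X.mean(axis=0))
```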
I have timeseries data for 1 week. The data contains readings from a device for certain hours of the day. There are about 8-10 readings per day at different timestamps. The timestamps recorded for each day are not necessarily the same. For example, day 1 has timestamps [08:10, 11:50, 13:40, 16:30] and day 2 has timestamps [09:00, 10:50, 14:30, 17:00]. Now I need to generate a new sample of data for an extended period of time (say 2 months). …
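A minimal sketch of one possible approach, assuming the time-of-day of the readings can be modelled with a 1-D KDE and resampled for each new day; the toy reading times, bandwidth, and per-day count are assumptions, and the reading values themselves could be resampled in the same way.

```python
# A minimal sketch: resample reading timestamps for new days from a KDE.
import numpy as np
import pandas as pd
from sklearn.neighbors import KernelDensity

# Toy stand-in for one week of observed reading times (minutes since midnight)
observed_minutes = np.array([490, 710, 820, 990, 540, 650, 870, 1020], dtype=float)

kde = KernelDensity(kernel='gaussian', bandwidth=30.0).fit(observed_minutes.reshape(-1, 1))

new_days = pd.date_range('2023-01-01', periods=60, freq='D')   # roughly 2 months
rows = []
for day in new_days:
    n_readings = np.random.randint(8, 11)                      # 8-10 readings per day
    minutes = np.clip(kde.sample(n_readings).ravel(), 0, 1439)
    for m in np.sort(minutes):
        rows.append(day + pd.Timedelta(minutes=float(m)))
synthetic_timestamps = pd.DatetimeIndex(rows)
print(synthetic_timestamps[:5])
```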
I have a binary dependent variable $t$ and categorical features. We can even simplify to binary features, since I can one-hot encode the categorical variables. In practice the one-hot encoding induces collinearity in the binary features, so for simplicity let's pretend we only have $D$ binary features. The purpose is to estimate the probability of $t=1$. In principle, I can use logistic regression. But, given the categorical nature of the input data, the features actually define a table of $2^D$ cells. …
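For illustration, a small sketch comparing the two estimators this describes, i.e. a per-cell empirical frequency over the $2^D$ cells versus logistic regression; the toy data and $D = 3$ are assumptions.

```python
# A minimal sketch: cell-frequency estimate of P(t=1) vs logistic regression.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
D = 3
X = rng.integers(0, 2, size=(2000, D))
p_true = 1 / (1 + np.exp(-(X @ np.array([1.0, -0.5, 0.8]) - 0.3)))
t = rng.binomial(1, p_true)

# Cell-based estimate: group by the 2^D feature combinations and take mean of t
df = pd.DataFrame(X, columns=[f'x{i}' for i in range(D)])
df['t'] = t
cell_estimates = df.groupby([f'x{i}' for i in range(D)])['t'].mean()

# Logistic regression estimate
logreg = LogisticRegression().fit(X, t)
print(cell_estimates)
print(logreg.predict_proba(X[:5])[:, 1])
```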
I'm trying to implement a Naive Bayes classifier which uses either a hypercubic Parzen window or KNN to estimate a density function. The data I'm using is Fashion MNIST. The steps I take are: first I zero-center the vectorized data and divide it by its column-wise variance, and then I feed this to a PCA and get a 9-dimensional vector. As for the Bayes decisions, for each class of the dataset, I get its samples and train …
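A minimal sketch of what a hypercubic Parzen-window density estimate could look like for one class: a query point counts the training points falling inside a hypercube of side $h$ centred on it, giving $(k/n)/h^d$. The window width and toy 9-D data are assumptions.

```python
# A minimal sketch of a hypercubic Parzen-window density estimate in d dimensions.
import numpy as np

def parzen_hypercube_density(query, train, h):
    # A training point is inside the hypercube if every coordinate differs by at most h/2
    inside = np.all(np.abs(train[None, :, :] - query[:, None, :]) <= h / 2, axis=2)
    k = inside.sum(axis=1)                      # points falling in each query's hypercube
    n, d = train.shape
    return k / (n * h ** d)                     # (k/n) / V with V = h^d

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 9))               # e.g. 9-D PCA features for one class
query = rng.normal(size=(5, 9))
print(parzen_hypercube_density(query, train, h=2.0))
```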
I am writing some Python code to fit 2D Gaussians to fluorescent emitters on a dark background to determine the subpixel-resolution (x, y) position of the fluorescent emitter. The crude, pixel-resolution (x, y) locations of the pixels are stored in a list xy. The height of the Gaussian represents the predicted pixel intensity at that location. Each 2D Gaussian has 5 parameters, and my end goal is to find the optimal value of those 5 parameters for each peak using …
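A minimal sketch of fitting one such 2D Gaussian with 5 parameters (amplitude, x0, y0, sigma, offset) to a small patch via scipy.optimize.curve_fit; the toy patch and initial guesses are assumptions.

```python
# A minimal sketch: sub-pixel localisation by fitting a 5-parameter 2D Gaussian.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(xy, amplitude, x0, y0, sigma, offset):
    x, y = xy
    return offset + amplitude * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

# Toy patch with a known emitter at (10.3, 12.7) plus noise
yy, xx = np.mgrid[0:25, 0:25]
patch = gaussian_2d((xx, yy), 100.0, 10.3, 12.7, 1.5, 5.0) \
        + np.random.default_rng(0).normal(0, 1, (25, 25))

p0 = (patch.max(), 10, 13, 2.0, patch.min())    # crude pixel-resolution initial guess
popt, _ = curve_fit(gaussian_2d, (xx.ravel(), yy.ravel()), patch.ravel(), p0=p0)
print(popt)   # sub-pixel (x0, y0) are popt[1], popt[2]
```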
So I have a regression problem with a bunch of features X and labels that are amounts (price in $). How can I convert it to a classification problem? I have read about converting the label from continuous to categorical, possibly thresholding at some points, for instance 0-50 (class 0), 51-100 (class 1), and so on up to maybe 500 (here thresholding in steps of 50). This approach is intuitively ineffective, e.g. if no data lie in 51-100. Is there any way to mitigate this problem, or if there …
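One hedged way to mitigate empty bins is quantile-based binning, so every class receives roughly the same number of examples. A small sketch, with the toy prices and number of bins as assumptions.

```python
# A minimal sketch: fixed-width bins vs quantile bins for a continuous price target.
import numpy as np
import pandas as pd

prices = pd.Series(np.random.default_rng(0).lognormal(mean=4.5, sigma=0.6, size=1000))

# Fixed-width bins (0-50, 51-100, ...): some classes may end up empty
fixed_classes = pd.cut(prices, bins=range(0, 501, 50), labels=False)

# Quantile bins: every class gets roughly the same number of examples
quantile_classes, bin_edges = pd.qcut(prices, q=10, labels=False, retbins=True)
print(quantile_classes.value_counts().sort_index())
print(bin_edges)
```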
I am currently testing some approaches for density estimation, and I think the basic approach of histograms may not be the best option for me; KDE is certainly a good alternative. A while ago I found a very interesting tutorial by Jake VanderPlas which explains KDE in a nice way. In his tutorial, Jake optimized KDE bandwidth selection using a grid search maximizing the log-likelihood of the density estimate given some samples, but that is built into sklearn and …
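For reference, a minimal sketch of that built-in route: GridSearchCV over the bandwidth of KernelDensity, scored by cross-validated log-likelihood, as in the VanderPlas tutorial. The toy data and bandwidth grid are assumptions.

```python
# A minimal sketch of grid-search bandwidth selection with sklearn.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

samples = np.random.default_rng(0).normal(size=(500, 1))

grid = GridSearchCV(
    KernelDensity(kernel='gaussian'),
    {'bandwidth': np.logspace(-1, 1, 20)},
    cv=5,                       # 5-fold cross-validated log-likelihood
)
grid.fit(samples)
print(grid.best_params_['bandwidth'])
```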
For a problem which I am working on at the moment, I'm interested in learning how the mean and variance of some response variable y changes with two independent variables x1 and x2 - i.e. for each coordinate in (x1, x2)-space I wish to have an estimate for $\mu_y$ and $\sigma_y$ in order to be able to approximately standardise new observations as they arrive. I have enough domain knowledge to expect both the mean and variance of y to vary …
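A minimal sketch of one possible local estimator, assuming the k nearest observations in (x1, x2)-space are representative: estimate $\mu_y$ and $\sigma_y$ from those neighbours and standardise new points with the local values. The neighbourhood size and toy data are assumptions.

```python
# A minimal sketch: local mean/std of y over (x1, x2) from nearest neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 2))                        # (x1, x2)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 1])  # mean and variance both vary

nn = NearestNeighbors(n_neighbors=100).fit(X)

def local_mean_std(points):
    _, idx = nn.kneighbors(points)
    neighbours = y[idx]                                        # (n_points, k) neighbour targets
    return neighbours.mean(axis=1), neighbours.std(axis=1)

new_X = np.array([[2.0, 1.0], [8.0, 9.0]])
new_y = np.array([1.1, 0.5])
mu, sigma = local_mean_std(new_X)
z = (new_y - mu) / sigma                                       # approximate standardisation
print(mu, sigma, z)
```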
All, I have a classification problem where I am predicting the likelihood of a client defaulting on a loan. I plotted the predicted probabilities from my model against the label ('1' for default, '0' for non-default). It is cut off here, but the y-axis is the density. Am I right to reason that this shows an exponential distribution, or that the fact that the class 1 curve has a fat tail shows that default is an extreme / unexpected …
I am working on a problem where I have a dataset with four attributes $(X, Y, T, K)$. I'd like to test whether $P(X, Y, T)\,P(K) = P(X, Y, T, K)$, that is, whether $(X, Y, T)$ is independent of $K$. I have two questions: Is it possible to use kernel density estimation to fit $P(X, Y, T)$, $P(X, Y, T, K)$ and $P(K)$ respectively and test the independence? If I did so, would the output be …
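On the first question, a minimal sketch of what the KDE-based comparison could look like: fit KDEs for the joint, the $(X, Y, T)$ marginal, and the $K$ marginal, then estimate the average log-ratio over the data, which is an estimate of the mutual information between $(X, Y, T)$ and $K$ (values near zero are consistent with independence). The bandwidths and toy data are assumptions, and this gives a point estimate rather than a calibrated test; a permutation test on $K$ could supply a null distribution.

```python
# A minimal sketch: KDE-based estimate of E[log p(x,y,t,k) - log p(x,y,t) - log p(k)].
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
xyt = rng.normal(size=(1000, 3))                  # columns standing in for X, Y, T
k = rng.normal(size=(1000, 1))                    # K, independent in this toy example
data = np.hstack([xyt, k])

kde_joint = KernelDensity(bandwidth=0.5).fit(data)
kde_xyt = KernelDensity(bandwidth=0.5).fit(xyt)
kde_k = KernelDensity(bandwidth=0.5).fit(k)

mi_estimate = np.mean(
    kde_joint.score_samples(data) - kde_xyt.score_samples(xyt) - kde_k.score_samples(k)
)
print(mi_estimate)   # close to 0 suggests P(X,Y,T,K) ≈ P(X,Y,T)P(K)
```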
I am confused about a Parzen window question. Suppose we have two training data points located at 0.5 and 0.7, and we use 0.3 as the rectangular window width. How do we estimate the probability density? According to the definition, the probability density is $\frac{k/n}{V}$, where $k$ is the number of patterns inside the window, $n$ is the total number of points, and $V$ is the volume of the window region. Therefore, is the density for this question $(1/2)/1$? Then what if we use a triangle …
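A worked reading of this setup, under the assumption that the window is centred on the query point $x$ and that we evaluate the estimate at an assumed point such as $x = 0.6$:

$$\hat{p}(x) = \frac{k(x)/n}{V} = \frac{k(x)}{n\,h}, \qquad h = 0.3,\quad n = 2,$$

where $k(x)$ counts the training points within $h/2 = 0.15$ of $x$. At $x = 0.6$ both points (0.5 and 0.7) lie inside the window, so $\hat{p}(0.6) = \frac{2/2}{0.3} \approx 3.33$; at $x = 0.3$ no points lie inside and $\hat{p}(0.3) = 0$. The estimate therefore depends on where it is evaluated, which is why a single number like $(1/2)/1$ only makes sense once the query point and the window volume $V = h$ are fixed.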