I am analysing a number of data files that represent the responsiveness of cells to the addition of a drug. If the drug is not added, the cell responds normally; if it is added, it shows abnormal patterns. We decided to analyse this using an amplitude histogram, in order to distinguish between a change in amplitude and a change in the probability of eliciting the binary response. What we get with file 1 is shown in the attached histogram. So we fit a pdf on …
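A minimal sketch of that kind of fit, assuming the per-event amplitudes are already extracted into a 1-d array (the file name and the two-component choice are hypothetical): the fitted means capture a change in amplitude, while the mixture weights capture a change in the probability of eliciting the response.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# hypothetical input: one amplitude per response event, extracted from file 1
amplitudes = np.loadtxt("file1_amplitudes.txt")

# two components: one for normal responses, one for the drug-induced pattern
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(amplitudes.reshape(-1, 1))

print(gmm.means_.ravel())   # shift here -> change in response amplitude
print(gmm.weights_)         # shift here -> change in probability of response
```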
I am currently trying to write a simple multivariate Gaussian mixture model using TensorFlow Probability. Specifically, I have some 2-dimensional input and 2-dimensional output data and am looking to produce a probabilistic model that best fits this data using a neural network. I am using tfp.layers.MixtureSameFamily to generate this model, which is working as expected. Here is my code for doing so. Notes: I am using tensorflow 2.4.1 and tensorflow-probability 0.12.1. x = some 2-d input data …
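For reference, a minimal version of that setup might look like the sketch below (the hidden layer size and component count are assumptions, not the questioner's actual values):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfpl = tfp.layers

event_shape = 2        # 2-d output
num_components = 5     # assumed number of mixture components

# number of raw parameters the network must emit for the mixture head
params_size = tfpl.MixtureSameFamily.params_size(
    num_components,
    component_params_size=tfpl.MultivariateNormalTriL.params_size(event_shape))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(params_size, activation=None),
    tfpl.MixtureSameFamily(num_components, tfpl.MultivariateNormalTriL(event_shape)),
])

# train by maximizing the likelihood of the observed outputs under the predicted mixture
negloglik = lambda y, rv_y: -rv_y.log_prob(y)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss=negloglik)
# model.fit(x, y, epochs=100)
```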
Are there theoretical or empirical reasons for drawing initial weights of a multilayer perceptron from a Gaussian rather than from, say, a Cauchy distribution?
I was reading this article where I came across the following statement in the context of "Why do we use sigmoid activation function in Neural Nets?": The assumption of a dependent variable to follow a sigmoid function inherently assumes a Gaussian distribution for the independent variable which is a general distribution we see for a lot of randomly occurring events and this is a good generic distribution to start with. Could someone elaborate on this relationship between the two?
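For context, the standard derivation that connects the two (a sketch of the usual argument, which may or may not be what the article intended): if the class-conditional densities $p(x \mid y)$ are Gaussian with a shared covariance, then by Bayes' rule the posterior is exactly a sigmoid of a linear function of $x$:
\begin{align}
P(y=1 \mid x) = \frac{p(x \mid y=1)\,P(y=1)}{p(x \mid y=1)\,P(y=1) + p(x \mid y=0)\,P(y=0)} = \frac{1}{1 + e^{-(w^\top x + b)}},
\end{align}
where $w$ and $b$ are determined by the two Gaussian means, the shared covariance, and the class priors.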
I'm new to machine learning. I have the following scenario: I have five individuals, each carrying an accelerometer. The sensor measures movement/acceleration on a scale from 0 to 255, 0 being no movement, 255 being max movement, at a 5-minute interval. Some individuals carry sensors that are more sensitive, and some that are less sensitive. As such, some individuals' sensors will report higher values, and some lower values, for the same movements. Using a …
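One common way to remove per-sensor sensitivity differences (a sketch, assuming the readings sit in a pandas DataFrame with hypothetical column names) is to standardize each individual's readings against their own mean and standard deviation:

```python
import pandas as pd

# hypothetical layout: one row per 5-minute reading
df = pd.DataFrame({
    "individual": ["a", "a", "b", "b"],
    "activity":   [10, 200, 5, 100],
})

# z-score each individual's readings separately, so that a sensor's
# overall sensitivity (scale) cancels out across individuals
df["activity_z"] = (
    df.groupby("individual")["activity"]
      .transform(lambda s: (s - s.mean()) / s.std())
)
```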
I am trying to generate complex Gaussian white noise with zero mean, whose covariance matrix is a specific matrix that is assumed to be given. Let i be a point on the grid of the x axis, where there are N points on the axis. The problem is to generate a complex-valued random noise at each point (call the random value at point i $y_i$) that obeys a Gaussian distribution …
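The standard construction here is a Cholesky factorization: if the given covariance is $C = LL^H$, then coloring white noise $w$ with $y = Lw$ yields $\mathbb{E}[yy^H] = C$. A sketch (the tridiagonal $C$ below is a placeholder for the given matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64
# placeholder for the given covariance: any Hermitian positive-definite matrix works
C = np.eye(N) + 0.5 * np.eye(N, k=1) + 0.5 * np.eye(N, k=-1)

# circularly-symmetric complex white noise, unit variance per sample
w = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# color the noise: with C = L L^H, y = L w has covariance E[y y^H] = C
L = np.linalg.cholesky(C)
y = L @ w
```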
In the last part of Andrew Ng's lectures about Gaussian Discriminant Analysis and the Naive Bayes classifier, I am confused as to how Andrew Ng derived $2^n - 1$ features for the Naive Bayes classifier. First off, what does he mean by "features" in the context he was describing? I initially thought that the features were characteristics of our random vector $x$. I know that the total number of possibilities for $x$ is $2^n$, but I do not understand how he was …
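For context, the usual counting argument behind that number (a sketch; the lecture may frame it differently): a binary vector $x \in \{0,1\}^n$ takes $2^n$ possible values, and a fully general distribution over them assigns one probability to each value, subject to a single normalization constraint:
\begin{align}
\sum_{x \in \{0,1\}^n} p(x) = 1 \quad \Rightarrow \quad 2^n - 1 \text{ free parameters.}
\end{align}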
From Prof. Andrew Ng's multivariate Gaussian distribution lecture: covariance measures linear dependency between features, in which case we might use the multivariate Gaussian distribution with a covariance matrix. Also, if features are redundant (for example $x_1 = 2x_2$; clearly a linear dependency exists between the features), the covariance matrix is not invertible and the multivariate Gaussian distribution with that covariance matrix can't be used. To me, these statements look contradictory. Question: What's the difference between covariance as linear dependency and linear dependency between features?
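A small numerical sketch of the second statement (hypothetical data): with an exact linear dependency $x_1 = 2x_2$, the sample covariance matrix loses rank and has no inverse, so the multivariate Gaussian density (which needs $\Sigma^{-1}$) is undefined.

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=1000)
x1 = 2 * x2                        # exact linear dependency between features
X = np.column_stack([x1, x2])

cov = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(cov))  # 1, not 2: cov is singular
# np.linalg.inv(cov) would raise LinAlgError; the multivariate Gaussian
# density requires cov^{-1}, so it cannot be evaluated here.
```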
I am getting the following error when running a Gaussian mixture model: ValueError: Fitting the mixture model failed because some components have ill-defined empirical covariance (for instance caused by singleton or collapsed samples). Try to decrease the number of components, or increase reg_covar. The matrix that I am using has a relatively large shape, so it is hard to display on this page; however, here is an overview: [[ 6.10086000e+05 1.58787000e+05 0.00000000e+00 ..., 8.00000000e+00 0.00000000e+00 0.00000000e+00] [ 2.36273000e+05 1.48953000e+05 …
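The error message's two suggestions look like this in sklearn (a sketch; the component count and reg_covar value are placeholders, not tuned for this data):

```python
from sklearn.mixture import GaussianMixture

# fewer components, plus a larger ridge added to the diagonal of each
# component's covariance, so that no component's covariance collapses
gmm = GaussianMixture(n_components=3, reg_covar=1e-3, random_state=0)
# gmm.fit(X)  # X is the matrix shown above
```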
I use the DeLong method to compare two ROC AUCs. The result is a Z-score. Both ROC AUCs are obtained from LDA (linear discriminant analysis) in the sklearn package: the first uses the eigen solver inside LDA and the second uses the svd solver. The dotted line is my data; the red line is N(0, 1). Note: there is a minor jump at the point Z = 0. Z = 0 means that the classifiers did their job equally well. Z > 0 (Z …
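For reference, obtaining the two AUCs being compared might look like the sketch below (toy data standing in for the questioner's dataset; the DeLong step itself is not in sklearn and is omitted):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# toy data standing in for the actual dataset
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

auc = {}
for solver in ("eigen", "svd"):
    lda = LinearDiscriminantAnalysis(solver=solver).fit(X_tr, y_tr)
    auc[solver] = roc_auc_score(y_te, lda.decision_function(X_te))
print(auc)  # the DeLong test compares these two correlated AUCs via a Z-score
```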
I have a classification problem with data that comes in pairs. A pair consists of two datapoints, (A,B) or (B,A), each datapoint containing 20 features. After receiving about 30 pairs, my goal is to separate the A and B classes using a GMM based on feature similarity. For each datapoint it is not known beforehand to which class it belongs, but it is known that it is of the opposite class to the other datapoint in its pair. Is there any …
I am trying to solve/understand ASR using HMM-GMM. At the abstract level I do understand what's happening, but I did not understand how the GMM fits into it. My data has 5K hours of speech from a single user. I took the above picture from this article. I know what a GMM is, but I am unable to wrap my head around its role here. Can somebody explain with a simple example?
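As a concrete toy illustration of where the GMM sits (assuming the hmmlearn library, which is not necessarily what an ASR toolkit would use): each hidden HMM state corresponds roughly to a phoneme-like unit, and the emission density of each state over acoustic feature vectors is a GMM rather than a single Gaussian.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 13))   # toy stand-in for MFCC feature vectors

# 3 hidden states, each emitting from its own 2-component Gaussian mixture
model = GMMHMM(n_components=3, n_mix=2, covariance_type="diag", n_iter=20)
model.fit(X)

states = model.predict(X)   # most likely hidden state for each frame
```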
I have a larger dataset (random variable) 'x' containing values approximating a Gaussian distribution. From 'x', a much smaller random variable 'y' is sampled without replacement. I want to compare their distributions using histograms. The code in Python 3.9 is as follows:

```python
import numpy as np

# Create a Gaussian distribution
x = np.random.normal(loc=0, scale=2.0, size=20000000)

# Sample from 'x' without replacement
y = np.random.choice(a=x, size=400000, replace=False)

x.size, y.size
# (20000000, 400000)
```
…
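For the comparison itself, plotting both with density=True puts the very different sample sizes on the same scale (a sketch using matplotlib, continuing from the x and y above):

```python
import matplotlib.pyplot as plt

# density=True normalizes each histogram to integrate to 1, so the
# 20M-sample and 400K-sample histograms are directly comparable
plt.hist(x, bins=200, density=True, alpha=0.5, label="x (population)")
plt.hist(y, bins=200, density=True, alpha=0.5, label="y (sample)")
plt.legend()
plt.show()
```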
I want to use a discrete Latin hypercube to draw 20 samples from a space created by an array that has 4096 values. So my desired output would be something like the following:

```python
NSample = 20
Samples = lhc(BigArray, NSample)
```

where BigArray is the array with 4096 elements. Is there a way to do this? I haven't found anything similar to this. How can I do this in Python?
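One way to get this with SciPy's qmc module (a sketch; it draws a 1-d Latin hypercube in [0, 1) and maps each stratified point to an index into the array):

```python
import numpy as np
from scipy.stats import qmc

BigArray = np.arange(4096)   # stand-in for the actual 4096-value array
NSample = 20

sampler = qmc.LatinHypercube(d=1, seed=0)
u = sampler.random(n=NSample).ravel()   # 20 stratified points in [0, 1)

idx = (u * len(BigArray)).astype(int)   # map each point to an array index
Samples = BigArray[idx]
```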
When learning about t-SNE, I found a resource saying that the "width of the normal curve (a Gaussian centered at $x_i$) depends on the density of data near the point of interest". This is why we normalize with $\sum_{k\neq i} e^{-\lVert x_i - x_k \rVert^2/2\sigma_i^2}$ in $p_{j|i}= \frac{e^{-\lVert x_i - x_j \rVert^2/2\sigma_i^2}}{\sum_{k\neq i} e^{-\lVert x_i - x_k \rVert^2/2\sigma_i^2}}$. I know that the Gaussian's width depends on the variance $\sigma_i^2$. However, there was no mention of calculating the variance, and I read that the variance …
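For what it's worth, in the original t-SNE paper each $\sigma_i$ is not given by a closed-form formula: it is found by binary search so that the perplexity of $p_{\cdot|i}$ matches a user-chosen value. A sketch of that procedure:

```python
import numpy as np

def perplexity(dist_sq, sigma):
    # conditional probabilities p_{j|i} for one point i, given the squared
    # distances to all *other* points (k != i)
    p = np.exp(-dist_sq / (2 * sigma**2))
    p /= p.sum()
    h = -np.sum(p * np.log2(p + 1e-12))   # Shannon entropy in bits
    return 2 ** h

def find_sigma(dist_sq, target_perplexity, tol=1e-5, max_iter=50):
    # binary search: larger sigma -> flatter p -> higher perplexity
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = (lo + hi) / 2
        perp = perplexity(dist_sq, sigma)
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            hi = sigma
        else:
            lo = sigma
    return sigma
```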
I am implementing an HMM (Hidden Markov Model) for time series data to assess (define) the state of the model at each time period (with continuous observations). To maximize the likelihood of the model (given the observed data, the probability of the hidden state the model is in at each data point), I used the expectation-maximization algorithm (in the case of HMMs, the Baum-Welch algorithm). The problem is that in the case of multidimensional data (the observation at each time is a vector), defined …
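For multidimensional (vector-valued) observations, one common choice (illustrated here with hmmlearn, as an assumption about the setup) is a multivariate Gaussian emission per state, with means, covariances, and transition probabilities all estimated by Baum-Welch:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))   # toy multivariate observations (4-d vectors)

# each hidden state emits from its own multivariate Gaussian;
# Baum-Welch (EM) fits the means, full covariances, and transitions
model = GaussianHMM(n_components=2, covariance_type="full", n_iter=50)
model.fit(X)

states = model.predict(X)   # most likely hidden state at each time step
```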
An HMM is a statistical model with unobserved (i.e. hidden) states, used in recognition algorithms (speech, handwriting, gesture, ...). What distinguishes a DHMM from a CHMM is the transition probability matrix P with elements $p_{ij}$. In a CHMM, the state space of the hidden variable is discrete, and the observation probabilities are modelled as Gaussian distributions. Why are observation probabilities modelled as Gaussian distributions in a CHMM? Why are they the best distributions for recognition systems based on HMMs?
When I was reading about GANs, the thing I didn't understand is why people often choose the input to a GAN ($z$) to be samples from a Gaussian. And are there also potential problems associated with this?
I was reading a blog on diffusion models where I came across this expression. I didn't understand why it is \begin{align} \sqrt{1-\beta_t}\, x_{t-1} \end{align} and what exactly the term variance schedule \begin{align} \beta_t \in (0,1), \quad t = 1, \dots, T \end{align} signifies. Blog link
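A small numerical sketch of what that term does in the forward process (the linear schedule below is an assumption; the DDPM paper and various blogs use different schedules): each step scales the previous sample by $\sqrt{1-\beta_t}$ and adds noise of variance $\beta_t$, which keeps the overall variance from blowing up.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule, beta_t in (0, 1)

def forward_step(x_prev, t, rng):
    # q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
    beta_t = betas[t]
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1 - beta_t) * x_prev + np.sqrt(beta_t) * noise

# if Var[x_{t-1}] = 1, then Var[x_t] = (1 - beta_t) * 1 + beta_t = 1
```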
Say I have a time series (e.g. the bitcoin price). I want to predict tomorrow's price, specifically tomorrow's % change in price from today. Let's say this is Gaussian distributed, with the mean at 0%. If the market is trending up, the price prediction should be higher (e.g. +3.1%). If the market is trending down, the price prediction should be lower (e.g. -5.4%). If the market is trending sideways, the price prediction should be neutral (e.g. 0%). However, there are times …