How would you describe cluster 2 from this output of a run of the EM program?

My description: Cluster 2 consists of 9511 instances; the age is around 42 (ranging between 29.7207 and 54.5257). Considering age, Cluster 2 is very well separated from Cluster 1, with a distance of 18.9513. Cluster 2 and Cluster 0, on the other hand, are very close: their centroids are within a distance of around 0.8248. What else could be added?
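For context, separation like this can also be quantified directly from a fitted mixture model. A minimal scikit-learn sketch (not the tool that produced the output above; the data below are synthetic stand-ins) that prints the pairwise distances between cluster means:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ages = np.concatenate([rng.normal(30, 3, 300),   # cluster near 30
                       rng.normal(31, 3, 300),   # overlapping cluster
                       rng.normal(50, 3, 300)])  # well-separated cluster

gmm = GaussianMixture(n_components=3, random_state=0).fit(ages.reshape(-1, 1))
means = gmm.means_
for i in range(len(means)):
    for j in range(i + 1, len(means)):
        d = np.linalg.norm(means[i] - means[j])
        print(f"distance between cluster {i} and cluster {j} means: {d:.4f}")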
Category: Data Science

Is there a Gaussian Mixture Model for data with opposing pairs?

I have a classification problem with data that comes in pairs. A pair consists of two datapoints (A,B) or (B,A), each datapoint containing 20 features. After receiving about 30 pairs, my goal is to separate the A and B classes with a GMM based on feature similarity. For each datapoint, it is not known beforehand to which class it belongs, but it is known that it is of the opposite class from the other datapoint in its pair. Is there any …
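Not a dedicated pairwise-constrained GMM, but one simple way to use the constraint is a post-hoc assignment: fit an ordinary two-component GMM, then within each pair give the component-0 label to the member with the higher posterior for component 0 and the opposite label to its partner. A rough sketch on synthetic pairs (the real input would be the ~30 observed pairs):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pairs = [(rng.normal(0, 1, 20), rng.normal(3, 1, 20)) for _ in range(30)]

X = np.array([p for pair in pairs for p in pair])        # 60 x 20 feature matrix
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
post = gmm.predict_proba(X).reshape(len(pairs), 2, 2)    # (pair, member, component)

labels = []
for p in post:
    # enforce "opposite classes within a pair": component 0 goes to the
    # member that is more confident about it
    a_first = p[0, 0] >= p[1, 0]
    labels.append(("A", "B") if a_first else ("B", "A"))
print(labels[:5])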
Category: Data Science

How to derive Evidence Lower Bound in the paper "Zero-Shot Text-to-Image Generation"?

Can someone share the derivation of the Evidence Lower Bound in this paper: Zero-Shot Text-to-Image Generation? The overall procedure can be viewed as maximizing the evidence lower bound (ELB) (Kingma & Welling, 2013; Rezende et al., 2014) on the joint likelihood of the model distribution over images x, captions y, and the tokens z for the encoded RGB image. We model this distribution using the factorization $p_{\theta,\psi}(x, y, z) = p_\theta(x \mid y, z)\,p_\psi(y, z)$, which yields the lower bound: …
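For reference, the generic step that produces a bound of this form is Jensen's inequality applied to a variational posterior. Writing $q_\phi(z \mid x)$ for an assumed variational distribution over the image tokens (my notation, not necessarily the paper's), the marginal log likelihood is bounded by

$$\ln p_{\theta,\psi}(x, y) = \ln \sum_{z} p_\theta(x \mid y, z)\, p_\psi(y, z) = \ln \mathbb{E}_{q_\phi(z \mid x)}\!\left[\frac{p_\theta(x \mid y, z)\, p_\psi(y, z)}{q_\phi(z \mid x)}\right] \ge \mathbb{E}_{q_\phi(z \mid x)}\!\left[\ln p_\theta(x \mid y, z) + \ln p_\psi(y, z) - \ln q_\phi(z \mid x)\right],$$

where the first term on the right is the reconstruction term and the remaining two combine into a KL-style regulariser of $q_\phi$ against the prior.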
Category: Data Science

Which latent variable model is better for finding a hidden variable?

Currently, I am exploring the concept of latent variables for regression-type datasets. I have gone through the literature on a few of the methods and models used to find latent variables, such as EM algorithms, partial least squares regression, latent semantic analysis, mixed-effect models (linear and nonlinear), HMMs, and there are many more! For example, the head of the volume DataFrame is:

       length     width      volume
0    1.395702  4.822958   40.821677
1    5.761620  9.912682  242.571731
2    3.444930  2.111199   18.904144
3    6.236642  7.609429  425.838818
4    7.270517  1.106117   39.883937

In …
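As one concrete point of comparison (my own synthetic example, not the DataFrame above): partial least squares extracts latent scores from the features that covary most with the response, so if a single hidden factor drives both the features and the target, the first PLS score tends to recover that factor.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
t = rng.normal(size=200)                         # hidden factor (never observed)
length = 5 + t + 0.3 * rng.normal(size=200)
width  = 4 + 2 * t + 0.3 * rng.normal(size=200)
volume = 50 + 30 * t + rng.normal(size=200)

X = np.column_stack([length, width])
pls = PLSRegression(n_components=1).fit(X, volume)
latent = pls.transform(X).ravel()                # one latent score per row
print(abs(np.corrcoef(latent, t)[0, 1]))         # close to 1 if the factor is recovered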
Category: Data Science

Active learning with mixture model cluster assignments - am I injecting bias here?

Suppose I have a dataset of people's phone numbers and heights, and I'm interested in learning the parameters $p_{girl}$, $p_{boy}=1-p_{girl}$, $\mu_{boy}$, $\mu_{girl}$, and overall $\sigma$ governing the distribution of people's heights. I don't have labels for boys or girls yet, but if I really want to, I can call the phone number and ask whether the person is a boy or a girl. Procedure: Fit a Gaussian mixture model to heights via EM. Assign the greater of the $\mu$s to be …
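For the first step of that procedure, a minimal sketch (synthetic heights, with a tied covariance so there is a single shared sigma as in the parameterisation above) of fitting the two-component mixture by EM and reading off the weight and mean estimates:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
heights = np.concatenate([rng.normal(165, 7, 500),   # unlabeled "girls"
                          rng.normal(178, 7, 500)])  # unlabeled "boys"

gmm = GaussianMixture(n_components=2, covariance_type="tied",
                      random_state=0).fit(heights.reshape(-1, 1))
print(gmm.weights_)        # estimates of p_girl and p_boy (component order is arbitrary)
print(gmm.means_.ravel())  # the two mu estimates; the larger one would be called mu_boy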
Category: Data Science

How to find the feature regions where each label is most expected when using decision trees?

Given a decision tree for classification, for example this one: how do I find the feature domain (petal and sepal width and length) where a sample of each class would most likely occur in the feature space? It is clear here that for Setosa it is when petal length is less than or equal to 2.45. However, I am confused about how to think about more complex cases. For example, let's take Versicolor: I am hesitating between 2 …
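One way to make those regions explicit is to read them off the fitted tree itself: every leaf corresponds to a conjunction of threshold conditions on the features. A scikit-learn sketch on the iris data (assumed to be similar to the tree in the question) that prints the region leading to each leaf together with its majority class:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
t = clf.tree_

def walk(node, conditions):
    if t.children_left[node] == -1:                       # leaf node
        klass = iris.target_names[np.argmax(t.value[node])]
        print(" AND ".join(conditions) or "(root)", "->", klass)
        return
    name = iris.feature_names[t.feature[node]]
    thr = t.threshold[node]
    walk(t.children_left[node],  conditions + [f"{name} <= {thr:.2f}"])
    walk(t.children_right[node], conditions + [f"{name} > {thr:.2f}"])

walk(0, [])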
Category: Data Science

E-step for EM algorithm for document clustering

I have code for the E-step of the EM algorithm for document clustering, in the hard-EM version. I'm trying to implement the E-step for the soft-EM algorithm. Here is my code for hard-EM:

E.step <- function(gamma, model, counts){
  N <- dim(counts)[2] # number of documents
  K <- dim(model$mu)[1]
  for (n in 1:N){
    for (k in 1:K){
      gamma[n,k] <- log(model$rho[k,1]) + sum(counts[,n] * log(model$mu[k,]))
    }
    logZ = logSum(gamma[n,])
    gamma[n,] = gamma[n,] - logZ
  }
  gamma <- exp(gamma)
  return (gamma)
…
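Not R, but for comparison here is a short Python sketch of a soft E-step for a multinomial mixture over documents, assuming counts is a word-by-document count matrix, log_rho the log mixture weights, and log_mu the log word probabilities (all names are mine). The hard-EM E-step would instead collapse each row of responsibilities to its argmax.

import numpy as np
from scipy.special import logsumexp

def e_step_soft(counts, log_rho, log_mu):
    # counts: (W, N) term-document counts; log_rho: (K,); log_mu: (K, W)
    # unnormalised log responsibility: log rho_k + sum_w counts[w, n] * log mu[k, w]
    log_gamma = log_rho[:, None] + log_mu @ counts                # (K, N)
    log_gamma -= logsumexp(log_gamma, axis=0, keepdims=True)      # normalise per document
    return np.exp(log_gamma).T                                    # (N, K), rows sum to 1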
Category: Data Science

N-Gram Linear Smoothing

In slide 61 of the NLP text, to smooth the n-gram probabilities we need to find the lambdas that maximize the probability of a held-out set, written in terms of M(λ1, λ2, ... λ_k). What does this notation mean? Also, it says that "One way is to use the EM algorithm, an iterative learning algorithm that converges on locally optimal λs". Can someone refer me to a good example? Say the training text is "Sam I am Sam I do …
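As a small worked example of EM for the interpolation weights (toy numbers of my own, not from the slides): each held-out token gets a probability under each component model; the E-step computes how responsible each model is for each token, and the M-step sets the new lambdas to the average responsibilities.

import numpy as np

p_uni = np.array([0.10, 0.05, 0.20, 0.08])   # unigram prob of each held-out token
p_bi  = np.array([0.30, 0.01, 0.25, 0.40])   # bigram prob of the same tokens
lam = np.array([0.5, 0.5])                   # initial lambda_uni, lambda_bi

for _ in range(50):
    # E-step: posterior responsibility of each model for each token
    joint = np.vstack([lam[0] * p_uni, lam[1] * p_bi])      # (2, N)
    resp = joint / joint.sum(axis=0, keepdims=True)
    # M-step: new lambdas are the average responsibilities
    lam = resp.mean(axis=1)

print(lam)   # the locally optimal interpolation weights for this held-out set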
Category: Data Science

Gaussian Mixture Models Clustering

When using the EM algorithm for Gaussian Mixture Models (GMM), in the E-step we use every datapoint x in the training dataset, and then calculate and update the "weight" and the parameters of each cluster's Gaussian distribution (M-step). I have read that we do this until it converges. I am a little confused here. Does that mean it loops through the whole training dataset X every time in "one step" of the EM algorithm? Or does "one step" correspond to …
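In the usual formulation, "one step" (one iteration) of EM means a full E-step over every point in X followed by a single M-step, and that whole cycle repeats until convergence. A minimal 1-D sketch of one such iteration on synthetic data:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

# E-step: responsibilities for EVERY point in the dataset
dens = np.vstack([w[k] * norm.pdf(X, mu[k], np.sqrt(var[k])) for k in range(2)])
resp = dens / dens.sum(axis=0, keepdims=True)          # (2, N)

# M-step: one update of all parameters using those responsibilities
Nk = resp.sum(axis=1)
w = Nk / len(X)
mu = resp @ X / Nk
var = (resp * (X[None, :] - mu[:, None]) ** 2).sum(axis=1) / Nk
print(w, mu, var)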
Category: Data Science

Best Python library for training using a Hidden Markov model with Gaussian mixtures

I would like to train my data using HMM-GMM (the Baum-Welch approach with Gaussian mixture emissions) to find the parameters best suited to my data. Note: my data is continuous, not discrete. I tried hmmlearn from scikit-learn, but I believe it does not support a continuous HMM-GMM model; when I tried it with discrete data, it works fine. I tried to use pomegranate, but I cannot understand the documentation, and I am also not sure whether it …
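For what it's worth, hmmlearn does ship a GMMHMM class for continuous observations (Gaussian mixture emissions per hidden state, trained with Baum-Welch). A minimal sketch, assuming a recent hmmlearn release and using synthetic data as a stand-in for the real continuous observations:

import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # 500 frames, 3 continuous features
lengths = [250, 250]                     # two separate sequences concatenated in X

model = GMMHMM(n_components=4, n_mix=2, covariance_type="diag",
               n_iter=100, random_state=0)
model.fit(X, lengths)

print(model.transmat_)                   # learned state transition matrix
print(model.score(X[:250]))              # log-likelihood of the first sequence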
Category: Data Science

Does feature normalization improve performance of Hidden Markov Models?

For training a Hidden Markov Model (HMM) on a multivariate, continuous time series, is it preferable to scale the data somehow? Some pre-processing steps may be: normalize to 0-mean and unit variance; scale to the [-1, 1] interval; scale to the [0, 1] interval. With neural networks, the rationale behind scaling is to get an "un-squished" error surface that is easier to navigate in. HMMs use the Baum-Welch algorithm, which is a variation on the Expectation Maximization (EM) algorithm, to learn parameters. Is …
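A minimal sketch of the first option, z-scoring a multivariate series before fitting a Gaussian-emission HMM. Standardisation is an invertible affine transform, so a Gaussian HMM can in principle represent the same model either way; the question is mainly about numerics and initialisation.

import numpy as np
from sklearn.preprocessing import StandardScaler
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
X = rng.normal(loc=[0, 100], scale=[1, 50], size=(1000, 2))   # features on very different scales

X_scaled = StandardScaler().fit_transform(X)                  # 0-mean, unit-variance per feature
model = GaussianHMM(n_components=3, n_iter=50, random_state=0).fit(X_scaled)
print(model.means_)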
Category: Data Science

Can Expectation Maximization estimate truth and confusion matrix from multiple noisy sources?

Suppose we have $m$ sources, each of which noisily observe the same set of $n$ independent events from the outcome set $\{A,B,C\}$. Each source has a confusion matrix, for example for source $i$: $$C_i = \begin{bmatrix} 0.98 & 0.01 & 0.07 \\ 0.01 & 0.97 & 0.00 \\0.01 & 0.02 & 0.93\end{bmatrix} $$ where each column relates to the truth, and each row relates to the observation. E.g., if the true event is $B$ then source $i$ will get it …
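This is essentially the Dawid & Skene setup, and EM can indeed treat the true outcome of each event as the latent variable. A compact sketch on synthetic data (confusion matrices indexed as [observation, truth] to match the convention above; the posterior is initialised from a majority vote so EM does not start at a symmetric fixed point):

import numpy as np

rng = np.random.default_rng(0)
m, n, K = 4, 500, 3
truth = rng.integers(0, K, n)
obs = np.where(rng.random((m, n)) < 0.85, truth,           # sources right ~85% of the time
               rng.integers(0, K, (m, n)))

post = np.zeros((n, K))                                     # majority-vote initialisation
for i in range(m):
    post[np.arange(n), obs[i]] += 1
post /= post.sum(axis=1, keepdims=True)

for _ in range(50):
    # M-step: class prior and per-source confusion matrices from expected counts
    pi = post.mean(axis=0)
    C = np.empty((m, K, K))                                 # C[i, o, t] = P(source i says o | truth t)
    for i in range(m):
        for o in range(K):
            C[i, o] = post[obs[i] == o].sum(axis=0) + 1e-6  # smoothing avoids log(0)
        C[i] /= C[i].sum(axis=0, keepdims=True)
    # E-step: posterior over the true outcome of each event
    log_post = np.log(pi)[None, :] + sum(np.log(C[i, obs[i], :]) for i in range(m))
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)

print((post.argmax(axis=1) == truth).mean())                # fraction of events recovered correctly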
Category: Data Science

How to interpret the means of the output clusters for expectation-maximization?

I am trying to cluster data using scikit-learn's expectation-maximization. I created two different data sets from normal distributions, which I have shown in the graph below. The mean of each distribution is:

Mean of distr-1: 0.0037523503071361197
Mean of distr-2: -0.4384554574756237

But after I run EM using scikit-learn, I get the means as follows:

Mean after EM: [[-0.12327634  0.39188704]
               [-1.31191255 -4.4292102 ]]

How am I supposed to interpret these means? I am trying to create …
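One thing worth checking is the shape of the result: scikit-learn's GaussianMixture stores means_ with one row per component and one column per feature, so fitting the two samples together as a single one-dimensional mixture should give a (2, 1) array. A sketch with synthetic stand-ins for the two distributions:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 1000),
                    rng.normal(-0.44, 1.0, 1000)]).reshape(-1, 1)   # shape (N, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(gmm.means_)        # shape (2, 1): one mean per component for the single feature
print(gmm.weights_)      # mixing proportions of the two components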
Category: Data Science

Code or Package to cluster sequences (or time series) of different lengths based on HMM?

Is there any existing code or package in Python, R, Java, Matlab, or Scala that implements the sequence clustering algorithms in either of the following two papers? 1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997): https://papers.nips.cc/paper/1217-clustering-sequences-with-hidden-markov-models.pdf The paper gives a probabilistic model-based approach to clustering sequences (or time series), using hidden Markov models (HMMs). 2) 'Visual Cluster Exploration of Web Clickstream Data' by Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma (2012): http://www.cs.tufts.edu/comp/250VIS/papers/VAST2012-ClickStream.pdf The paper is quite …
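Not a packaged implementation, but a rough Python sketch in the spirit of Smyth (1997): fit a small HMM to each sequence, build a symmetrised log-likelihood distance matrix, and cluster it hierarchically. Sequences here are synthetic and of different lengths; hmmlearn and scipy are assumed available.

import numpy as np
from hmmlearn.hmm import GaussianHMM
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
seqs = [rng.normal(0, 1, (rng.integers(40, 80), 1)) for _ in range(10)] + \
       [rng.normal(5, 1, (rng.integers(40, 80), 1)) for _ in range(10)]

models = [GaussianHMM(n_components=2, n_iter=20, random_state=0).fit(s) for s in seqs]

n = len(seqs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            # per-sample negative log-likelihood of sequence j under model i
            D[i, j] = -models[i].score(seqs[j]) / len(seqs[j])
D = (D + D.T) / 2                                            # symmetrise

Z = linkage(D[np.triu_indices(n, 1)], method="average")      # condensed distance matrix
print(fcluster(Z, 2, criterion="maxclust"))                  # cluster label per sequence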
Category: Data Science

Statistical machine translation word alignment for FR-ENG and ENG-FR: what is p(e) and p(f)?

I'm currently trying to implement this paper, but am struggling to understand some of the math. I'm pretty sure I understand how to implement the E-step, but I'm confused about how to compute the M-step. It says just before section 3.1 that $p_1(x, z; \theta_1) = p(e)p(a, f|e; \theta_1)$, and then the same for $p_2$ but with $e$ and $f$ swapped. The second part of this makes sense to me, but what is $p(e)$ or $p(f)$? …
Category: Data Science

EM clustering with missing and misspelled data

I am currently working on a project that requires me to cluster unlabeled input. The records contain personal information such as name, DOB, height, sex, etc. We need to cluster records for the same person into one group; here is the sample data:

+------------------------------------+
|               Record1    Record2   |
+------------------------------------+
| First Name    'Harry'    'Harry'   |
| Middle Name   'Jay'      'J'       |
| Last Name     'Potter'   'Potter'  |
| DOB Month     1          1         |
| DOB Day       1          1         |
| DOB …
Category: Data Science

Does K-Means' objective function imply the distance metric is Euclidean?

The objective/loss function of the K-Means algorithm is to minimize the sum of squared distances; written in math form, it looks like this: $$J(X,Z) = \min\ \sum_{z\in \text{Clusters}}\sum_{x \in \text{data}}\|x-z\|^2$$ If we have a different distance metric, for instance cosine (I realize there's a conversion between cosine and Euclidean, but let's forget it for now), Manhattan, etc., does it mean we will have a different loss function? That is, the traditional K-Means procedure based on expectation maximization won't work, right? Because …
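Right, a different metric changes the update step as well, not just the assignment step. A small numpy sketch of the Manhattan-distance case, where the cost-minimising centre update is the coordinate-wise median (essentially k-medians) rather than the mean:

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
centers = X[rng.choice(len(X), 2, replace=False)]

for _ in range(20):
    # assignment step: nearest centre under the L1 (Manhattan) distance
    d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)   # (N, k)
    assign = d.argmin(axis=1)
    # update step: the coordinate-wise median minimises the summed L1 distance
    centers = np.array([np.median(X[assign == k], axis=0) for k in range(2)])

print(centers)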
Category: Data Science

How to compare the performance of different number of mixing components for EM algorithm?

I am reading about the EM (Expectation-Maximization) algorithm in a machine learning book. In the closing remarks of the chapter, the authors mention that we cannot decide the "optimality" of the number of components (the number of Gaussian distributions in the mixture) based on each model's final log likelihood, since models with more parameters will inevitably describe the data better. Therefore, my questions are: 1) How do we compare the performance of models using different numbers of components? 2) …
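One standard answer to 1) is to refit the mixture for a range of component counts and compare a penalised criterion such as BIC or AIC, which add a complexity penalty to the log likelihood. A scikit-learn sketch on synthetic data:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(5, 1, (300, 2))])

for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, round(gmm.bic(X), 1), round(gmm.aic(X), 1))   # lower is better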
Category: Data Science

Hidden Markov Models: Linking states to labels after EM training

The tl;dr version first: I have the following problem. I implemented Baum-Welch for ergodic HMMs. I do it like this: I pass the model two numbers, C1 and C2, and it builds a fully connected state machine with C1 states and C2 emissions. I map all tokens from my training data onto the range [0, C2), and each label the HMM is supposed to assign to a token during inference onto [0, C1). Then the HMM goes ahead and does Baum …
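A common way to link states to labels afterwards is to decode a small labelled sample with the trained model and give each hidden state the label it co-occurs with most often. A sketch of that mapping step (the arrays here are stand-ins; the real ones would come from Viterbi-decoding the labelled tokens):

import numpy as np
from collections import Counter

def map_states_to_labels(states, gold_labels, n_states):
    # states, gold_labels: aligned 1-D arrays from a small labelled sample
    mapping = {}
    for s in range(n_states):
        labels_for_s = gold_labels[states == s]
        mapping[s] = (Counter(labels_for_s.tolist()).most_common(1)[0][0]
                      if len(labels_for_s) else None)
    return mapping

states = np.array([0, 0, 1, 1, 2, 2, 2])   # decoded hidden states
gold   = np.array([3, 3, 5, 5, 5, 7, 7])   # gold labels for the same tokens
print(map_states_to_labels(states, gold, n_states=3))   # {0: 3, 1: 5, 2: 7}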
Category: Data Science
