Latent variables with thresholds

There are many ML techniques, such as the EM algorithm, for estimating latent variables. Is there a technique that allows a threshold (a bounded range) for each latent variable?

I have a feature space with 10 variables $(X_1,\dots,X_{10})$ and the outcome $Y$. 7 of the $X$ features are known (I have their observations) and 3 are unknown. Each unknown feature lies in a range from 0 up to some known positive constant.

What ML technique would you recommend for estimating these latent variables under this setup?

Tags: estimators, machine-learning

Category: Data Science


Sure. Just treat the range as a prior on the latent variables. Typically we use a boring prior (e.g., a normal distribution, a uniform distribution), but in your case, if $X_7$ is unknown and in the range $[0, 7.3]$, then your prior for $X_7$ could be the uniform distribution on that range. Then apply the machinery of the EM algorithm as usual; the only change is that the E-step posterior over $X_7$ is truncated to that range, so the estimates can never leave it. It should all work.
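To make this concrete, here is a minimal sketch of the idea under some illustrative assumptions: a linear model $Y = X\beta + \varepsilon$, a uniform prior on each of the 3 unknown features over its range, and a MAP-style EM alternation (fit $\beta$ given the current latent values, then update each sample's latent features by box-constrained least squares, which is what the uniform prior reduces the MAP E-step to). All bounds, sizes, and the linear-model choice below are made up for the example:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Illustrative setup: 7 observed features, 3 latent features,
# each latent bounded in [0, c_j]. Bounds are assumptions for the demo.
rng = np.random.default_rng(0)
n = 120
c = np.array([5.0, 3.0, 7.3])                 # assumed upper bounds

X_obs = rng.normal(size=(n, 7))               # the 7 observed features
Z_true = rng.uniform(0.0, 1.0, (n, 3)) * c    # ground-truth latents (unseen)
beta_true = rng.normal(size=10)
y = np.hstack([X_obs, Z_true]) @ beta_true + rng.normal(scale=0.1, size=n)

Z = np.full((n, 3), c / 2.0)                  # start latents mid-range
for _ in range(15):
    # "M-step": refit the regression given the current latent values.
    X_full = np.hstack([X_obs, Z])
    beta, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    # "E-step" (MAP form): per-sample box-constrained least squares;
    # the uniform prior contributes nothing but the bounds.
    A = beta[7:].reshape(1, 3)
    resid = y - X_obs @ beta[:7]
    for i in range(n):
        Z[i] = lsq_linear(A, resid[i:i + 1], bounds=(np.zeros(3), c)).x

# The estimated latents respect the thresholds by construction.
assert np.all(Z >= -1e-9) and np.all(Z <= c + 1e-9)
```

Each alternation is a coordinate-descent step on the same squared-error objective, so the training residual is non-increasing; with a non-uniform prior the E-step would instead maximise (or take the expectation of) a truncated posterior.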


re. "estimate latent variables"

Quantities that are tuned in order to select a "best" model within a family of models are called hyper-parameters. To any single instance of the model they are fixed; to the optimisation routine they are an index into the search space. Adding constraints on the range of a hyper-parameter both shrinks that search space and requires extra "feasibility" checks during typical gradient descent.

A variable is "latent" when it is purely internal to the model, i.e. not an observable. The meaning of its scale would depend on the context and on your interpretation, since it cannot be compared to anything observed. You rarely want to constrain that range inside the model.

I would suggest leaving the hyper-parameters and latent variables unconstrained; if you need a bounded read-out, train a "neuron"-like response (e.g. sigmoid, tanh, softmax) that maps the unconstrained value into the range you want.
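A minimal sketch of that read-out idea: the model optimises an unconstrained value $u$, and a scaled sigmoid maps it into $[0, c]$ only at the output. The bound `c = 7.3` and the function name `read_out` are illustrative, not from the question:

```python
import numpy as np

c = 7.3  # assumed upper bound for one latent feature

def read_out(u, c):
    """Map an unconstrained latent u in (-inf, inf) to the range [0, c]."""
    return c / (1.0 + np.exp(-u))

u = np.array([-10.0, 0.0, 10.0])
print(read_out(u, c))  # approximately [0, c/2, c]
```

Because the sigmoid is smooth and strictly monotone, gradients flow through it without any feasibility checks, which is exactly what the constrained formulation would have forced on the optimiser.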
