Use distribution probability as a feature in ML model

I built an LSTM model to predict sick cows. I also have static risk factors, such as cow size and height, that I want to combine into the ML model. I found that size follows a geometric distribution. My question is: how do I insert it as a feature into the model? I know that $P(X = k) = p\,q^{k-1}$, where $q = 1 - p$, but I don't know how to turn this into a feature. Thank you.

Topic probability deep-learning python distributed

Category Data Science


Most commonly used machine learning frameworks cannot take a probability distribution as a feature; they only accept scalar-like values as inputs. In the case of height, that would be a single numeric measurement.
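To illustrate what "scalar-like inputs" means for a sequence model, here is a minimal sketch of one common way to feed static values to an LSTM: repeat them at every time step. The helper name `add_static_features`, the readings, and the static values are all hypothetical and only serve the illustration; this is not necessarily how your model is set up.

```python
def add_static_features(sequence, static):
    """Append the same static values to every time step of one sequence."""
    return [list(step) + list(static) for step in sequence]


# Hypothetical data: 3 time steps with 2 dynamic readings each
# (for example body temperature and an activity score).
cow_sequence = [[38.5, 0.7], [38.9, 0.6], [39.4, 0.3]]

# Hypothetical static risk factors: size category and height in cm.
cow_static = [4, 142.0]

# Each time step now carries 2 dynamic + 2 static values.
lstm_input = add_static_features(cow_sequence, cow_static)
```

An alternative design is to pass the static values through a separate dense branch and concatenate them with the LSTM output, which avoids repeating them at each step.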

If you are willing to go outside of established frameworks, you could model the problem in a Bayesian way with probabilistic programming, where all quantities are distributions.
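As a very small taste of the Bayesian view, the sketch below treats the geometric parameter $p$ itself as a distribution. It uses the closed-form Beta update for a geometric likelihood instead of a probabilistic programming library, so it is a simplification of what such a library would do. The sizes, the uniform Beta(1, 1) prior, and the function names are assumptions made for illustration.

```python
def beta_geometric_posterior(sizes, a=1.0, b=1.0):
    """Update a Beta(a, b) prior on p with geometric observations k = 1, 2, ...

    The likelihood is p**n * (1 - p)**sum(k - 1), so the posterior is
    Beta(a + n, b + sum(k - 1)).
    """
    n = len(sizes)
    failures = sum(k - 1 for k in sizes)
    return a + n, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical observed size values (positive integers).
sizes = [1, 2, 1, 4, 3, 1, 2, 2]

post_a, post_b = beta_geometric_posterior(sizes)
p_posterior_mean = beta_mean(post_a, post_b)
```

A full probabilistic model would place the health outcome inside the same model, so that uncertainty about $p$ carries through to the prediction.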


As a general approach, I would say you need to generate new features that use your prior knowledge. For example, if you have a known size distribution, then for each specific size you can calculate its probability under that distribution and use it as a new feature.
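A minimal sketch of that idea, using the formula from the question, $P(X = k) = p(1 - p)^{k-1}$. The sizes are made up, and estimating $p$ as one over the sample mean (the maximum-likelihood estimate) is an assumption; use your own value of $p$ if you already know it.

```python
import math


def geometric_pmf(k, p):
    """P(X = k) = p * (1 - p)**(k - 1) for k = 1, 2, ..."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if k < 1:
        return 0.0  # outside the support of the distribution
    return p * (1.0 - p) ** (k - 1)


def estimate_p(sizes):
    """Maximum-likelihood estimate of p: 1 / sample mean."""
    return len(sizes) / sum(sizes)


# Hypothetical size values, one per cow (positive integers).
sizes = [1, 2, 1, 4, 3, 1, 2, 2]

p_hat = estimate_p(sizes)

# New feature columns: probability of each cow's size, and its log.
size_prob = [geometric_pmf(k, p_hat) for k in sizes]
size_log_prob = [math.log(v) for v in size_prob]
```

Note that this probability is a deterministic function of the size, so it re-encodes the raw value rather than adding new information. It may still help if the model finds "how unusual is this size" easier to use than the size itself.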

As a side note, a geometric distribution of cow sizes seems very surprising to me; I would expect to see a gamma distribution or simply a normal one (if size is measured in cm/inches).
