Knowing the joint probability distribution over the feature-label space
I am taking the Cornell CS4780 course, "Machine Learning for Intelligent Systems". You can find the link here to the lecture I am going to refer to (the 1st lecture).
The professor explains that we have a sample
$$D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\} \sim P,$$
where $(X_i, y_i)$ is a feature-label pair. There is a joint distribution over the feature-label space, denoted by $P$.
We never have access to $P$; only God knows $P$. What we want to do in this supervised-learning task is to take data from this distribution and learn a mapping/function from $X$ to $y$.
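If I formalize what "learn a mapping" means (this is my own phrasing, not the professor's), I believe the goal is to find a hypothesis $h$ with small expected error under $P$, e.g. with the 0-1 loss:
$$h^{*} = \operatorname*{argmin}_{h} \; \mathbb{E}_{(X,y)\sim P}\big[\mathbf{1}\{h(X)\neq y\}\big].$$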
I agree with and understand everything up to this point.
Then the professor goes on to make a statement in the lecture, precisely at 34 minutes 26 seconds:

"If we had access to this distribution, everything would be easy."

But he doesn't explain this statement.

Now my question is: what would have been easy if we knew the distribution? Does he mean that, if we had access to the distribution, we would know the probability of each $(X_i, y_i)$ pair, and could then learn a mapping/parameters so as to minimize the out-of-sample error?
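To make my question concrete, here is a minimal sketch (my own toy example, not from the lecture) of what I imagine "easy" would mean: if the joint distribution were a fully known table, prediction would reduce to looking up $\arg\max_y P(y \mid x)$, with no learning at all.

```python
# A toy, fully known joint distribution P over a discrete feature-label
# space. The numbers are made up purely for illustration.
P = {
    # (x, y): P(X = x, Y = y)
    ("sunny", "play"): 0.30,
    ("sunny", "stay_home"): 0.10,
    ("rainy", "play"): 0.05,
    ("rainy", "stay_home"): 0.55,
}

def best_label(x):
    """With P known, prediction is just argmax_y P(y | x).

    Since P(y | x) = P(x, y) / P(x) and P(x) does not depend on y,
    maximizing the joint P(x, y) over y gives the same label.
    """
    candidates = {y: p for (xi, y), p in P.items() if xi == x}
    return max(candidates, key=candidates.get)

print(best_label("sunny"))  # -> 'play'
print(best_label("rainy"))  # -> 'stay_home'
```

Is this the sense in which "everything would be easy", i.e. the optimal predictor becomes directly available, so there is no estimation or generalization problem left to solve?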
Topic: learning, supervised-learning, statistics, machine-learning
Category: Data Science