Knowing the joint probability distribution over the feature-label space
I am taking the Cornell CS4780 course, "Machine Learning for Intelligent Systems". You can find the link here to the lecture I am going to refer to (the 1st lecture).
The professor explains that we have a sample
$$D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\} \sim P,$$
where $(X_i, y_i)$ is a feature-label pair. There is a joint distribution over the feature-label space, denoted by $P$.
We never have access to $P$; only God knows $P$. What we want to do in this supervised-learning task is to take data from this distribution and learn a mapping/function from $X$ to $y$.
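If I formalize what "learn a mapping" means (this is my own phrasing, not the professor's), I believe the goal is to find a hypothesis $h$ with small expected error under $P$, e.g. with the 0-1 loss:
$$h^{*} = \operatorname*{argmin}_{h} \; \mathbb{E}_{(X,y)\sim P}\big[\mathbf{1}\{h(X)\neq y\}\big].$$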
I agree with and understand everything up to this point.
Then the professor goes on to make a statement in the lecture, precisely at 34 minutes 26 seconds:

"If we had access to this distribution, everything would be easy."

But he doesn't explain this statement.

Now my question is: what would have been easy if we knew the distribution? Does he mean that, if we had access to the distribution, we would know the probability of each $(X_i, y_i)$ pair, and could then learn a mapping/parameters so as to minimize the out-of-sample error?
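To make my question concrete, here is a minimal sketch (my own toy example, not from the lecture) of what I imagine "easy" would mean: if the joint distribution were a fully known table, prediction would reduce to looking up $\arg\max_y P(y \mid x)$, with no learning at all.

```python
# A toy, fully known joint distribution P over a discrete feature-label
# space. The numbers are made up purely for illustration.
P = {
    # (x, y): P(X = x, Y = y)
    ("sunny", "play"): 0.30,
    ("sunny", "stay_home"): 0.10,
    ("rainy", "play"): 0.05,
    ("rainy", "stay_home"): 0.55,
}

def best_label(x):
    """With P known, prediction is just argmax_y P(y | x).

    Since P(y | x) = P(x, y) / P(x) and P(x) does not depend on y,
    maximizing the joint P(x, y) over y gives the same label.
    """
    candidates = {y: p for (xi, y), p in P.items() if xi == x}
    return max(candidates, key=candidates.get)

print(best_label("sunny"))  # -> 'play'
print(best_label("rainy"))  # -> 'stay_home'
```

Is this the sense in which "everything would be easy", i.e. the optimal predictor becomes directly available, so there is no estimation or generalization problem left to solve?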
Topic: learning, supervised-learning, statistics, machine-learning
Category: Data Science