Why is label encoding before the train/test split considered data leakage?
I want to ask why label encoding before the train/test split is considered data leakage.
From my point of view, it is not. For example, if you encode good as 2, neutral as 1, and bad as 0, the mapping is the same for both the train and test sets.
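To make the scenario concrete, here is a minimal sketch of what I mean. It assumes a hand-written mapping that is fixed in advance (the `MAPPING` dict, the `encode` and `split` helpers, and the sample labels are all made up for illustration). It only covers this fixed-mapping case, not an encoder that learns its categories from the data.

```python
import random

# Hypothetical fixed mapping, written by hand rather than learned from data.
MAPPING = {"bad": 0, "neutral": 1, "good": 2}


def encode(labels):
    """Replace each string label with its integer code from MAPPING."""
    return [MAPPING[label] for label in labels]


def split(items, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed and return (train, test) lists."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(items) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [items[i] for i in train_idx], [items[i] for i in test_idx]


# Made-up sample labels for illustration only.
labels = ["good", "bad", "neutral", "good", "bad", "neutral", "good", "bad"]

# Order A: encode first, then split.
train_a, test_a = split(encode(labels))

# Order B: split first, then encode each part separately.
train_raw, test_raw = split(labels)
train_b, test_b = encode(train_raw), encode(test_raw)

# With a fixed mapping and the same split, both orders give the same result.
same_result = (train_a == train_b) and (test_a == test_b)
print(same_result)  # True
```

Because the mapping never looks at the data, the order of the two steps does not change the encoded train and test sets in this sketch.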
So why do we have to split first and then do the label encoding?
Topic test labelling data-leakage training preprocessing
Category Data Science