Influence of imbalanced feature on prediction
I want to use XGBoost regression. The dataframe is conceptually similar to this table:
index  feature 1  feature 2  feature 3  encoded_1  encoded_2  encoded_3  y
0      0.213      0.542      0.125      0          0          1          0.432
1      0.495      0.114      0.234      1          0          0          0.775
2      0.521      0.323      0.887      1          0          0          0.691
My question is: what is the influence of having imbalanced observations across the encoded features? For example, what if I have many more rows where encoded_1 is 1 than rows where encoded_2 or encoded_3 is 1? To be clear, I want to use regression, not classification.
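To make "imbalanced" concrete, here is a minimal sketch of how the imbalance could be measured. It assumes pandas and uses the three example rows from the table; the column names with underscores (feature_1 and so on) are my assumption, and the real data has many more rows.

```python
import pandas as pd

# Toy data copied from the example table; the real dataset is much larger.
# Column names with underscores are an assumption made for this sketch.
df = pd.DataFrame(
    {
        "feature_1": [0.213, 0.495, 0.521],
        "feature_2": [0.542, 0.114, 0.323],
        "feature_3": [0.125, 0.234, 0.887],
        "encoded_1": [0, 1, 1],
        "encoded_2": [0, 0, 0],
        "encoded_3": [1, 0, 0],
        "y": [0.432, 0.775, 0.691],
    }
)

encoded_cols = ["encoded_1", "encoded_2", "encoded_3"]

# Number of rows in which each one-hot column is set to 1.
counts = df[encoded_cols].sum()

# Share of all rows that fall into each category.
shares = counts / len(df)

print(pd.DataFrame({"count": counts, "share": shares}))
```

In this toy sample encoded_1 covers most rows and encoded_2 has none, which is the kind of skew I am asking about.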
If there is any material to read about this, please let me know.
Topic imbalanced-data xgboost regression python
Category Data Science