Advantages to combining similarly-named columns for supervised ML?
Is there any benefit to combining similarly named columns either for an improvement in accuracy or for speeding up training/prediction in case of logistic regression, random forest or neural network models?
I have seen this done at times but wasn't sure if there was more than a heuristically-motivated reason for doing it.
eg. Converting this:
name | col1 | col2 | col3 | time |
---|---|---|---|---|
gina | 5 | 12 | 20 | 30 |
john | 6 | 7 | 43 | 40 |
to this:
name | (col1,col2,col3) | time |
---|---|---|
gina | (5,12,20) | 30 |
john | (6,7,43) | 40 |
Topic data-wrangling supervised-learning accuracy
Category Data Science