Should I generalize categorical features if the algorithm handles overfitting well?

I'm referring to the Kaggle feature creation exercise. The data frame contains a column (MSSubClass) with these unique values:

   'One_Story_1946_and_Newer_All_Styles', 
   'Two_Story_1946_and_Newer',
   'One_Story_PUD_1946_and_Newer',
   'One_and_Half_Story_Finished_All_Ages', 
   'Split_Foyer',
   'Two_Story_PUD_1946_and_Newer', 
   'Split_or_Multilevel',
   'One_Story_1945_and_Older', 
   'Duplex_All_Styles_and_Ages',
   'Two_Family_conversion_All_Styles_and_Ages',
   'One_and_Half_Story_Unfinished_All_Ages',
   'Two_Story_1945_and_Older', 
   'Two_and_Half_Story_All_Ages',
   'One_Story_with_Finished_Attic_All_Ages',
   'PUD_Multilevel_Split_Level_Foyer',
   'One_and_Half_Story_PUD_All_Ages'

and the exercise generalizes them into the following values:

'One', 'Two', 'Split', 'Duplex', 'PUD'

(by taking the first word, i.e. everything before the first underscore).
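
For illustration, here is a minimal pandas sketch of that split. The tiny sample frame is my own; in the exercise the column lives in the full Ames training frame:

    import pandas as pd

    # Hypothetical sample; in the exercise this column sits in the
    # full Ames housing frame.
    df = pd.DataFrame({"MSSubClass": [
        "One_Story_1946_and_Newer_All_Styles",
        "Split_Foyer",
        "Duplex_All_Styles_and_Ages",
        "PUD_Multilevel_Split_Level_Foyer",
    ]})

    # Keep only the first underscore-separated token,
    # e.g. 'One_Story_1946_and_Newer_All_Styles' -> 'One'.
    df["MSSubClass"] = df["MSSubClass"].str.split("_").str[0]
    print(df["MSSubClass"].unique())  # ['One' 'Split' 'Duplex' 'PUD']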

Is this kind of generalization needed if I only use random forests as my algorithm to make predictions?

It seems this kind of generalization loses some amount of information from the data. Also, random forests are good at handling overfitting.
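
Rather than deciding this a priori, one could measure it. Below is a hedged sketch that cross-validates a random forest once with the raw 16-level column and once with the generalized 5-level column. It assumes `df` is the full Ames training frame with a SalePrice target; names other than MSSubClass and SalePrice are illustrative:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Assumes `df` is the full Ames training frame with SalePrice as target.
    def cv_score(frame: pd.DataFrame) -> float:
        # One-hot encode categoricals so the forest can consume them.
        X = pd.get_dummies(frame.drop(columns="SalePrice"))
        y = frame["SalePrice"]
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        return cross_val_score(model, X, y, cv=5).mean()  # mean R^2

    raw = df.copy()
    coarse = df.copy()
    coarse["MSSubClass"] = coarse["MSSubClass"].str.split("_").str[0]

    print("16-level column:", cv_score(raw))
    print(" 5-level column:", cv_score(coarse))

Whichever version scores higher under cross-validation is the one worth keeping; with tree ensembles the difference is often small, since a forest can itself merge rare levels that behave alike.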

Topic: generalization feature-engineering random-forest

Category: Data Science
