Using scikit-learn FeatureHasher
I have a very large data set in which one of the columns is named 'mail_id'. The mail_id values are opaque, encoded-looking strings, as shown below:
mail_id
DQ/4I+GIOz2ZoIiK0Lg0AkwnI35XotghgUK/MYc101I=
BL3z4RtiyfIDydaRYWX2ZXL6IX10QH1yG5ak1s/8Lls=
BL3z4RtiyfIDydaRYWX2ZXL6IX10QH1yG5ak1s/8Lls=
EHNBRbi6i9KO6cMHsuDPFjZVp2cY3RH+BiOKwPwzLQs=
K0y/NW59TJkYc5y0HUwDeAXrewYT0JQlkcozz0s2V5Q=
UGATDXARg7jMEInKH7oXgty2nwxnwD2l0OW/8Nsa0MI=
qE9zgWiITYA97RUiN4X/t9IVWLViLz+lKijaYegyBiQ=
BL3z4RtiyfIDydaRYWX2ZXL6IX10QH1yG5ak1s/8Lls=
4+EEK8RbNYwuFCHznY9XSRCV4Yek60bHVgnP3jtjjzk=
After a lot of analysis on my data, I have found that I cannot drop this feature from my model, so I have to convert it into something the model can use. Can anyone please explain how to do this efficiently?
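
For reference, here is a minimal sketch of what I think the FeatureHasher approach from the title would look like. It assumes the column is loaded into a pandas DataFrame and that each mail_id is treated as a single categorical token. The DataFrame name `df` and the choice of `n_features=16` are placeholders of mine for illustration, not values derived from the real data.

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

# Sample values copied from the column shown above.
df = pd.DataFrame({
    "mail_id": [
        "DQ/4I+GIOz2ZoIiK0Lg0AkwnI35XotghgUK/MYc101I=",
        "BL3z4RtiyfIDydaRYWX2ZXL6IX10QH1yG5ak1s/8Lls=",
        "BL3z4RtiyfIDydaRYWX2ZXL6IX10QH1yG5ak1s/8Lls=",
        "EHNBRbi6i9KO6cMHsuDPFjZVp2cY3RH+BiOKwPwzLQs=",
        "K0y/NW59TJkYc5y0HUwDeAXrewYT0JQlkcozz0s2V5Q=",
        "UGATDXARg7jMEInKH7oXgty2nwxnwD2l0OW/8Nsa0MI=",
        "qE9zgWiITYA97RUiN4X/t9IVWLViLz+lKijaYegyBiQ=",
        "BL3z4RtiyfIDydaRYWX2ZXL6IX10QH1yG5ak1s/8Lls=",
        "4+EEK8RbNYwuFCHznY9XSRCV4Yek60bHVgnP3jtjjzk=",
    ]
})

# n_features=16 is an arbitrary placeholder; a real data set with many
# distinct ids would need a much larger value to limit hash collisions.
hasher = FeatureHasher(n_features=16, input_type="string")

# With input_type="string", each sample must be an iterable of strings,
# so every mail_id is wrapped in a one-element list.
hashed = hasher.transform([[value] for value in df["mail_id"]])

# The result is a SciPy sparse matrix with one row per mail_id.
dense = hashed.toarray()
print(hashed.shape)
```

Since the hasher is stateless, no `fit` call is needed, and identical mail_id values always map to the same row. Distinct ids can still collide in the same column when `n_features` is small, so the size is a trade-off between memory and collisions.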