Should hexadecimal addresses of a dataset be cleaned?
I am working on fraud detection on blockchains. More specifically, I fetched a large number of transactions that took place on the blockchain, labeled them as spam / non-spam using an appropriate API, and now I want to train a model (an SVM, for example) to detect fraud.
My question is about data preparation. The fields I have are: hash, nonce, transaction_index, from_address, to_address, ...
The from_address and to_address fields are hexadecimal strings such as 0x5e14d30d2155c0cdd65044d7e0f296373f3e92f65ebd
How should I format this data? Should I delete these fields? (I do not think so, since they are very relevant to the problem at hand.) I can't find an appropriate encoding for them either.
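To make the question concrete, here is a minimal sketch of one option I am considering: replacing each address with the number of times it occurs (frequency encoding) instead of feeding the raw hex string to the model. The rows, the shortened addresses, and the helper name frequency_encode are made up for illustration, and I am not sure this is the right approach.

```python
from collections import Counter

# Hypothetical rows; the addresses are shortened, made-up values,
# not real on-chain addresses.
transactions = [
    {"from_address": "0xaaa1", "to_address": "0xbbb1"},
    {"from_address": "0xaaa1", "to_address": "0xbbb2"},
    {"from_address": "0xaaa2", "to_address": "0xbbb1"},
    {"from_address": "0xaaa1", "to_address": "0xbbb1"},
]


def frequency_encode(rows, column):
    """Map each address in `column` to the number of rows it appears in."""
    # Lowercase first so the same address in mixed case is counted once.
    counts = Counter(row[column].lower() for row in rows)
    return [counts[row[column].lower()] for row in rows]


# One numeric feature per address column, aligned with the input rows.
from_counts = frequency_encode(transactions, "from_address")
to_counts = frequency_encode(transactions, "to_address")
```

If I went this way, I assume the counts would have to be computed on the training split only, so that information from the test transactions does not leak into the features.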
Topic dataframe classification python
Category Data Science