Data anonymization in Python
I am working on an industrial project that uses real data. The data contains sensitive information about company operations that cannot be disclosed publicly. As a result, I need to anonymize the original data before applying machine learning algorithms. The data anonymization includes:
changing the names of persons,
places,
geographical locations, etc.
What are the best practices for anonymizing datasets? Ideally, I should be able to recover the original data after performing the analysis on the anonymized dataset.
I went through the literature and looked at some answered questions. They are all based on cybersecurity techniques such as encryption and decryption algorithms, which I am not familiar with. Is there any way to slightly change the data without digging into cybersecurity algorithms?
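To make the goal concrete, here is a minimal sketch of the kind of reversible replacement I have in mind. It is only an illustration, not something I claim is best practice: the rows, the column names (`name`, `place`, `output`) and the helper functions `pseudonymize` and `restore` are all made up for this example, and it uses only the Python standard library.

```python
from itertools import count


def pseudonymize(records, columns):
    """Replace the values in `columns` with placeholder tokens.

    Returns the anonymized rows and the mapping needed to undo the change.
    (Hypothetical helper, written only for this illustration.)
    """
    forward = {col: {} for col in columns}          # original value -> token
    counters = {col: count(1) for col in columns}   # per-column token numbering
    anonymized = []
    for row in records:
        new_row = dict(row)  # copy so the original rows are left untouched
        for col in columns:
            value = row[col]
            if value not in forward[col]:
                forward[col][value] = f"{col}_{next(counters[col])}"
            new_row[col] = forward[col][value]
        anonymized.append(new_row)
    return anonymized, forward


def restore(records, forward):
    """Invert the mapping to recover the original values."""
    reverse = {
        col: {token: value for value, token in mapping.items()}
        for col, mapping in forward.items()
    }
    return [
        {col: reverse[col][val] if col in reverse else val
         for col, val in row.items()}
        for row in records
    ]


# Made-up rows standing in for the real, sensitive data
records = [
    {"name": "Alice", "place": "Plant A", "output": 12.5},
    {"name": "Bob", "place": "Plant B", "output": 9.1},
    {"name": "Alice", "place": "Plant B", "output": 11.0},
]

anonymized, mapping = pseudonymize(records, ["name", "place"])
restored = restore(anonymized, mapping)
```

The same original value always gets the same token, so grouping and counting on the anonymized columns still work. I realise the mapping would itself be sensitive and would have to be stored separately from the anonymized data.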
Topic data anonymization python data-cleaning machine-learning
Category Data Science