How do we obfuscate or de-identify data so that it is anonymous and can be shared publicly?
I am currently preparing a small dataset for public release by removing sensitive information. While working on it, I wondered: what are the best practices for dealing with private or sensitive polynomial attributes in a dataset?(*) I have heard the terms de-identification, obfuscation, and anonymization used for making data anonymous or permuting it.
I would like to learn more about this topic as it applies to data science and analysis. I am particularly interested in the packages and concepts that can be used in R.
(*) That is, besides the obvious solutions of completely removing the sensitive attribute or replacing its values with a hash code. I am somewhat familiar with how this problem can be complicated by the ability to correlate attributes.
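To make the footnote concrete, here is a minimal base-R sketch of the kind of problem I mean. The data frame `patients` and all of its column names and values are hypothetical, invented only for illustration. It drops the direct identifier and then counts how many rows share each combination of the remaining attributes; it is only a rough check, not a complete anonymization method.

```r
# Hypothetical toy data: every column name and value is invented for illustration.
patients <- data.frame(
  name      = c("Ann", "Bob", "Cid", "Dee", "Eve"),
  zip       = c("02138", "02138", "02139", "02139", "02139"),
  age_group = c("30-39", "30-39", "40-49", "40-49", "50-59"),
  diagnosis = c("flu", "asthma", "flu", "diabetes", "asthma"),
  stringsAsFactors = FALSE
)

# The "obvious" step: drop the directly identifying column.
released <- patients[, setdiff(names(patients), "name")]

# Count how many rows share each combination of the remaining
# attributes that an outsider might already know (zip and age group).
group_sizes <- aggregate(
  list(n = rep(1L, nrow(released))),
  by  = released[, c("zip", "age_group")],
  FUN = sum
)

# Combinations held by exactly one row are unique in the release and
# could be linked back to a person by correlating them with other data.
at_risk <- group_sizes[group_sizes$n == 1, ]
print(at_risk)
```

Even with `name` removed, any row listed in `at_risk` is unique on `zip` and `age_group`, so its `diagnosis` is exposed to anyone who knows those two values. That is the correlation issue I would like to handle properly.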
Topic anonymization dataset r
Category Data Science