Data Anonymization for all domains?

Question

The Great

December 3, 2021, 11:44

I am using a dataset from the sales and marketing department. It contains the customer name (company name), company address, PIN code, number of orders placed, revenue generated from that customer, etc.

My question is whether I should hide, mask, or anonymize the customer name, address, etc.

Of course, the insights that we generate will be used by business users from the sales and marketing team.

So, should we use a substitute identifier (with a mapping sheet) in place of the customer names, addresses, etc.?

For example, Company A is represented as 101, Company B as 321, and so on. These are random identifiers, and the mapping file would be maintained by the business users (from the sales and marketing department). Or is it not necessary to anonymize the data?

Can you share your suggestions on when to anonymize the data and when not to?

I know that in healthcare we have individual-level (patient-centric) data that is very sensitive, so we mask it using identifiers. But does the same apply in the sales and marketing domain as well? Should the company name, address, revenue generated, etc. be treated as confidential?

Topic data anonymization deep-learning dataset machine-learning

Category Data Science

prashant0598 · Accepted Answer · December 3, 2021, 11:44

Any type of data that contains personal information about individuals needs to be anonymized.

We should anonymize data if it is exposed to the following disclosure risks:

Identity disclosure occurs if an intruder is able to associate a record of the released dataset with the individual it describes.
Attribute disclosure occurs if an intruder is able to infer the value of a confidential attribute of an individual with enough accuracy.
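
If you do decide to mask the customer names, the mapping-sheet idea from the question could look roughly like the sketch below. This is only an illustration under assumptions: the field names (`customer_name`, `address`, `pincode`, `orders`, `revenue`), the sample rows, and the `pseudonymize` helper are all hypothetical and not taken from any real dataset. Note that this is pseudonymization rather than full anonymization, since whoever holds the mapping can reverse it, and the remaining columns (PIN code, revenue) may still allow identity disclosure if they are distinctive enough.

```python
import secrets


def pseudonymize(records, key_field="customer_name", drop_fields=("address",)):
    """Replace key_field with a random surrogate ID.

    Returns (masked_records, mapping), where mapping is the "mapping sheet"
    from real name to surrogate ID. Field names are hypothetical.
    """
    mapping = {}   # real name -> surrogate ID
    used = set()   # surrogate IDs already handed out
    masked = []
    for row in records:
        name = row[key_field]
        if name not in mapping:
            # Draw a random 6-digit ID and retry on the rare collision.
            new_id = secrets.randbelow(900_000) + 100_000
            while new_id in used:
                new_id = secrets.randbelow(900_000) + 100_000
            used.add(new_id)
            mapping[name] = new_id
        # Copy every field except the identifying ones.
        out = {
            k: v for k, v in row.items()
            if k != key_field and k not in drop_fields
        }
        out["customer_id"] = mapping[name]
        masked.append(out)
    return masked, mapping


# Made-up sample rows, for illustration only.
records = [
    {"customer_name": "Company A", "address": "12 Example Road",
     "pincode": "560001", "orders": 14, "revenue": 52000},
    {"customer_name": "Company B", "address": "7 Sample Street",
     "pincode": "110001", "orders": 3, "revenue": 8000},
    {"customer_name": "Company A", "address": "12 Example Road",
     "pincode": "560001", "orders": 5, "revenue": 17500},
]

masked, mapping = pseudonymize(records)
```

Here `masked` is what the analysts would work with, while `mapping` stays with the business users who are allowed to see the real names. The same company always gets the same ID, so per-customer aggregations still work on the masked data.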

