anonymization

Evaluation of the preprocessing to make a dataset anonymous

John Angelopoulos

May 6, 2022, 12:41

I have a very large NLP dataset that I want to anonymize. Is there any way to check whether my pre-processing is correct? More generally, is there any way to evaluate how well the pre-processing achieves anonymity? Note that the dataset is really huge, so it cannot be checked manually.

Topic: anonymization nlp python

Category: Data Science

Data privacy breach example (data anonymisation)

Qwerty

April 14, 2022, 20:14

I remember that I read a story where journalists were able to figure out the health records of some individual (I think it was some senator, but not sure) by using different data sets. That was an example showing that data anonymization is not sufficient for data privacy. However, I can't find this story on the Internet. Did anyone come across this story? Update: I found it.

Topic: privacy data anonymization machine-learning

Category: Data Science

How to write custom de-identification algorithm in Python?

Muhammad Ali

January 31, 2022, 20:07

I tried a simple algorithm to anonymize my data using a de-identification technique, but the code doesn't work for me. I want to anonymize the data by slightly changing the values. The data sample is available here.

import pandas as pd
import uuid as u
import datetime as dt

# generate a pseudo-identifier sequence using Python's uuid library
def uudi_generator(length):
    uudi_list = list()
    i = 0
    while i < length:
        uudi_list.append(u.uuid4())
        i += 1
    return uudi_list

# import original dataset
dataset …

Topic: privacy implementation anonymization dataset python

Category: Data Science
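
For the de-identification question above, a minimal sketch of the two steps it describes (replacing identifiers with pseudo-identifiers and slightly changing numeric values) might look like this. It uses only the standard library; the function names `pseudonymize_ids` and `perturb`, the 5% noise level, and the sample values are assumptions for illustration, not the asker's code.

```python
import random
import uuid


def pseudonymize_ids(values):
    """Map each distinct value to a random UUID string.

    Repeated values share one pseudonym, so joins and group-bys still work.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = str(uuid.uuid4())
    return [mapping[value] for value in values], mapping


def perturb(values, relative_noise=0.05, seed=None):
    """Multiply each number by a random factor in [1 - noise, 1 + noise]."""
    rng = random.Random(seed)
    return [v * (1 + rng.uniform(-relative_noise, relative_noise)) for v in values]


# Hypothetical sample data, not taken from the question.
names = ["alice", "bob", "alice"]
amounts = [100.0, 250.0, 100.0]

pseudonyms, lookup = pseudonymize_ids(names)
noisy_amounts = perturb(amounts, relative_noise=0.05, seed=0)
```

The `lookup` dictionary is what makes the result re-identifiable, so it would need to be stored separately from the released data or discarded. Random noise alone gives no formal privacy guarantee.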

Data Anonymization for all domains?

The Great

December 3, 2021, 11:44

I am using a dataset from the marketing and sales department. The dataset contains customer name (company name), company address, pincode, number of orders placed, revenue generated from that customer, etc. My question is whether I should hide/mask/anonymize the customer name, address, etc. Of course, the insights that we generate will be used by the business users from the sales and marketing team. So, should we use a duplicate identifier (mapping sheet) to indicate the customer names, addresses, etc.? For example: …

Topic: data anonymization deep-learning dataset machine-learning

Category: Data Science

Does data anonymization conflict with GDPR rules?

thinwybk

September 19, 2021, 19:30

There are GDPR articles that relate to a person's ownership of their data e.g., Art. 17 GDPR Right to erasure (‘right to be forgotten’) and Art. 20 GDPR Right to data portability. In case one would anonymize the data without a way to "restore" the relation between the person (name + e-mail address) (which in turn would allow handling of the person-specific data), I'd say this would conflict with these GDPR articles. Are there data anonymization techniques that allow to …

Topic: anonymization

Category: Data Science

Can longitudinal studies be completely anonymous?

Wolter

September 1, 2021, 09:00

A regular digital questionnaire can be completely anonymous, by sending out a non-personalized URL for the questionnaire and not asking for or storing identifiable information (such as the user's IP address or date of birth). By this I mean that, as the researcher, I am unable to later identify who filled out a questionnaire, even if I wanted to. I now have a longitudinal study, with 4 waves of questionnaires, one year apart each. Consecutive waves are required …

Topic: anonymization

Category: Data Science

How do you choose an appropriate $k$ to achieve $k$-anonymity for data?

kevins

May 18, 2021, 15:14

How do you choose an appropriate $k$ to achieve $k$-anonymity for a dataset? What methods exist that are agnostic to the business context of the problem?

Topic: anonymization

Category: Data Science
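
As context for the $k$-anonymity question above: a dataset is $k$-anonymous with respect to a set of quasi-identifiers when every combination of quasi-identifier values is shared by at least $k$ records. The sketch below does not choose $k$; it only measures the $k$ a dataset currently achieves, which is a common starting point. The function name `achieved_k`, the column names, and the records are hypothetical.

```python
from collections import Counter


def achieved_k(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifier columns (0 if no records)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Hypothetical records with already-generalized quasi-identifiers.
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "flu"},
]

k_fine = achieved_k(records, ["age_band", "zip3"])  # one record is alone in its group
k_coarse = achieved_k(records, ["zip3"])            # all three records share zip3
```

Comparing the achieved $k$ across different levels of generalization shows how much detail has to be given up to reach a target value.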

How to identify a field as holding personally identifiable information from the name of the field itself using an ML model in Python?

alim1990

November 26, 2020, 11:07

Is it possible to automatically detect fields holding personal information (name, phone, address, SSN, passport, gov ID...) from their names, using Python, in order to upload datasets into the cloud after encrypting or anonymizing the PII fields? I am open to building my own model by training it on a dataset that holds thousands of field names, each one classified as personal or not. But apparently I can't find any related datasets.

Topic: anonymization python machine-learning

Category: Data Science
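
For the field-name question above, a keyword heuristic is a simple baseline to try before training a model. Note that this is a substitute technique, not the ML classifier the question asks about. The hint list `PII_HINTS` and the function `looks_like_pii` are assumptions for illustration; substring matching will produce false positives (for example, `filename` contains `name`).

```python
import re

# Assumed keyword list; extend it for the schemas you actually see.
PII_HINTS = ("name", "phone", "address", "ssn", "passport", "email", "birth", "gov_id")


def looks_like_pii(field_name):
    """Return True if the normalized field name contains a PII keyword."""
    # Lowercase and collapse any run of non-alphanumeric characters to "_".
    normalized = re.sub(r"[^a-z0-9]+", "_", field_name.lower())
    return any(hint in normalized for hint in PII_HINTS)


# Hypothetical column names.
columns = ["FirstName", "order_total", "Gov ID", "passport_no"]
flagged = [c for c in columns if looks_like_pii(c)]
```

Fields flagged this way could also serve as weak labels for bootstrapping a training set when no labeled dataset is available.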

How to protect data from internal data scientists?

Ahmedn1

August 15, 2020, 17:59

In our company we want to protect data privacy internally. That is, we want to find a way to anonymize the data so that the data science team members cannot expose it but can still use it for modelling. I googled and read about pseudonymization. But does it destroy the data? I didn't find any reliable source that uses it in practice.

Topic: anonymization

Category: Data Science
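
To illustrate the pseudonymization mentioned in the question above, one common approach is a keyed hash: each identifier is replaced by an HMAC of its value, so the same input always yields the same token, while mapping tokens back requires the secret key. This is a minimal sketch under the assumption that the key is held by someone outside the data science team; the function name `pseudonymize`, the 16-character truncation, and the sample identifiers are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a stable token for `value` derived with HMAC-SHA256."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions are a concern.
    return digest.hexdigest()[:16]


# Placeholder key; in practice it would come from a secrets store.
key = b"replace-with-a-secret-kept-outside-the-team"

token_a = pseudonymize("customer-123", key)
token_b = pseudonymize("customer-123", key)
token_c = pseudonymize("customer-456", key)
```

Because equal inputs map to equal tokens, joins and aggregations on the pseudonymized column still work, so the data remains usable for modelling. The remaining columns can still identify people in combination, so this step alone does not make the data anonymous.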

Data anonymization in Python

Muhammad Ali

November 10, 2019, 08:02

I am working on an industrial project that uses real data. The data contains sensitive information about company operations which cannot be disclosed publicly. As a result, I need to anonymize the original data before applying machine learning algorithms. The data anonymization includes changing the names of persons, places, geographical locations, etc. I would like to know what the best practices are for anonymizing datasets. Ideally, I should be able to get the original data …

Topic: data anonymization python data-cleaning machine-learning

Category: Data Science

How to release datasets with fingerprinting

DataAnon

May 5, 2018, 06:49

I intend to monetise some large datasets. These datasets are anonymised and released to (paying) clients via a web API. Are there any standard algorithms for altering the data such that, if the datasets are intentionally leaked publicly, the responsible party can be identified, while at the same time the data remains practically useful? There are certain approaches which come to mind, such as every client's data being very slightly different with known changes. For example in …

Topic: data anonymization

Category: Data Science

How do we obfuscate or de-identify data to make it anonymous and share it publicly?

mlane

January 5, 2018, 11:08

Right now, I am working on preparing a small dataset for release to the public by getting rid of sensitive information. While working on it, I wondered... what are the best practices for dealing with private or sensitive polynomial attributes in a dataset?(*) I have heard that anonymity is achieved through de-identification, obfuscation, anonymization, or permutation. However, I would like to learn more about this topic in data science/analysis. I am particularly interested in the packages and concepts that …

Topic: anonymization dataset r

Category: Data Science

Anonymizing data

william007

February 6, 2017, 07:52

The page https://www.kaggle.com/c/santander-product-recommendation/data mentions: "Please note: This sample does not include any real Santander Spain customers, and thus it is not representative of Spain's customer base." In what ways can Santander anonymize its customers so that the solutions from Kaggle are still useful to them?

Topic: anonymization

Category: Data Science

How can I transform names in a confidential data set to make it anonymous, but preserve some of the characteristics of the names?

Air

December 7, 2015, 17:44

Motivation: I work with datasets that contain personally identifiable information (PII) and sometimes need to share part of a dataset with third parties, in a way that doesn't expose PII and subject my employer to liability. Our usual approach here is to withhold data entirely, or in some cases to reduce its resolution; e.g., replacing an exact street address with the corresponding county or census tract. This means that certain types of analysis and processing must be done in-house, even …

Topic: anonymization data-cleaning

Category: Data Science

Name Anonymization Software

Stumbler

January 9, 2015, 00:18

Although I have seen a few good questions asked about data anonymization, I was wondering if there were answers to this more specific variant. I am seeking a tool (or to design one) that will anonymize human names from a specific country: particularly first names in unstructured text. Many of the tools that I have seen have considered the wider dimensions of data anonymization; with an equal focus on dates of birth, addresses, etc. An imperative aspect is that it …

Topic: anonymization text-mining

Category: Data Science
