How to write a custom de-identification algorithm in Python?

I have tried a simple algorithm to anonymize my data using a de-identification technique, but the code doesn't work for me. I want to anonymize the data by slightly changing the values. The data sample is available here

import pandas as pd 
import uuid as u 
import datetime as dt 

# Generate a sequence of pseudo-identifiers using Python's uuid library.

def uudi_generator(length):
    uudi_list = list()
    i = 0
    while i < length:
        uudi_list.append(u.uuid4())
        i += 1  # increment inside the loop, otherwise it never terminates
    return uudi_list

# import the original dataset
dataset = pd.read_csv('bankcredit-data.csv') 

# pseudo identifier
sLength = len(dataset['housing']) 
dataset.insert(0, 'uuid', pd.Series(uudi_generator(sLength), index=dataset.index)) 

# Attach a transaction timestamp to each record (note the call: now(), not now)
dataset.insert(0, 'transaction_date', pd.Series([dt.datetime.now()]*sLength, index=dataset.index))

# Write the transaction record back to the original data file
dataset.to_csv('bankcredit-data.csv', index=False)

# delete identifiable columns from the dataset
del dataset['firstname']
del dataset['lastname']

# export the de-identified dataset as CSV to be shared with the user
dataset.to_csv('deidentified-data.csv', index=False)
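
The script above adds identifiers and drops the name columns, but it does not yet do the "slightly changing the values" part. A minimal sketch of one way to do that is shown here, assuming the column to perturb is numeric. The `perturb` function, the `relative_noise` parameter, and the sample `balances` values are hypothetical and not taken from the dataset.

```python
import random


def perturb(values, relative_noise=0.05, seed=None):
    """Return a new list where each value is scaled by a random factor
    drawn uniformly from [1 - relative_noise, 1 + relative_noise]."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    return [v * (1 + rng.uniform(-relative_noise, relative_noise)) for v in values]


# Hypothetical numeric column; with pandas this could be dataset['some_column'].tolist()
balances = [1200.0, 350.5, 98000.0]
noisy = perturb(balances, relative_noise=0.05, seed=42)
```

Each value moves by at most 5% here. How much noise is acceptable depends on how the shared data will be used, so the bound is only a placeholder.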

Topic: privacy, implementation, anonymization, dataset, python

Category: Data Science


Unless you want to build your own, try the Faker library to anonymize PII (personally identifiable information).

pip install Faker
