How can I identify a field as holding personally identifiable information from the field name alone, using an ML model in Python?

Is it possible to automatically detect fields holding personal information (name, phone, address, SSN, passport, government ID...) from their names, using Python, so that the PII fields can be encrypted or anonymized before the datasets are uploaded to the cloud?

I am open to building my own model by training it on a dataset that holds thousands of field names, each labelled as personal or not. However, I can't find any such dataset.

Topic anonymization python machine-learning

Category Data Science


In cases like this, where no dataset is available, it is better to build one yourself. Create a Google Form, send it to a few friends and family, and use the responses as your dataset. Apart from this, you can check this article on anonymizing information on Kaggle.
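Once you have even a small labelled list of field names, a first model can be quite simple. The following is only a minimal sketch, not a tested solution: it splits each field name into tokens and trains a tiny naive Bayes classifier on them using only the standard library. The class `FieldNameClassifier`, the helper `tokenize`, and the training examples are all made up for illustration; a real dataset would need far more names, and a library such as scikit-learn would likely be a better fit at that scale.

```python
import math
import re
from collections import Counter


def tokenize(field_name):
    """Split a name like 'customerPhoneNumber' or 'ship_addr' into lowercase tokens."""
    # Put a space at camelCase boundaries, then split on anything non-alphanumeric.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", field_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


class FieldNameClassifier:
    """Hypothetical token-level naive Bayes: label True means 'looks like PII'."""

    def fit(self, names, labels):
        self.token_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for name, label in zip(names, labels):
            self.token_counts[label].update(tokenize(name))
        self.vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        return self

    def _log_score(self, tokens, label):
        total_tokens = sum(self.token_counts[label].values())
        total_names = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_names)  # class prior
        for tok in tokens:
            # Laplace smoothing so unseen tokens do not zero out the score.
            count = self.token_counts[label][tok] + 1
            score += math.log(count / (total_tokens + len(self.vocab) + 1))
        return score

    def predict(self, field_name):
        tokens = tokenize(field_name)
        # Ties (e.g. a name made only of unseen tokens) fall back to "not PII".
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Made-up training examples; replace with your own labelled field names.
TRAIN = [
    ("first_name", True), ("lastName", True), ("phone_number", True),
    ("home_address", True), ("ssn", True), ("passport_id", True),
    ("email_address", True), ("date_of_birth", True),
    ("order_total", False), ("product_id", False), ("created_at", False),
    ("unit_price", False), ("item_count", False), ("currency_code", False),
    ("warehouse_code", False), ("updated_at", False),
]

names = [name for name, _ in TRAIN]
labels = [label for _, label in TRAIN]
model = FieldNameClassifier().fit(names, labels)

for candidate in ["customerPhoneNumber", "shipping_address", "order_count"]:
    print(candidate, "->", "PII" if model.predict(candidate) else "not PII")
```

Note that in this sketch a name made entirely of unseen tokens is treated as not personal. For your use case (encrypting before upload) you may prefer the opposite default, or a manual review step for low-confidence names.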
