Name Anonymization Software
Although I have seen a few good questions about data anonymization, I am wondering whether there are answers to this more specific variant.
I am looking for a tool (or guidance on designing one) that anonymizes human names from a specific country, particularly first names, in unstructured text. Most of the tools I have seen address data anonymization more broadly, with equal focus on dates of birth, addresses, etc.
A critical requirement is near-absolute recall. The major pitfalls, as far as I can see, are diminutive variants ("Tommy" instead of "Thomas", "Ben" instead of "Benjamin", etc.) and typos. These two factors rule out a simple regex built from a database of names (drawn from censuses, etc.).
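To make the two pitfalls concrete, here is a minimal sketch of one possible baseline: an exact lookup against a name list plus a diminutive table, with a fuzzy fallback for typos using the standard library's `difflib`. The name list, the diminutive table, the `[NAME]` placeholder, the function names, and the 0.8 cutoff are all hypothetical stand-ins chosen for illustration; a real list would come from census data for the country in question.

```python
import re
from difflib import get_close_matches

# Hypothetical stand-in for a census-derived list of first names.
CANONICAL_NAMES = {"thomas", "benjamin"}
# Hypothetical diminutive table; it would have to be curated per country.
DIMINUTIVES = {"tommy": "thomas", "tom": "thomas", "ben": "benjamin"}

# Everything a token is compared against: full names and diminutives.
VOCAB = sorted(CANONICAL_NAMES | set(DIMINUTIVES))


def is_name(token, cutoff=0.8):
    """Return True if the token is a known name, a diminutive, or a near miss."""
    word = token.lower()
    if word in CANONICAL_NAMES or word in DIMINUTIVES:
        return True
    # Fuzzy fallback for typos: similarity ratio must reach the cutoff.
    return bool(get_close_matches(word, VOCAB, n=1, cutoff=cutoff))


def anonymize(text, placeholder="[NAME]", cutoff=0.8):
    """Replace every alphabetic token that looks like a name with the placeholder."""
    def mask(match):
        token = match.group(0)
        return placeholder if is_name(token, cutoff) else token

    return re.sub(r"[A-Za-z]+", mask, text)


print(anonymize("Tommy met Benjamn and Thomsa yesterday."))
```

Lowering the cutoff raises recall on typos but also masks ordinary words that happen to resemble a name, so the threshold is a direct recall versus precision trade-off. The `[A-Za-z]+` tokenizer is also an assumption: names with accents, hyphens, or apostrophes would need a wider pattern.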
Topic anonymization text-mining
Category Data Science