How to programmatically introduce grammatical errors in sentences
I have a set of English sentences. I'm exploring ways to programmatically create a dataset of sentences with grammatical errors. So far I have tried applying the following options at random:
- identify verbs, prepositions, etc. by POS tagging and change the tense or remove them
- change the order of 2 or more words
- remove commas, colons, semi-colons etc.
These are not foolproof. Are there any proven ways to approach this problem?
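For reference, here is a minimal sketch of the three perturbations listed above, using only the standard library. The hard-coded `PREPOSITIONS` set is an assumption that stands in for real POS tagging, tense changes are left out because they need a tagger, and all function names are made up for illustration.

```python
import random
import re

# Assumption: a tiny hand-written word list instead of a real POS tagger.
PREPOSITIONS = {"in", "on", "at", "to", "of", "for", "with", "by", "from"}


def drop_preposition(tokens, rng):
    """Remove one randomly chosen preposition, if the sentence has any."""
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in PREPOSITIONS]
    if not candidates:
        return tokens[:]  # nothing to remove; sentence is unchanged
    i = rng.choice(candidates)
    return tokens[:i] + tokens[i + 1:]


def swap_adjacent(tokens, rng):
    """Swap one randomly chosen pair of neighbouring words."""
    if len(tokens) < 2:
        return tokens[:]
    i = rng.randrange(len(tokens) - 1)
    out = tokens[:]
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def strip_punctuation(sentence):
    """Delete commas, colons and semicolons."""
    return re.sub(r"[,;:]", "", sentence)


def corrupt(sentence, seed=None):
    """Apply one randomly chosen perturbation; return (new_sentence, operation)."""
    rng = random.Random(seed)
    op = rng.choice(["drop_preposition", "swap_adjacent", "strip_punctuation"])
    if op == "strip_punctuation":
        return strip_punctuation(sentence), op
    tokens = sentence.split()
    if op == "drop_preposition":
        tokens = drop_preposition(tokens, rng)
    else:
        tokens = swap_adjacent(tokens, rng)
    return " ".join(tokens), op


if __name__ == "__main__":
    for seed in range(3):
        print(corrupt("She lives in Paris, and she works at a bank.", seed=seed))
```

`corrupt` returns the operation name alongside the result so that outputs identical to the input (for example, a sentence with no preposition to drop) can be filtered out. It does not check whether the result is actually ungrammatical, which is exactly the gap I'm asking about.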
Topic grammar-inference language-model nlp python
Category Data Science