Tokenize text containing both American and British English spellings
I need to tokenize a corpus of abstracts from an international conference. The abstracts are mostly in American English, but some are in British English.
As a result, I get two distinct tokens for "organization" and "organisation", or "color" and "colour". Examples: https://en.oxforddictionaries.com/spelling/british-and-spelling
Do you know of a Python library that converts British English to American English (or vice versa)?
Any pointers would be appreciated (I am French and my English is not so good).
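For what it's worth, here is a minimal sketch of the kind of normalization I have in mind, assuming a hand-built British-to-American mapping applied to tokens before counting (the `BRITISH_TO_AMERICAN` dictionary below is a tiny illustrative example, not a complete word list):

```python
# Minimal sketch: map British spellings to their American variants
# so that "organisation" and "organization" collapse into one token.
# The mapping here is a small hand-built example, not exhaustive.
BRITISH_TO_AMERICAN = {
    "organisation": "organization",
    "colour": "color",
    "analyse": "analyze",
    "centre": "center",
}

def normalize(tokens):
    """Lowercase each token and replace it with its American
    spelling when a mapping entry exists."""
    return [BRITISH_TO_AMERICAN.get(t.lower(), t.lower()) for t in tokens]

tokens = "The organisation chose a colour scheme".split()
print(normalize(tokens))
# ['the', 'organization', 'chose', 'a', 'color', 'scheme']
```

A full solution would of course need a much larger mapping, which is why I am hoping an existing library already provides one.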
Thanks.
Topic text-filter nltk text-mining
Category Data Science