Machine learning algorithms for generating homophones of an input word

https://www.google.com/search?q=homophones+example

Are there machine learning algorithms that generate the homophones of a given input word?

Homophone examples:

accessary, accessory.

ad, add.

air, heir.

all, awl.

allowed, aloud.

alms, arms.

Input: ad

Output: ad, add

Are there machine learning algorithms that generate homophones for words in Indian regional languages (Hindi, Gujarati, Bengali, etc.) and in other languages such as French, German, Italian, Spanish, and Dutch?



I have very limited knowledge of homophone generators, but I think that to build a homophone detector you should focus on the phonetics of a word rather than its spelling.

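  1. Build a dataset that maps each word to its phonetic transcription, then train a model on it.
  2. Compare words by the Levenshtein (fuzzy/edit) distance between their phonetic transcriptions; a distance of zero means the two words sound identical.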
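The second idea can be sketched roughly as follows. This is only an illustration, not a tested homophone detector: the function name `phoneme_edit_distance` is made up for this example, and the transcriptions are typed in by hand in a CMU-dictionary-like style with stress digits left out. The point is that the distance is computed over phonemes, not over letters.

```python
from typing import Sequence


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance where each unit is a phoneme, not a character."""
    # prev[j] holds the distance between the first i-1 phonemes of a
    # and the first j phonemes of b.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(
                prev[j] + 1,         # delete a phoneme from a
                curr[j - 1] + 1,     # insert a phoneme into a
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Hand-typed transcriptions (assumed for illustration, stress omitted).
ad = "AE D".split()
add = "AE D".split()
at = "AE T".split()

print(phoneme_edit_distance(ad, add))  # 0: same sounds, different spelling
print(phoneme_edit_distance(ad, at))   # 1: one phoneme differs
```

A distance of 0 flags a homophone pair; small non-zero distances could be used to rank near-homophones, with a threshold you would have to tune yourself.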

For example, "two", "too", and "to" all have the same phonetics: T UW. You can look up the phonetics of a word in the CMU Pronouncing Dictionary at http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=to, which has already mapped words to their phonetics. I think finding homophones in non-English languages would be an uphill task.
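Once such a word-to-phonetics mapping exists, exact homophones can be found without any trained model by grouping words that share a transcription. Here is a minimal sketch of that lookup. The small `PRONUNCIATIONS` table is a hand-typed stand-in for a real pronunciation dictionary (stress digits omitted), and the helper names `build_index` and `homophones` are invented for this example.

```python
from collections import defaultdict

# Hypothetical hand-typed sample; a real system would load a full
# pronunciation dictionary instead of this toy table.
PRONUNCIATIONS = {
    "to": "T UW",
    "too": "T UW",
    "two": "T UW",
    "ad": "AE D",
    "add": "AE D",
    "air": "EH R",
    "heir": "EH R",
    "all": "AO L",
    "awl": "AO L",
    "at": "AE T",
}


def build_index(pron):
    """Map each transcription to the list of words pronounced that way."""
    index = defaultdict(list)
    for word, phones in pron.items():
        index[phones].append(word)
    return index


def homophones(word, pron, index):
    """Return all known words that sound like `word`, including itself."""
    phones = pron.get(word.lower())
    if phones is None:
        return []  # word is not in the dictionary
    return sorted(index[phones])


INDEX = build_index(PRONUNCIATIONS)
print(homophones("ad", PRONUNCIATIONS, INDEX))   # ['ad', 'add']
print(homophones("two", PRONUNCIATIONS, INDEX))  # ['to', 'too', 'two']
```

This reproduces the "Input: ad, Output: ad, add" behaviour from the question for words covered by the table. Words missing from the dictionary return an empty list, which is where a learned grapheme-to-phoneme model would have to take over.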
