Inference from text data without labels or a target

My use case involves free-text comments that an approver enters when approving a loan.

I need to use NLP to infer the likely reasons for approval. How should I go about it?

The text is in a non-English language. Can clustering the text help? Is it possible to cluster non-English text using Python libraries?



Is it possible to cluster non-English text using Python libraries?

Sure! Classic approaches based on bag-of-words are largely language-independent, as long as the text can be split into tokens. Modern approaches based on deep neural networks mostly rely on pre-trained models, so you need to either find a model for your language or train one from scratch (which requires a lot of text in that language). If you are on AWS infrastructure, for example, take a look at the Object2Vec algorithm.
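To illustrate why bag-of-words does not care about the language, here is a minimal sketch using only the Python standard library. The Spanish notes are made up for illustration, and splitting on whitespace is an assumption that will not hold for languages written without spaces between words (those need a dedicated tokenizer or character n-grams).

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count whitespace-separated tokens; no language-specific resources used."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical approval notes (Spanish), for illustration only.
notes = [
    "ingresos estables y buen historial de pagos",
    "buen historial de pagos e ingresos estables",
    "garantía hipotecaria suficiente para el préstamo",
]
vectors = [bag_of_words(note) for note in notes]

sim_01 = cosine(vectors[0], vectors[1])  # same reason, different word order
sim_02 = cosine(vectors[0], vectors[2])  # different reason, no shared words
print(f"{sim_01:.2f} {sim_02:.2f}")
```

Any clustering algorithm that works on vectors or pairwise similarities can be applied on top of such a representation.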

Can clustering the text help?

It can help. For instance, for an initial labeling you can cluster the data into groups of similar texts and label each group according to its overall concept. A more sophisticated solution (easily implemented in Python) is topic modeling, e.g. the LDA algorithm.
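The initial-labeling idea can be sketched roughly as follows. Note that this is not LDA or k-means: to stay dependency-free it uses a simple greedy threshold clustering, and the notes, the `greedy_cluster` helper and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def greedy_cluster(texts, threshold=0.3):
    """Put each text into the first cluster whose accumulated word counts
    are similar enough; otherwise start a new cluster."""
    clusters = []  # each cluster: {"members": [text indices], "counts": Counter}
    for index, text in enumerate(texts):
        vector = bag_of_words(text)
        for cluster in clusters:
            if cosine(vector, cluster["counts"]) >= threshold:
                cluster["members"].append(index)
                cluster["counts"] += vector
                break
        else:  # no existing cluster was similar enough
            clusters.append({"members": [index], "counts": vector})
    return clusters

# Hypothetical approval notes (Spanish), for illustration only.
notes = [
    "ingresos estables y buen historial de pagos",
    "garantía hipotecaria suficiente para el préstamo",
    "buen historial de pagos con ingresos altos",
    "garantía hipotecaria y aval suficiente",
]
clusters = greedy_cluster(notes)
for cluster in clusters:
    # The most frequent words give a human a starting point for naming the group.
    top_words = [word for word, _ in cluster["counts"].most_common(3)]
    print(cluster["members"], top_words)
```

A person then reads the top words (and a few member texts) of each group and assigns a label such as "payment history" or "collateral". Filtering out function words first would make the top words more informative.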

An even more sophisticated option is, again, a pre-trained model such as S-BERT, which turns each text into an embedding vector that you can then cluster.

Along the same lines, I also recommend analyzing keywords with extraction algorithms like RAKE or YAKE.
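To give a feel for how RAKE-style keyword extraction works, here is a simplified sketch, not the reference implementation: stopwords and punctuation split the text into candidate phrases, each word is scored as degree divided by frequency, and a phrase scores the sum of its words. The tiny stopword list and the example sentence are assumptions; a real run needs a proper stopword list for your language.

```python
import re
from collections import defaultdict

def rake_keywords(text, stopwords):
    """Simplified RAKE: return (phrase, score) pairs, highest score first."""
    phrases, current = [], []
    # Tokens are either words or single punctuation marks.
    for token in re.findall(r"\w+|[^\w\s]", text.lower()):
        if token.isalnum() and token not in stopwords:
            current.append(token)
        elif current:  # a stopword or punctuation mark closes the phrase
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)

    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself

    scores = {
        " ".join(phrase): sum(degree[word] / freq[word] for word in phrase)
        for phrase in phrases
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical stopword list and approval note (Spanish), for illustration only.
stopwords = {"el", "la", "y", "un", "de"}
note = ("El cliente tiene ingresos estables y un buen historial de pagos. "
        "La garantía hipotecaria cubre el préstamo.")
keywords = rake_keywords(note, stopwords)
for phrase, score in keywords[:3]:
    print(f"{score:.1f}  {phrase}")
```

Because this scoring favors long phrases, in practice it helps to use a fuller stopword list and to aggregate the top phrases over all notes rather than reading them note by note.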

Hope it helps!
