Which is the best algorithm for entity extraction from unstructured documents?

I have unstructured documents from which I need to extract information such as buyer name, seller name, expiry date, buying date, etc. I had planned to use spaCy with custom entity recognition (I followed this blog: https://medium.com/@manivannan_data/how-to-train-ner-with-custom-training-data-using-spacy-188e0e508c6). But sometimes the buyer name is predicted as the seller name and vice versa, and sometimes multiple values are wrongly predicted for a single entity when I pass the whole document content. FYI, these documents are approximately 2-20 pages long, so they contain a lot of text.

Can someone suggest other packages that would give higher accuracy? If not, how should I train the model to get higher accuracy? Thanks in advance.

Topic scipy python machine-learning

Category Data Science


Try cleaning your documents and using the flair library. It is a user-friendly library from Zalando Research that lets you do all sorts of NLP tasks very quickly, especially NER.
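As a rough sketch of what "clean, then tag" could look like: the helpers below normalize whitespace and split a long document into paragraphs, so the tagger sees short pieces instead of 2-20 pages at once. The function names (`clean_text`, `split_into_paragraphs`, `tag_paragraphs`) and the cleaning rules are my own assumptions, not part of flair. Note that flair's pretrained `"ner"` model only knows generic labels such as person and organization, so telling a buyer from a seller would still need custom training or extra rules on top.

```python
import re
from typing import List, Tuple


def clean_text(raw: str) -> str:
    """Normalize whitespace and rejoin words split by a hyphen at a line break."""
    text = raw.replace("\r\n", "\n")
    # "agree-\nment" -> "agreement" (this also merges genuinely hyphenated words)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # drop spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


def split_into_paragraphs(text: str) -> List[str]:
    """Split on blank lines and join each paragraph's lines into one string."""
    paragraphs = []
    for block in text.split("\n\n"):
        joined = " ".join(line.strip() for line in block.split("\n") if line.strip())
        if joined:
            paragraphs.append(joined)
    return paragraphs


def tag_paragraphs(paragraphs: List[str], model_name: str = "ner") -> List[Tuple[int, str, str]]:
    """Run a flair tagger paragraph by paragraph (requires `pip install flair`).

    Returns (paragraph index, entity text, entity label) tuples.
    """
    # Imported here so the cleaning helpers work without flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)  # downloads the model on first use
    results = []
    for index, paragraph in enumerate(paragraphs):
        sentence = Sentence(paragraph)
        tagger.predict(sentence)
        for span in sentence.get_spans("ner"):
            results.append((index, span.text, span.tag))
    return results
```

You would call `tag_paragraphs(split_into_paragraphs(clean_text(raw_document)))`. Keeping the paragraph index with each entity lets you look at the surrounding text (for example a nearby "Buyer:" or "Seller:" label) when deciding which role a name plays.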
