Extracting information with corresponding fields

I have a large pool of scanned county documents. I need to extract information such as the document title, borrower name and address, lender name and address, etc.

The text looks like this, e.g.: the deed of trust, between abc llc, a limited company, whose address is XXXXXX, herein called "borrower", and xyz, whose address is XXXXX, herein called "lender".

I used named entity recognition (NER) to extract the names, and it works well. But how would I know which name is the borrower and which one is the lender? Can anyone help me?

Topic stanford-nlp nlp python machine-learning

Category Data Science


You’re definitely on the right track with NER. As for determining the ‘class’ of what you’ve extracted, I think you have two major options:

  1. Train a new entity type for each.
  2. Use a set of rules that examine the term in context to determine the class.

I think you might have more luck with option 2 if the language surrounding the terms is fairly static. Also, I haven’t watched it yet, but I thought this video might help as well, since it seems to be about exactly your problem domain: https://www.youtube.com/watch?v=KrXJmaSHBJU
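To make option 2 a bit more concrete, here is a minimal sketch of a context rule in Python. It assumes the NER step has already produced a list of party names, and that each party is followed by a phrase such as herein called "borrower", as in the sample sentence from the question. The function name `assign_roles` and the pattern `ROLE_PATTERN` are made up for illustration and are not part of any library; real documents will likely need more phrase variants and multi-word role labels.

```python
import re

# Hypothetical pattern: matches herein/hereafter/hereinafter called "role".
# Only single-word role labels are captured in this sketch.
ROLE_PATTERN = re.compile(
    r'here(?:in|after|inafter)\s+called\s+["“]?(\w+)["”]?',
    re.IGNORECASE,
)

def assign_roles(text, names):
    """Map each role label to the closest NER name that precedes it."""
    lowered = text.lower()
    roles = {}
    for match in ROLE_PATTERN.finditer(text):
        role = match.group(1).lower()
        best = None  # (position, name) of the nearest preceding name
        for name in names:
            idx = lowered.rfind(name.lower(), 0, match.start())
            if idx != -1 and (best is None or idx > best[0]):
                best = (idx, name)
        if best is not None:
            roles[role] = best[1]
    return roles

sample = (
    'the deed of trust, between abc llc, a limited company, whose address '
    'is XXXXXX, herein called "borrower", and xyz, whose address is XXXXX,'
    'herein called "lender".'
)
# Names as they might come out of the NER step (assumed input).
print(assign_roles(sample, ["abc llc", "xyz"]))
```

The rule relies only on word order: each role label is attached to the nearest name that appears before it, so it should hold up as long as the documents keep the "name ... herein called role" layout.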
