Context capturing in a Structured PDF?

I'm trying to extract data from resumes (PDF). Resumes tend to follow a structure, so when you see some numbers in a CV you can usually tell from the context whether they are a telephone number, a birthday, or a date period. If I can classify one entity, that should also make it easier to classify the entities near it.

I'm still a newbie and would appreciate any thoughts on how to approach this problem. What kind of machine learning models should I focus on?

Topic ocr deep-learning feature-extraction nlp machine-learning

Category Data Science


I'm wondering if this is really a machine learning problem. If you can get the text out of the PDF, then you should be able to do a lot with a rules-based text analysis algorithm.
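As a rough illustration of what such rules could look like, here is a minimal sketch in Python. It assumes the text has already been extracted from the PDF one line at a time. The patterns, the hint words and the `classify_line` helper are all made up for this example and would need widening for real resumes (other date formats, phone layouts, languages).

```python
import re

# Hypothetical patterns for illustration only, not a complete rule set.
DATE_RANGE = re.compile(
    r"\b((?:19|20)\d{2})\s*(?:-|to)\s*((?:19|20)\d{2}|present)\b",
    re.IGNORECASE,
)
DATE = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-](?:19|20)\d{2}\b")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")

# Words on the same line that suggest a date is a birthday (assumed list).
BIRTHDAY_HINTS = ("born", "birth", "dob")


def classify_line(line):
    """Return (label, text) pairs for the numeric entities found in one line."""
    lowered = line.lower()
    found = []
    remaining = line
    # Most specific pattern first, so "2015 - 2018" is not taken for a phone number.
    for label, pattern in (("date_period", DATE_RANGE), ("date", DATE), ("phone", PHONE)):
        for match in pattern.finditer(remaining):
            text = match.group(0).strip()
            # Use the surrounding words as context to refine the label.
            if label == "date" and any(hint in lowered for hint in BIRTHDAY_HINTS):
                found.append(("birthday", text))
            else:
                found.append((label, text))
        # Blank out what was matched so a later pattern cannot relabel it.
        remaining = pattern.sub(lambda m: " " * len(m.group(0)), remaining)
    return found


for sample in (
    "Phone: +94 77 123 4567",
    "Date of birth: 12/03/1994",
    "Software Engineer, Acme, 2015 - 2018",
):
    print(sample, "->", classify_line(sample))
```

The hint words are a simple way of using context as described in the question: the same date pattern is labelled differently depending on the words around it. If rules like these turn out to be too brittle, their output could still serve as features or training labels for a learned model later.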
