How to extract details (education details, experience details, etc.) from a resume?
I am trying to build a resume parser which can extract details such as Name, Address, Education details (degree name, college name, university name, course duration), Experience details (designation, company name, company location, work duration) from any kind of resume.
I tried to train a custom NER model using spaCy. For that, I created annotations from resumes with the following entities:
Degree - Degree name, College - College name, University - University name, Degree_date - Degree date.
I created similar entities for the experience details.
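To make the annotation format concrete, here is a minimal sketch of one training example, assuming spaCy's character-offset tuple format `(text, {"entities": [(start, end, label)]})`. The sample sentence and the helper `make_annotation` are hypothetical and only for illustration; the labels are the ones listed above.

```python
def make_annotation(text, spans):
    """Build one (text, {"entities": [...]}) example from (phrase, label) pairs.

    Hypothetical helper: offsets are looked up with str.find, so each phrase
    must occur in the text (only the first occurrence is used).
    """
    entities = []
    for phrase, label in spans:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        entities.append((start, start + len(phrase), label))
    # Sort by start offset so entities follow the order of the text
    return (text, {"entities": sorted(entities)})


# Made-up sample line, not taken from a real resume
sample_text = (
    "B.Tech in Computer Science, ABC College of Engineering, "
    "XYZ University, 2015 - 2019"
)

example = make_annotation(
    sample_text,
    [
        ("B.Tech in Computer Science", "Degree"),
        ("ABC College of Engineering", "College"),
        ("XYZ University", "University"),
        ("2015 - 2019", "Degree_date"),
    ],
)
```

Each tuple in `example[1]["entities"]` is a character span into the text, so `sample_text[start:end]` gives back the annotated phrase.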
I extracted the text from each resume and preprocessed it as follows:
- Removed new lines, extra spaces, and HTML tags.
- Removed special symbols such as bullet symbols.
- Encoded the text to ASCII so that any remaining non-ASCII symbols are dropped.
The resulting text is used to annotate the entities.
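The preprocessing steps above are roughly equivalent to the following sketch, which uses only the standard library. The function name `preprocess`, the regex-based tag stripping, and the exact set of bullet characters are assumptions made for illustration, not the exact code I ran.

```python
import html
import re
import unicodedata


def preprocess(raw: str) -> str:
    """Flatten raw resume text into a single ASCII line."""
    # Strip HTML tags with a naive regex (assumes simple, well-formed markup)
    text = re.sub(r"<[^>]+>", " ", raw)
    # Decode entities such as &amp; into plain characters
    text = html.unescape(text)
    # Remove common bullet symbols (assumed set; extend as needed)
    text = re.sub(r"[\u2022\u25cf\u25aa\u25a0\u25e6*]", " ", text)
    # Decompose accented characters, then drop anything that is not ASCII
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse new lines, tabs, and repeated spaces into single spaces
    return re.sub(r"\s+", " ", text).strip()


cleaned = preprocess("<p>B.Tech\n\u2022 Computer  Science</p>")
```

After this step the whole resume is a single line, so section boundaries are no longer marked by line breaks.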
Then I trained the model but it is not working as expected. It cannot extract all the details and sometimes the entities are wrongly detected.
A rule-based extractor is not an option.
I want to know:
- Why is my custom NER model not extracting the entities properly, and why does it not return the text in the same order as in the resume?
- Is there any other way to do this?
- Is it possible to use BERT for this? If so, how should I structure the annotations, or in what format should I create the dataset for training BERT?
- If there is any other approach, please describe that too.
Any help or suggestion will be greatly appreciated.