How to extract contents by topic from a document?

Question


SRJ577

February 20, 2022, 07:04

I am trying to extract information from resumes. I used pdfminer for the text extraction, but I need to extract the contents of a resume according to its section titles.

For example, I list my educational details under the title EDUCATIONAL BACKGROUND, so I need to extract the content topic by topic.

Is it possible to extract content like that?

What would the process be?

Can the problem be approached as a segmentation task?

Topic semantic-segmentation information-extraction deep-learning nlp machine-learning

Category Data Science

Keneni · Accepted Answer · September 9, 2020, 14:52

Here is a list of tools you can look into:

This was a neat read detailing the steps. The author was doing something similar to what you are trying.

https://towardsdatascience.com/how-to-build-a-resume-parsing-tool-ae19c062e377
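To make the heading-based idea from the question concrete, here is a minimal sketch in plain Python. It is not taken from the linked article or from any library. It assumes the resume text has already been extracted (for example with pdfminer) and that each section title sits on its own line. The heading list and the function name split_by_heading are hypothetical and would need adjusting to real resumes.

```python
import re

# Hypothetical heading list; adjust to the titles your resumes actually use.
KNOWN_HEADINGS = [
    "EDUCATIONAL BACKGROUND",
    "WORK EXPERIENCE",
    "SKILLS",
    "PROJECTS",
]


def split_by_heading(text, headings=KNOWN_HEADINGS):
    """Group the lines of `text` under the most recent known heading."""
    wanted = {h.upper() for h in headings}
    sections = {}
    current = None  # lines before the first known heading are ignored
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Normalise for matching: drop a trailing colon,
        # collapse whitespace, and upper-case the line.
        key = re.sub(r"\s+", " ", line.rstrip(":")).upper()
        if key in wanted:
            current = key
            sections.setdefault(current, [])
        elif current is not None and line:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


sample = """John Doe
EDUCATIONAL BACKGROUND
B.Sc. Computer Science, 2018
M.Sc. Data Science, 2020
SKILLS:
Python, SQL
"""

sections = split_by_heading(sample)
print(sections["EDUCATIONAL BACKGROUND"])
```

Exact matching against a fixed list is the simplest option and breaks when a resume uses an unexpected title. Fuzzy matching or a trained classifier over lines would be the next step if the headings vary a lot.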

Kalyan Prasad · Accepted Answer · September 8, 2020, 07:17


pyresparser is useful for extracting information from resumes. I believe this should work in your case.

More details are available here: https://pypi.org/project/pyresparser/

Let me know if it works!
