Classification of scanned documents in PDF files using deep learning or NLP
I know how to classify images using a CNN, but I have a problem where a single PDF file contains multiple types of scanned documents on different pages. Some of these scanned documents span multiple pages of the PDF.
I need to classify the pages and return which document types are present, along with the page numbers where each one appears in the PDF. If a scanned document spans multiple pages, I should return the range of page numbers, like 1 - 10.
Input: PDF files containing the scanned target documents
Output: the name of each classified document and its page numbers
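To make the expected output concrete, here is a minimal sketch of the final step only. It assumes some per-page classifier (not shown, and still the open part of this question) has already produced one label per page, in page order. The function names `pages_to_ranges` and `format_range` and the example labels are hypothetical, chosen purely for illustration.

```python
from itertools import groupby


def pages_to_ranges(page_labels):
    """Merge consecutive pages with the same label into runs.

    page_labels: one predicted label per page, in page order
    (assumed to come from a per-page classifier that is not shown here).
    Returns a list of (label, first_page, last_page), 1-indexed.
    """
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        length = sum(1 for _ in run)  # number of consecutive pages in this run
        ranges.append((label, page, page + length - 1))
        page += length
    return ranges


def format_range(label, first, last):
    """Render one run as 'label: 4' or 'label: 1 - 10'."""
    pages = str(first) if first == last else f"{first} - {last}"
    return f"{label}: {pages}"


if __name__ == "__main__":
    # Hypothetical predictions for a 6-page PDF.
    labels = ["invoice"] * 3 + ["passport"] + ["contract"] * 2
    for item in pages_to_ranges(labels):
        print(format_range(*item))
```

Note that this only merges adjacent pages that share a label, so two different documents of the same type placed back to back would be reported as one range.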
Can anyone guide me on how I can build a model that addresses this problem?
Thank you
Topic similar-documents image-classification deep-learning nlp python
Category Data Science