Bold text detection

I am working on a project where I need to detect bold text in images that contain multiple font sizes (so mathematical morphology is not an option). The detection will run alongside an OCR system (Tesseract) to identify which information, marked in bold, is important in a document.

I already tested Tesseract's wordFontAttribute() function, but it is inconsistent: it gives poor bold-detection results, and it degrades the performance of my OCR system because it requires an old version of Tesseract (v3).

I found a couple of research papers on font style detection that also cover bold detection: "Automatic Detection of Italic, Bold and All-Capital Words in Document Images" and "Script Independent Detection of Bold Words in Multi Font-size Documents" (both on Google Scholar).

I was wondering whether a code implementation of either paper is available online.

Any other ideas on bold detection are also welcome.
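To make the kind of idea I have in mind concrete, here is a rough baseline sketch. It is not taken from either paper; it is only a heuristic I am assuming might work. For each already-binarized word crop (for example, cut out using the OCR word bounding boxes), it estimates the stroke width as the median horizontal run of ink pixels and divides it by the ink height of the word, so the measure does not depend on font size. The function names and the 0.15 threshold are hypothetical and would need tuning on real documents.

```python
import numpy as np


def horizontal_run_lengths(binary):
    """Return the lengths of all horizontal runs of ink (True) pixels."""
    runs = []
    for row in binary:
        # Pad with zeros so every run has both a start and an end transition.
        padded = np.concatenate(([0], row.astype(np.int8), [0]))
        diff = np.diff(padded)
        starts = np.flatnonzero(diff == 1)
        ends = np.flatnonzero(diff == -1)
        runs.extend(ends - starts)
    return np.array(runs)


def relative_stroke_width(binary):
    """Median horizontal stroke width divided by the ink height of the word.

    `binary` is a 2D array where True (or nonzero) marks ink pixels.
    Dividing by the height is what makes the value font-size independent.
    """
    binary = np.asarray(binary, dtype=bool)
    ink_rows = np.flatnonzero(binary.any(axis=1))
    if ink_rows.size == 0:
        return 0.0  # empty crop: no ink, no stroke
    height = ink_rows[-1] - ink_rows[0] + 1
    runs = horizontal_run_lengths(binary)
    return float(np.median(runs)) / height


def is_bold(binary, threshold=0.15):
    """Flag a word as bold when its relative stroke width exceeds the threshold.

    The default threshold is an assumption, not a validated value.
    """
    return relative_stroke_width(binary) > threshold
```

A fixed threshold is probably too naive across fonts; comparing each word's ratio to the median ratio of the whole page or line might be more robust, but I have not verified this.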

Topic ocr computer-vision

Category Data Science
