Bold text detection
I am currently working on a project where I need to detect bold text in an image that contains multiple font sizes (so mathematical morphology is not an option). This detection will run in parallel with an OCR system (Tesseract) to identify which information in a document is important, i.e. printed in bold.
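One way to run a bold detector in parallel with the OCR is to reuse Tesseract's own word boxes, so that each bold/regular decision is attached to a recognized word. The sketch below assumes the pytesseract wrapper (not mentioned above) and the dict returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, where entries with `level == 5` are words. The function name `word_crops` and the `min_conf` filter are hypothetical choices for illustration.

```python
import numpy as np


def word_crops(binary, data, min_conf=0.0):
    """Yield (text, crop) pairs for each word found by Tesseract.

    Assumption: `data` is the dict produced by pytesseract.image_to_data(...,
    output_type=pytesseract.Output.DICT); level 5 rows are words, with boxes
    given by 'left', 'top', 'width' and 'height' in pixel coordinates.
    """
    for i, level in enumerate(data["level"]):
        # Skip pages/blocks/paragraphs/lines and empty word entries.
        if level != 5 or not str(data["text"][i]).strip():
            continue
        # 'conf' may be a string or a number depending on the pytesseract version.
        if float(data["conf"][i]) < min_conf:
            continue
        x, y = data["left"][i], data["top"][i]
        w, h = data["width"][i], data["height"][i]
        # Crop the same region from the (binarized) image used for bold detection.
        yield data["text"][i], binary[y:y + h, x:x + w]
```

Each crop can then be passed to whatever bold classifier is used, and the result stored next to the OCR text.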
I have already tested Tesseract's wordFontAttribute() function, but it is unreliable: its bold detection results are poor, and it degrades the performance of my OCR system, because using this function requires an old version of Tesseract (v3).
I found a couple of research papers on font style detection, and therefore on bold detection ("Automatic Detection of Italic, Bold and All-Capital Words in Document Images" and "Script Independent Detection of Bold Words in Multi Font-size Documents", both on Google Scholar).
I was wondering whether an implementation of this research is available online.
Any other ideas on bold detection are also welcome.
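As one possible idea, here is a minimal sketch of a simple stroke-width heuristic; it is not the method of the papers cited above. It assumes binarized word crops (`True` = ink) and estimates mean stroke width as 2 × area / perimeter, which holds for strokes much longer than they are wide. To cope with multiple font sizes, the width is divided by the word's ink height, and a word is flagged as bold when this normalized weight exceeds the document median by a factor `ratio`. The function names and the 1.3 threshold are hypothetical and would need tuning; the median reference also assumes most words in the document are regular weight.

```python
import numpy as np


def stroke_width(word):
    """Estimate the mean stroke width of a binary word image (True = ink)."""
    ink = np.pad(np.asarray(word, dtype=bool), 1)  # zero border closes the outline
    area = np.count_nonzero(ink)
    if area == 0:
        return 0.0
    # Ink/background transitions along rows and columns = perimeter in pixel edges.
    perimeter = (np.count_nonzero(ink[:, 1:] != ink[:, :-1])
                 + np.count_nonzero(ink[1:, :] != ink[:-1, :]))
    # For a stroke of width w and length L >> w: area ~ w*L, perimeter ~ 2*L.
    return 2.0 * area / perimeter


def normalized_weight(word):
    """Stroke width divided by the height of the inked rows (font-size proxy)."""
    word = np.asarray(word, dtype=bool)
    rows = np.flatnonzero(word.any(axis=1))
    if rows.size == 0:
        return 0.0
    height = rows[-1] - rows[0] + 1
    return stroke_width(word) / height


def flag_bold(words, ratio=1.3):
    """Flag words whose normalized weight exceeds `ratio` times the median."""
    weights = np.array([normalized_weight(w) for w in words])
    reference = np.median(weights)
    return [bool(w > ratio * reference) for w in weights]
```

Normalizing by ink height is crude: words with ascenders or descenders look taller than words without them, so a per-line x-height estimate would likely be a more robust normalizer.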