At which step do we need to feed the ground-truth transcription in a CNN+RNN+CTC architecture (OCR), and in which format?
I have to recognize text from images and am trying to understand the CNN+BiLSTM+CTC architecture. I have text images in .jpg format, but:
- How should I generate their transcriptions, for example in .txt or in .xml format?
- Where do I feed the ground truth in this architecture: along with the text images into the CNN, into the RNN, or into the CTC layer?
I haven't found any clear explanation of the OCR ground-truth file format.
Any help would be highly appreciated.
Topic ocr deep-learning
Category Data Science