At which step do we need to feed the ground-truth transcription in a CNN+RNN+CTC architecture (OCR), and in which format?

maryam mehboob

June 25, 2021 07:03

I have to recognize text from images and am trying to understand the CNN+BiLSTM+CTC architecture. I have text images in .jpg format, but I am unsure about two things:

How should I generate the transcriptions: as .txt or as .xml files?
Where do I feed the ground truth in this architecture: along with the text images into the CNN, into the RNN, or into the CTC layer?

I haven't found any clear explanation of the ground-truth file format for OCR.
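To make the question concrete, here is a minimal sketch of one possible setup, not a confirmed convention. It assumes a plain-text label file with one `image_name<TAB>transcription` pair per line (the file layout, the file names, and the helper names `parse_labels`, `build_charset`, and `encode` are all hypothetical), and it assumes index 0 is reserved for the CTC blank. In the usual CNN+RNN+CTC setup the CNN and RNN only receive the image, and the encoded transcription is passed to the CTC loss during training, so the sketch only prepares the integer targets that such a loss would consume.

```python
# Hypothetical label file content: one "image_name<TAB>transcription" per line.
LABELS_TXT = "img_001.jpg\thello\nimg_002.jpg\tworld\n"

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def parse_labels(text):
    """Split each non-empty line into (image file name, transcription)."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, transcription = line.split("\t", 1)
        pairs.append((name, transcription))
    return pairs


def build_charset(pairs):
    """Map every character seen in the transcriptions to an index >= 1."""
    chars = sorted({ch for _, t in pairs for ch in t})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(transcription, char_to_idx):
    """Turn a transcription string into a list of integer class indices."""
    return [char_to_idx[ch] for ch in transcription]


pairs = parse_labels(LABELS_TXT)
char_to_idx = build_charset(pairs)

# These two lists are what a CTC loss would typically take as the ground truth,
# next to the per-timestep outputs of the RNN for the matching images.
targets = [encode(t, char_to_idx) for _, t in pairs]
target_lengths = [len(t) for t in targets]
```

If this picture is right, the file format itself (.txt or .xml) would matter less than being able to recover an (image, transcription) pair for every training sample.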

Any help would be highly appreciated.