At which step do we feed the ground-truth transcription into a CNN+RNN+CTC architecture (OCR), and in which format?

I have to recognize text from images and am trying to understand the CNN+BiLSTM+CTC architecture. I have text images in .jpg format, but I have two questions:

  1. How should I store the transcriptions: in .txt files, in .xml files, or in some other format?
  2. Where do I feed the ground truth into this architecture: together with the text images into the CNN, into the RNN, or into the CTC layer?

I haven't found any clear explanation of the ground-truth file format used for OCR.

Any help will be highly appreciated.
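
For readers with the same question, here is a minimal sketch of one common convention; it is not the only valid one. It assumes a plain-text label file with one tab-separated "image path, transcription" pair per line. The file contents, the function names, and the choice of index 0 for the CTC blank are all hypothetical. In this kind of setup only the images pass through the CNN and BiLSTM, while the encoded transcriptions are handed to the CTC loss during training.

```python
# Hypothetical label file: one "image_path<TAB>transcription" pair per line.
LABELS_TXT = "img_001.jpg\thello\nimg_002.jpg\tocr\n"

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def parse_labels(text):
    """Return a list of (image_path, transcription) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        path, transcription = line.split("\t", 1)
        pairs.append((path, transcription))
    return pairs


def build_charset(pairs):
    """Map every character seen in the labels to an index starting at 1."""
    chars = sorted({ch for _, text in pairs for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}  # 0 stays free for BLANK


def encode(transcription, char_to_idx):
    """Turn a transcription string into a list of integer class indices."""
    return [char_to_idx[ch] for ch in transcription]


pairs = parse_labels(LABELS_TXT)
char_to_idx = build_charset(pairs)

# These integer sequences and their lengths are what the CTC loss consumes.
# The CNN and the BiLSTM never see them; they only receive the images.
targets = [encode(text, char_to_idx) for _, text in pairs]
target_lengths = [len(t) for t in targets]

print(targets)         # [[3, 2, 4, 4, 5], [5, 1, 6]]
print(target_lengths)  # [5, 3]
```

With PyTorch, for example, these would correspond to the `targets` and `target_lengths` arguments of `torch.nn.CTCLoss`, passed alongside the network's per-timestep log-probabilities and their input lengths.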

