How to build a speech corpus for continuous speech?
I am building a speech corpus from audiobooks. I have collected the recordings together with their transcriptions, but without any timing information (nothing tells me which stretch of audio corresponds to which piece of text). I am therefore looking for recent techniques to automatically split the audio into segments suitable for developing large-vocabulary continuous speech recognition (LVCSR) systems.
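To make the problem concrete, one common first step is to cut each long chapter recording at pauses into shorter candidate segments, which then still have to be matched to the transcript (for example by forced alignment, not shown here). The sketch below illustrates only the pause-cutting step with plain NumPy. It assumes mono floating-point samples are already loaded. The function names `silence_cut_points` and `split_segments` and the default values (25 ms frames, 10 ms hop, -40 dB threshold, 300 ms minimum pause) are hypothetical, illustrative choices, not tuned settings from any toolkit.

```python
import numpy as np


def silence_cut_points(samples, sample_rate, frame_ms=25, hop_ms=10,
                       threshold_db=-40.0, min_silence_ms=300):
    """Return sample indices at the centre of long low-energy stretches.

    Hypothetical helper: a simple energy-based pause detector, not a full VAD.
    """
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(samples) - frame) // hop)

    # Short-time RMS energy of each frame.
    rms = np.array([
        np.sqrt(np.mean(samples[i * hop:i * hop + frame] ** 2))
        for i in range(n_frames)
    ])
    # Energy in dB relative to the loudest frame, so the threshold does not
    # depend on the overall recording level.
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    silent = db < threshold_db

    min_frames = int(min_silence_ms / hop_ms)
    cuts, start = [], None
    # Append a non-silent sentinel so a pause at the very end is closed.
    for i, is_silent in enumerate(np.append(silent, False)):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if i - start >= min_frames:
                # Cut in the middle of the pause, leaving silence on both sides.
                centre_frame = (start + i) // 2
                cuts.append(centre_frame * hop + frame // 2)
            start = None
    return cuts


def split_segments(samples, cuts):
    """Split the signal at the given sample indices (hypothetical helper)."""
    bounds = [0] + list(cuts) + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


if __name__ == "__main__":
    # Synthetic example: 1 s tone, 0.5 s silence, 1 s tone at 16 kHz.
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)
    signal = np.concatenate([tone, np.zeros(sr // 2), tone])
    cuts = silence_cut_points(signal, sr)
    print(cuts, [len(s) for s in split_segments(signal, cuts)])
```

On the synthetic signal this yields a single cut inside the half-second pause. On real audiobook audio, pauses alone do not guarantee that each segment lines up with a sentence of the transcript, which is why an alignment step against the text is still needed afterwards.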
Thanks in advance.