Segment 5-7 min audio into sentence-wise clips for creating a speech recognition dataset
I am trying to create a speech recognition dataset, specifically for Indian accents. I am collecting recordings from colleagues to build it: every day I send them an article link and ask them to record themselves reading it and upload the recording to Google Drive.
I have a problem with this approach: all of the recordings are 5-7 minutes long. I am using the DeepSpeech model, and it requires 10-second, sentence-level audio clips.
Please suggest an approach to segment the audio files into their corresponding sentences or phrases, or a better way to work with 5-minute audio files. Suggestions for a better way to create a speech-to-text dataset are more than welcome.
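To make the question concrete, here is a naive baseline I can imagine: cut at pauses and keep every clip under a length limit. This is only a sketch under assumptions. It assumes the audio is already decoded into a list of 16-bit mono PCM sample values, the function names (`frame_rms`, `pause_midpoints`, `segment`) are hypothetical, and the RMS threshold and pause length are guesses that would need tuning per recording. Pauses also do not necessarily line up with sentence boundaries, so each clip would still have to be matched to its text.

```python
import math

def frame_rms(samples, frame_len):
    """RMS energy of each non-overlapping frame of frame_len samples."""
    out = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        out.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return out

def pause_midpoints(samples, sample_rate, frame_ms=30,
                    silence_rms=500.0, min_pause_ms=300):
    """Sample indices at the middle of every pause of at least min_pause_ms.

    silence_rms and min_pause_ms are assumed values, not tuned ones.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    min_frames = max(1, min_pause_ms // frame_ms)
    cuts, run_start = [], None
    rms = frame_rms(samples, frame_len)
    # The trailing infinity acts as a sentinel that closes a final quiet run.
    for idx, value in enumerate(rms + [float("inf")]):
        if value < silence_rms:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            if idx - run_start >= min_frames:
                cuts.append((run_start + idx) // 2 * frame_len)
            run_start = None
    return cuts

def segment(samples, sample_rate, max_clip_s=10.0, **kwargs):
    """Greedy (start, end) sample ranges, each at most max_clip_s long."""
    max_len = int(max_clip_s * sample_rate)
    cuts = pause_midpoints(samples, sample_rate, **kwargs)
    clips, start = [], 0
    while len(samples) - start > max_len:
        # Take the last pause that still keeps the clip within the limit.
        candidates = [c for c in cuts if start < c <= start + max_len]
        # Fall back to a hard cut when no pause is available.
        end = candidates[-1] if candidates else start + max_len
        clips.append((start, end))
        start = end
    if start < len(samples):
        clips.append((start, len(samples)))
    return clips
```

The samples could come from a WAV file read with the standard `wave` and `array` modules, and each `(start, end)` range could then be written back out as its own clip. I am not sure a fixed energy threshold would hold up on recordings made with different microphones and rooms, which is part of why I am asking for a better approach.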
Topic speech-to-text audio-recognition dataset
Category Data Science