Using a synthetic dataset to train NVIDIA NeMo MatchboxNet
Has anyone had success training small command recognition models on a synthetic dataset?
The full details are as follows: I need a small model to run command recognition (about 30 commands) on an embedded device. NVIDIA NeMo MatchboxNet looks like a good fit, but I have no standard dataset covering my set of commands. The model needs to handle a broad range of speakers, and collecting a real dataset seems difficult. I am considering using NVIDIA models such as WaveGlow/Flowtron to generate a custom dataset and train the model on it. Is this feasible? Any other suggestions?
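To make the idea concrete, here is a minimal sketch of how the generation step could be planned: one TTS job per (command, speaker, speed) combination, a train/validation split by speaker so that validation measures generalization to unseen voices, and one JSON line per clip for a manifest. Everything here is an assumption for illustration: the function names, the `synthetic_clips` directory, the speaker IDs and speed factors are hypothetical, the actual TTS call is left out, and the manifest keys (`audio_filepath`, `duration`, `label`) should be checked against the NeMo documentation for the version in use.

```python
import itertools
import json
import random


def plan_synthetic_clips(commands, speaker_ids, speeds, clips_dir="synthetic_clips"):
    """Enumerate one TTS job per (command, speaker, speed) combination.

    All names and the directory layout are hypothetical; the TTS call
    itself is not shown and must be supplied separately.
    """
    jobs = []
    for command, speaker, speed in itertools.product(commands, speaker_ids, speeds):
        name = f"{command.replace(' ', '_')}_spk{speaker}_x{speed:.2f}.wav"
        jobs.append({
            "text": command,
            "speaker": speaker,
            "speed": speed,
            "audio_filepath": f"{clips_dir}/{name}",
            "label": command,
        })
    return jobs


def split_by_speaker(jobs, held_out_fraction=0.2, seed=0):
    """Hold out whole speakers so validation only contains unseen voices."""
    speakers = sorted({job["speaker"] for job in jobs})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_held_out = max(1, int(len(speakers) * held_out_fraction))
    held_out = set(speakers[:n_held_out])
    train = [job for job in jobs if job["speaker"] not in held_out]
    val = [job for job in jobs if job["speaker"] in held_out]
    return train, val


def manifest_line(job, duration):
    """Build one JSON manifest line; duration (seconds) is measured after synthesis.

    The key names are an assumption; verify them against the NeMo docs.
    """
    return json.dumps({
        "audio_filepath": job["audio_filepath"],
        "duration": duration,
        "label": job["label"],
    })


if __name__ == "__main__":
    jobs = plan_synthetic_clips(["lights on", "lights off"], range(10), [0.9, 1.0, 1.1])
    train, val = split_by_speaker(jobs)
    print(len(jobs), len(train), len(val))
```

Splitting by speaker rather than by clip is a deliberate choice in this sketch: if the same synthetic voice appears in both sets, validation accuracy says little about how the model copes with new speakers.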
Topic speech-to-text nvidia
Category Data Science