Using a synthetic dataset to train NVIDIA NeMo MatchboxNet

Has anyone had success training small command recognition models on a synthetic dataset?

The details are as follows: I need a small model to run command recognition (about 30 commands) on an embedded device. NVIDIA NeMo MatchboxNet looks like a good fit, but no standard dataset covers my set of commands. The model has to generalize across a wide range of speakers, and collecting a real dataset seems difficult. I am considering using NVIDIA models like WaveGlow/Flowtron to generate a custom dataset and train the model on it. Is this feasible? Any other suggestions?
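To make the idea more concrete, here is a minimal sketch of how the synthetic set could be planned before any audio is generated. It only enumerates the clips to synthesize and keeps validation speakers disjoint from training speakers, so that the validation score says something about unseen voices. Everything here is an assumption for illustration: the function names, the file naming scheme, the speed factors, and the `label`/`speaker`/`speed` fields are hypothetical, and the actual synthesis step (for example with Flowtron and WaveGlow) is not shown. A `duration` field would also need to be added once the audio files exist.

```python
import itertools
import json
import random


def build_synthesis_plan(commands, speaker_ids, speeds, val_fraction=0.2, seed=0):
    """Enumerate (command, speaker, speed) clips and split them by speaker.

    Hypothetical helper: it plans which clips to synthesize, it does not
    call any TTS model.
    """
    rng = random.Random(seed)
    speakers = list(speaker_ids)
    rng.shuffle(speakers)

    # Hold out whole speakers, so no validation voice is seen in training.
    n_val = max(1, int(len(speakers) * val_fraction))
    val_speakers = set(speakers[:n_val])

    plan = {"train": [], "val": []}
    for command, speaker, speed in itertools.product(commands, speaker_ids, speeds):
        entry = {
            # Assumed naming scheme; the TTS step would write the audio here.
            "audio_filepath": (
                f"synthetic/{command.replace(' ', '_')}/spk{speaker}_x{speed:.2f}.wav"
            ),
            "label": command,    # assumed field name for the target class
            "speaker": speaker,  # synthetic speaker identity
            "speed": speed,      # speaking-rate factor to request or apply
        }
        split = "val" if speaker in val_speakers else "train"
        plan[split].append(entry)
    return plan


def to_manifest_lines(entries):
    """Serialize entries as JSON lines, one clip per line."""
    return [json.dumps(entry) for entry in entries]


plan = build_synthesis_plan(
    commands=["lights on", "lights off"],
    speaker_ids=range(10),
    speeds=[0.9, 1.0, 1.1],
)
manifest_train = to_manifest_lines(plan["train"])
manifest_val = to_manifest_lines(plan["val"])
```

Splitting by speaker rather than by clip is a design choice: with a random per-clip split, the same synthetic voice would appear on both sides and the validation accuracy would likely overstate how well the model handles new speakers.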

