How to double an audio dataset?

Question

Abylay Omar

June 2, 2021 05:41

I am trying to develop a mispronunciation detection model for English speech. I use the TIMIT dataset, which is a phoneme-labeled audio dataset.

A phoneme is any of the perceptually distinct units of sound in a language. Each item in my dataset is an audio file paired with the string of phonemes for that audio. For example:

SX141.wav - p l eh zh tcl t ax-h pcl p axr tcl t ih s pcl p ey dx ih n ax v aa dx ix z ix kcl k w aa dx ix kcl k ah m pcl p tcl t ih sh ix n

The problem is overfitting: my model performs very well on the training set but poorly on the test set. Because of this, I want to try enlarging my dataset synthetically, for example by changing the speed of the audio or adding background noise.

Are there any ready-made solutions for doubling an audio dataset? If not, how can I change the speed of an audio file and add noise to it? Would that be helpful?

Topic speech-to-text machine-learning-model audio-recognition neural-network dataset

Category Data Science

Abylay Omar · Accepted Answer · June 2, 2021 05:41

I did not find a ready-made solution for this, so I solved it myself.

Increase speed.

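 from scipy.io.wavfile import read, write

 # Fs is the sampling rate in Hz, data is the array of samples
 Fs, data = read(filename)
 # Same samples, but a declared rate 1.25 times higher, so playback is 1.25 times faster
 write(destination, int(Fs*1.25), data)

I save the file with the sampling rate multiplied by 1.25, so the same samples play back 1.25 times faster. Note that this also raises the pitch, and that the new file no longer has the original sampling rate.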
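Writing the file with a different sampling rate can be a problem if the rest of the pipeline expects a fixed rate (TIMIT audio is sampled at 16 kHz). As an alternative that is not part of the solution above, here is a minimal sketch that resamples the signal with `scipy.signal.resample_poly` so the file keeps its original rate. The helper name `change_speed` and the synthetic tone are hypothetical, chosen only for illustration; the sketch assumes signed integer PCM or float samples.

```python
import numpy as np
from scipy.signal import resample_poly

def change_speed(samples, up=4, down=5):
    """Resample so that playback at the ORIGINAL rate is down/up times faster.

    With up=4, down=5 the signal keeps 4/5 of its samples, i.e. 1.25x speed.
    Like the sampling-rate trick, this raises the pitch as well.
    """
    # Filter in float so intermediate values are not truncated
    stretched = resample_poly(samples.astype(np.float64), up, down)
    if np.issubdtype(samples.dtype, np.integer):
        # Round and clip before casting back to the integer PCM type
        info = np.iinfo(samples.dtype)
        stretched = np.clip(np.round(stretched), info.min, info.max)
    return stretched.astype(samples.dtype)

# Synthetic one-second 16 kHz tone standing in for a real utterance
fs = 16000
t = np.arange(fs) / fs
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

faster = change_speed(tone)  # write it back with the unchanged rate fs
```

The phoneme string for the utterance stays the same. If time-aligned phoneme boundaries are used, they would have to be scaled by the same factor.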

Add noise.

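 import numpy as np
 from scipy.io.wavfile import read, write

 Fs, data = read(filename)
 # Assumes 16-bit PCM (as in TIMIT): normalize the samples to [-1, 1]
 signal = data.astype(np.float32) / 32768
 # The standard deviation (.2) is relative to full scale; lower it for quieter noise
 data_noise = np.random.normal(0, .2, signal.shape)
 # Clip to the valid range and convert back to 16-bit PCM before saving
 noisy = np.clip(signal + data_noise, -1, 1)
 write(destination, int(Fs), (noisy * 32767).astype(np.int16))

Here I generate a Gaussian noise array and add it to the original WAV signal. The samples are normalized first because a standard deviation of .2 would be inaudible on raw 16-bit values, and the sum has to be converted back to integers to get a valid 16-bit file.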
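A fixed noise level affects quiet and loud recordings differently. One common variation, shown here only as a sketch and not as part of the solution above, is to scale the noise to a target signal-to-noise ratio (SNR) in decibels. The function name `add_noise_snr`, the 20 dB value, and the synthetic tone are assumptions made for illustration; the sketch assumes signed integer PCM or float samples.

```python
import numpy as np

def add_noise_snr(samples, snr_db, rng=None):
    """Add white Gaussian noise so that the result has roughly snr_db dB SNR."""
    rng = np.random.default_rng() if rng is None else rng
    signal = samples.astype(np.float64)
    # SNR_dB = 10 * log10(signal_power / noise_power)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), signal.shape)
    noisy = signal + noise
    if np.issubdtype(samples.dtype, np.integer):
        # Round and clip before casting back to the integer PCM type
        info = np.iinfo(samples.dtype)
        noisy = np.clip(np.round(noisy), info.min, info.max)
    return noisy.astype(samples.dtype)

# Synthetic one-second 16 kHz tone standing in for a real utterance
fs = 16000
t = np.arange(fs) / fs
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

noisy_tone = add_noise_snr(tone, snr_db=20, rng=np.random.default_rng(0))
```

Saving each noisy copy next to its original, with the same phoneme string, doubles the dataset. Lower SNR values give noisier copies.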
