Error Loading and Training on TensorFlow's Speech Commands Dataset

I am trying to replicate the most basic version of this Google LEAF example. I am having problems loading the TensorFlow Speech Commands dataset. I load the dataset in as a TFRecord:

tfds.load('speech_commands', download=True, shuffle_files=False)
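When no split argument is given, tfds.load returns a dictionary keyed by split name, so I pull the individual splits out of it, roughly like this (the variable names here are illustrative, not taken from the example):

import tensorflow_datasets as tfds

# tfds.load with no 'split' argument returns a dict keyed by split name;
# speech_commands provides 'train', 'validation' and 'test' splits.
datasets = tfds.load('speech_commands', download=True, shuffle_files=False)
train_dataset = datasets['train']
eval_dataset = datasets['validation']
test_dataset = datasets['test']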

I then map the train, test and eval datasets through this pre-process function:

def preprocess(sample):
    audio = sample['audio']
    label = sample['label']

    audio = tf.cast(audio, tf.float32) / tf.int16.max
    return audio, label
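The mapping and batching step itself looks roughly like this (the batch size is just illustrative):

import tensorflow as tf

# Apply the preprocess function to every example and batch the results.
train_dataset = train_dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE).batch(64)
eval_dataset = eval_dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE).batch(64)
test_dataset = test_dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE).batch(64)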

I then create my model and attempt to train on my train dataset:

# Model is from leaf_audio/models
model = models.AudioClassifier(num_outputs=12)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss_fn, optimizer=tf.keras.optimizers.Adam(1e-4), metrics=['sparse_categorical_accuracy'])
model.fit(train_dataset, batch_size=None, epochs=10)

On training, I receive an error from the AudioClassifier layer:

ValueError: Exception encountered when calling layer sequential (type Sequential). Input 0 of layer global_max_pooling2d is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: (None, 16000)

I think this has something to do with loading the data incorrectly; however, I have followed the example to the letter in each of the loading steps.

For the full code please follow this link.


The problem was that the input tensor ('audio') did not have enough dimensions for the audio classifier.

The solution is to add the missing dimensions with this line at the end of the preprocess function:

audio = audio[tf.newaxis, tf.newaxis, :]
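Put together, the corrected preprocess function becomes the following; each mapped sample then has shape (1, 1, 16000), so a batched element is 4-D, which is what the global pooling layer expects:

def preprocess(sample):
    audio = sample['audio']
    label = sample['label']

    # Scale the 16-bit integer samples to floats in [-1, 1].
    audio = tf.cast(audio, tf.float32) / tf.int16.max
    # Add two leading dimensions so each batched element is 4-D.
    audio = audio[tf.newaxis, tf.newaxis, :]
    return audio, label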
