At the moment I have this piece of code, which cuts a spectrogram into fixed-length tensors:

```python
def chunks(l, n):
    """Yield successive n-sized chunks along the time axis (dim 2) of l."""
    t = l.size(2)                      # total number of time frames
    for i in range(0, t, n):
        if i + n <= t:                 # was `<`, which dropped an exactly-fitting last chunk
            yield l.narrow(2, i, n)    # was `X_sample.narrow(...)`, undefined inside the function
```

The following piece of code downsamples the audio, creates mel spectrograms and takes their log, applies cepstral mean and variance normalization (CMVN), and then cuts the spectrogram into fixed-length chunks with the code above and appends them …
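For reference, a minimal sketch of the preprocessing pipeline described above, assuming librosa and PyTorch; the names (`filename`, `target_sr`, `chunk_len`) are illustrative, not the asker's actual code:

```python
import librosa
import numpy as np
import torch

def preprocess(filename, target_sr=16000, n_mels=40, chunk_len=100):
    # Downsample the audio to the target sampling rate.
    y, sr = librosa.load(filename, sr=target_sr)
    # Mel spectrogram, then log.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = np.log(mel + 1e-10)
    # Cepstral mean and variance normalization over the time axis.
    cmvn = (log_mel - log_mel.mean(axis=1, keepdims=True)) \
           / (log_mel.std(axis=1, keepdims=True) + 1e-10)
    # Shape (1, n_mels, time) so chunks() can narrow along dim 2.
    x = torch.from_numpy(cmvn).unsqueeze(0)
    return list(chunks(x, chunk_len))
```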
I want to convert a mel spectrogram to log mel energies. What I used is:

```python
y, sr = librosa.load(filename, sr=16000)
mel_spectrogram = librosa.feature.melspectrogram(
    y=y, sr=sr, n_mels=128, n_fft=1024, hop_length=512, power=2)
log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
```

I thought this converts to mel energies, but then I found this line of code:

```python
log_mel_spectrogram = 20.0 / power * np.log10(np.maximum(mel_spectrogram, sys.float_info.epsilon))
```

My question is: what is the difference between log-mel spectrograms and log mel energies, and which line of code should I use?
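For what it's worth, with `power=2` the two formulas compute the same quantity, `10 * log10(S)`, up to librosa's reference value and `top_db` clipping; a quick numeric check (the array here is a stand-in, not real data):

```python
import sys
import numpy as np
import librosa

S = np.random.rand(128, 200) + 1e-3   # stand-in mel power spectrogram
power = 2

manual = 20.0 / power * np.log10(np.maximum(S, sys.float_info.epsilon))  # 10*log10(S)
via_librosa = librosa.power_to_db(S, ref=1.0, top_db=None)               # also 10*log10(S)

print(np.allclose(manual, via_librosa))  # True: both are dB of a power quantity
```

The default `power_to_db` call differs only in that it subtracts `10*log10(ref)` and clips everything more than 80 dB below the peak.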
I'm trying to use the work (neural networks) done in this repo: https://github.com/jtkim-kaist/VAD It says this: "Note: To apply this toolkit to other speech data, the speech data should be sampled with 16kHz sampling frequency." I've got speech data at 48 kHz. I've read in places that reducing the sampling rate is a complicated process; you can't just remove every nth data point, you have to filter things... Is this necessary if I only intend to use the data in the neural network …
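A minimal sketch of the resampling step, assuming librosa ≥ 0.10 and hypothetical file names; `librosa.resample` applies a band-limited (anti-aliasing) filter internally, so no manual low-pass step is needed before decimation:

```python
import librosa
import soundfile as sf

# Load at the native 48 kHz (sr=None keeps the original rate), then resample.
y48, sr = librosa.load("speech_48k.wav", sr=None)
y16 = librosa.resample(y48, orig_sr=sr, target_sr=16000)
sf.write("speech_16k.wav", y16, 16000)
```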
I have a neural network that takes as input an np.array of a mel spectrogram of a 3-second audio clip from a song, and outputs a vector of predictions over 494 given (individual) artists. At first, I was taking whole songs, splitting them into 3-second clips, inputting each clip into the NN, and averaging the outputs. But this proved to be wonky. I got advice that I should only need one 3-second clip, but …
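For context, the averaging step described here usually looks something like the sketch below, assuming a Keras-style `model.predict`; the slicing logic is a hypothetical stand-in for however the asker splits songs:

```python
import numpy as np

def predict_artist(model, song_mel, clip_frames):
    """Average per-clip outputs over all 3-second clips of one song."""
    # Slice the mel spectrogram along time into non-overlapping,
    # equal-length clips of clip_frames frames each.
    clips = [song_mel[:, i:i + clip_frames]
             for i in range(0, song_mel.shape[1] - clip_frames + 1, clip_frames)]
    probs = model.predict(np.stack(clips))   # shape (n_clips, 494)
    return probs.mean(axis=0).argmax()       # class index after averaging
```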
When training a deep model for rare event detection (e.g. the sound of an alarm in a home device's audio stream), is it best to use a balanced validation set (50% alarm, 50% normal) to determine early stopping etc., or a validation set representative of reality? If an unbalanced, realistic validation set is used, it may have to be huge to contain even a few positive event examples, so I'm wondering how this is typically dealt with. In the given example …
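One common workaround (a sketch, not necessarily what the asker ends up doing) is to keep the validation set realistic but drive early stopping with a threshold-free, imbalance-robust metric such as average precision:

```python
from sklearn.metrics import average_precision_score

def validation_score(model, X_val, y_val):
    """Average precision (area under the PR curve) on a realistic,
    heavily imbalanced validation set; unlike accuracy, it is not
    dominated by the majority 'normal' class."""
    scores = model.predict(X_val).ravel()   # per-example alarm probabilities
    return average_precision_score(y_val, scores)
```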
I am a beginner in the audio classification field in DL. I followed a YouTube music genre classification series, which is working fine and has been very helpful, but I have a problem/error in the pre-processing part. I get this error repeatedly. A picture of the error and the code is attached. I don't understand what the error is because I've never worked with librosa (an audio analysis library in Python). Kindly help me with that. Thank you.

```python
import json
import os
import …
```
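The error itself isn't visible here, but a frequent failure when following that tutorial is that librosa 0.10 made most audio arguments keyword-only, so older positional calls raise a TypeError. If that is what's happening, the fix is just to name the arguments:

```python
import librosa

y, sr = librosa.load("track.wav", sr=22050)   # hypothetical file

# Old tutorial style (raises TypeError on librosa >= 0.10):
# mfcc = librosa.feature.mfcc(y, sr, n_mfcc=13)

# Keyword style that works on current librosa:
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=2048, hop_length=512)
```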
I am trying to create a speech recognition dataset, especially for Indian accents. I am taking help from colleagues to build this: daily I send an article link and ask them to record it and upload it to Google Drive. I have a problem with this approach: all the audio recordings are 5-7 minutes long. I am using the DeepSpeech model for this, and it requires 10-second audio sentences. Please suggest an approach, if possible, to segment the audio files into corresponding sentence …
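A minimal silence-based segmentation sketch, assuming librosa and soundfile; `top_db` and the file names are illustrative and usually need tuning to the recording setup:

```python
import librosa
import soundfile as sf

y, sr = librosa.load("recording.wav", sr=16000)

# Find non-silent intervals; anything more than 30 dB below peak counts as silence.
intervals = librosa.effects.split(y, top_db=30)

for k, (start, end) in enumerate(intervals):
    if (end - start) / sr <= 10.0:        # keep only chunks DeepSpeech can take
        sf.write(f"sentence_{k:04d}.wav", y[start:end], sr)
```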
I am trying to train a neural network to estimate the location (in degrees, from 0 to 180) that a sound is coming from. I am using TensorFlow Keras in Python to train the model. The input data are two binaural cues, specifically the ILD (Interaural Level Difference) and the ITD (Interaural Time Difference); each input vector, consisting of the two features described above, has dimensions [1, 71276]. I have a total of 2639 measurements, 10% of which are used as validation …
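A minimal Keras sketch of the kind of regression setup described; only the input width 71276, the 0-180° target, and the 10% validation split come from the question, the layer sizes are illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(71276,)),     # ILD + ITD features, flattened
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                  # predicted angle in degrees (0-180)
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# model.fit(X, y, validation_split=0.1, epochs=50, batch_size=32)
```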
I have a binary sound classifier. I have a feature set extracted from the audio, of size 48. I have a model (a multi-layer neural network) that has around 90% accuracy on the test and validation sets (without normalization or standardization). I see that the feature values are mostly around [-10, +10], but there are certain features with a mean of 4000. Seeing such disproportionate value ranges across features, I thought some feature scaling might improve things. So using scikit-learn tools I …
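For reference, the usual scikit-learn pattern fits the scaler on the training split only, so the test statistics don't leak; a sketch with stand-in data of the stated dimensionality:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the 48-dimensional feature set (hypothetical data).
X_train = np.random.randn(1000, 48) * 10
X_test = np.random.randn(200, 48) * 10

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics; no leakage
```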
I have recorded audio files for the English letters; each file includes all 26 letters. I have split each letter into a separate audio file. Now I want to put similar audio letters into one folder. I can do it manually, but it will take time. Is there a classification method to do this?
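One unsupervised sketch, assuming librosa and scikit-learn; since the files are unlabeled this clusters rather than classifies, and the mean-MFCC feature is a deliberately crude choice (paths are hypothetical):

```python
import glob, os, shutil
import numpy as np
import librosa
from sklearn.cluster import KMeans

files = sorted(glob.glob("letters/*.wav"))
feats = []
for f in files:
    y, sr = librosa.load(f, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    feats.append(mfcc.mean(axis=1))            # one 20-dim vector per file

labels = KMeans(n_clusters=26, n_init=10, random_state=0).fit_predict(np.array(feats))

for f, c in zip(files, labels):
    os.makedirs(f"clusters/{c}", exist_ok=True)
    shutil.copy(f, f"clusters/{c}/")           # then inspect/rename folders by ear
```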
I am working on an audio classification problem to classify between two audio classes. I collected samples through JotForm, which provides an audio widget to collect .wav audio, but it turned out the widget stores the data in .mp3 format. So in my problem, the two classes come in different formats:
class A: all 100 samples are in .mp3 format (the JotForm collection)
class B: all the samples are in .wav format …
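To remove the format confound before training, one option is to convert everything to a single format up front, e.g. with pydub (which requires ffmpeg; the paths and 16 kHz mono target are hypothetical):

```python
import glob
import os
from pydub import AudioSegment

os.makedirs("class_a_wav", exist_ok=True)
for path in glob.glob("class_a_mp3/*.mp3"):
    audio = AudioSegment.from_mp3(path)
    # Match class B's format as closely as possible: mono, 16 kHz wav.
    audio = audio.set_channels(1).set_frame_rate(16000)
    out = os.path.join("class_a_wav", os.path.basename(path).replace(".mp3", ".wav"))
    audio.export(out, format="wav")
```

Note that conversion alone doesn't undo mp3 compression artifacts, so a classifier may still pick up on codec differences rather than the classes themselves.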
I'm trying to create a model that can identify one particular sound, and every time it hears that sound, it increases a counter by 1. So for example, if it hears a specific bird chirping ten times, the counter should display the number 10. I'm looking for a bit of guidance here as to how to go about this. I know that I will need to use audio classification, and for my data I only have .wav files of that …
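Once a per-window classifier exists, the counting step is usually a debounced threshold on the frame-wise probabilities, roughly like this sketch (`probs` would come from whatever model is trained; the threshold values are illustrative):

```python
import numpy as np

def count_events(probs, on=0.8, off=0.4):
    """Count distinct events in a sequence of per-window probabilities.
    Hysteresis (separate on/off thresholds) stops one long chirp from
    being counted several times as the probability wobbles."""
    count, active = 0, False
    for p in probs:
        if not active and p >= on:
            count += 1
            active = True
        elif active and p <= off:
            active = False
    return count

print(count_events(np.array([0.1, 0.9, 0.95, 0.3, 0.2, 0.85, 0.1])))  # prints 2
```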
I am hoping to use the GTZAN music dataset to evaluate the performance of several noise-cancelling algorithms as part of a project for my undergrad. I notice that GTZAN is widely used across the literature for audio classification and even has exposure within the TensorFlow and PyTorch APIs. Unfortunately, I cannot find any information about the copyright status of the data within GTZAN besides on the Marsyas website itself, where it is revealed that no permissions to redistribute the data have been …
I have a few thousand audio signals to label into 2 different classes and save to a numpy array for further training of models. MATLAB recently released Signal Labeler for their Signal Analyzer, which could help label time series, but for certain reasons I can't use it. Is there any specific tool for the analysis and labeling of time series in Python? It is not necessary to save the data and labels into numpy arrays; .csv format or anything similar is suitable …
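If no dedicated tool fits, a bare-bones labeling loop is easy to put together; a sketch assuming sounddevice and soundfile, with hypothetical paths, writing one `filename,label` row per clip to a CSV:

```python
import csv
import glob
import sounddevice as sd
import soundfile as sf

with open("labels.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["filename", "label"])
    for path in sorted(glob.glob("clips/*.wav")):
        data, sr = sf.read(path)
        sd.play(data, sr)
        sd.wait()                                 # block until playback ends
        label = input(f"{path} -> class (0/1): ").strip()
        writer.writerow([path, label])
```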
For a given set of audio files collected from an industrial process via a microphone, I have extracted suitable features and fed them into a neural network to train a binary classifier, as depicted below. The model has been performing quite well on unseen data. I am now at the stage of developing a sub-product to monitor data drift, forecasting the inevitable, i.e. data changes (namely: the microphone position changes; the product material changes and produces a distinct signal; background noise prevail …
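A lightweight starting point for the drift monitor (a sketch, not the asker's design): compare each feature's distribution in a recent production window against a reference window from training time, e.g. with a two-sample Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(X_ref, X_new, alpha=0.01):
    """Return indices of features whose distribution shifted between the
    reference (training-time) window and a recent production window."""
    flagged = []
    for j in range(X_ref.shape[1]):
        stat, p = ks_2samp(X_ref[:, j], X_new[:, j])
        if p < alpha:
            flagged.append(j)
    return flagged

# Hypothetical usage: X_ref, X_new are (n_samples, n_features) arrays.
# print(drifted_features(X_ref, X_new))
```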
I have audio files, most of which start with the same music, after which a conversation begins. I want to trim away the music part (which can vary in length). I have no labels. I can transcribe the whole file using off-the-shelf models, but the music itself contains words, which result in false positives. I do, however, know how to extract features from the audio, such as the mel spectrogram, pitch, etc. The music at the beginning of the file …
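Since the intro music is (mostly) shared across files, one sketch is template matching: slide a mel-spectrogram excerpt of the known intro over the start of each file, find where it matches best, and cut after it. All names here are hypothetical, and it assumes each file is longer than the template:

```python
import numpy as np
import librosa

def intro_end_sample(path, template_path, sr=16000, hop=512):
    y, _ = librosa.load(path, sr=sr)
    t, _ = librosa.load(template_path, sr=sr)   # a clip of the shared intro music

    S = np.log(librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop) + 1e-10)
    T = np.log(librosa.feature.melspectrogram(y=t, sr=sr, hop_length=hop) + 1e-10)

    # Cosine similarity of the template against every frame offset.
    n = T.shape[1]
    sims = [np.dot(S[:, i:i + n].ravel(), T.ravel())
            / (np.linalg.norm(S[:, i:i + n]) * np.linalg.norm(T) + 1e-10)
            for i in range(S.shape[1] - n)]

    best = int(np.argmax(sims))                 # frame where the template matches best
    return (best + n) * hop                     # sample index where the intro ends

# start = intro_end_sample("episode.wav", "intro_template.wav")
```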
I need advice regarding a small dataset of individual music notes played on a harmonica that I created a while ago. I want to build a system that reads notations from a text file and creates realistic audio by combining the audio files with near-perfect transitions between notes. What should I look into for reference, or as a starting point for the project? Thanks in advance! PS: I had to provide a tag, so I chose 'audio-recognition'. I do not …
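As a first building block, pydub can already concatenate the per-note recordings with a short crossfade, which covers the "near-perfect transition" part in a crude way (the file naming scheme here is hypothetical):

```python
from pydub import AudioSegment

def render(notation_file, out_path="tune.wav", crossfade_ms=30):
    """Read note names (one per line, e.g. 'C4') and concatenate the
    matching recordings with a short crossfade between notes."""
    with open(notation_file) as f:
        notes = [line.strip() for line in f if line.strip()]

    track = AudioSegment.from_wav(f"notes/{notes[0]}.wav")
    for note in notes[1:]:
        nxt = AudioSegment.from_wav(f"notes/{note}.wav")
        track = track.append(nxt, crossfade=crossfade_ms)
    track.export(out_path, format="wav")
```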
I am training CNNs to recognize dog barking, and for this I would like to augment the datasets I have (~30'000 10 s clips with either barks or no barks in them). The straightforward idea was to mix the barking audio clips with the no-barking clips (maybe some leaves rustling or whatever), such that the resulting remix is again a barking audio clip. I did this by simply adding up the two waveforms (from .wav files) in a random ratio, …
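For the record, the mixing step described above usually looks like this sketch, which scales the background to hit a random signal-to-noise ratio rather than a raw amplitude ratio (all names hypothetical):

```python
import numpy as np
import librosa
import soundfile as sf

def mix_at_random_snr(bark_path, background_path, snr_db_range=(0, 20), sr=16000):
    bark, _ = librosa.load(bark_path, sr=sr)
    bg, _ = librosa.load(background_path, sr=sr)
    bg = bg[:len(bark)] if len(bg) >= len(bark) else np.resize(bg, len(bark))

    snr_db = np.random.uniform(*snr_db_range)
    # Scale background so that 10*log10(P_bark / P_bg) equals snr_db.
    p_bark = np.mean(bark ** 2)
    p_bg = np.mean(bg ** 2) + 1e-12
    bg = bg * np.sqrt(p_bark / (p_bg * 10 ** (snr_db / 10)))

    mix = bark + bg
    return mix / (np.max(np.abs(mix)) + 1e-12)   # normalize to avoid clipping on write

# sf.write("augmented.wav", mix_at_random_snr("bark.wav", "leaves.wav"), 16000)
```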
I'm working on training a network to do direction-of-arrival prediction, and I'm having the issue that no matter what my network is (ResNet 18-101, CRNN, CNN, etc.), my results tend toward one small range of values, as seen in the image below, which obviously leads to the following errors: I have attempted to just "wait it out" until the network finally learns, but my validation loss diverges almost immediately. An example can be seen below. …