My native language is a regional language and few people speak it. I have some assignments in a machine learning course and I was thinking about doing some natural language processing on my native language, but I don't know where to start since there is almost no research on this language (no corpus, no research papers, ...) and I'm new to machine learning. I want to start doing everything from the bottom up and I want to do …
I have a pre-trained ASR model but want to add some missing words to the vocabulary. Can I do this, or will it invalidate the entire training? Let's say I use the pre-trained model wav2vec2-base-960h and want to use it on sports commentary, but a lot of the players' names are missing from the vocabulary. Is there any way I can add the names and maybe train on a few clips where the names appear, or do I have to …
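To make the question concrete, this is how I am inspecting the vocabulary the checkpoint ships with (assuming the Hugging Face transformers library; the checkpoint name is the one from my question):

```python
# Inspect the existing CTC output vocabulary of the pre-trained checkpoint.
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
vocab = processor.tokenizer.get_vocab()

print(len(vocab))            # size of the CTC output layer
print(sorted(vocab.keys()))  # the tokens themselves (characters for this checkpoint)
```

From this it looks like the output layer is tied to that vocabulary, which is why I am unsure whether adding entries would invalidate the training.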
I'm working on Arabic Speech Recognition using the Wav2Vec XLSR model. While fine-tuning the model, it gives the error shown in the picture below. I can't understand what the problem with librosa is; it's already installed!
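For what it's worth, this is how I check that librosa is present in the same environment that runs the fine-tuning script (just a sanity check, not part of the training code):

```python
# Sanity check: librosa imports fine, and this is the installed version.
import librosa
print(librosa.__version__)
```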
Has anyone had success training small command recognition models on a synthetic dataset? The full details are as follows: I need a small model to run command recognition (about 30 commands) on an embedded device. It looks like NVIDIA NeMo MatchboxNet is a good solution, but I have no standard dataset covering my set of commands. The model should be adapted to a broad variety of speakers. Obtaining a real dataset seems difficult. I am considering using NVIDIA models like Waveglow/Flowtron to …
I was wondering: is there a difference between Speech Recognition and Automatic Speech Recognition? I have seen both terms used in various papers, and I am not sure whether they are simply used interchangeably or whether there is a difference between the two.
I have gathered some raw audio from all the conferences, meetings, lectures & casual conversations that I was part of. Machine transcription (from Azure, AWS, etc.) did not give good results, so I would transcribe it myself to have both data and label (audio + text) for ML training. My question is whether to have small (3-10 sec) audio files (split at silence) and then transcribe each small file, or one large file with timestamps in .srt subtitle format? What if I have a long …
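To make the first option concrete, this is roughly the silence-based splitting I have in mind (pydub is just my assumption here, and the thresholds would need tuning):

```python
# Split one long recording into short utterance-sized WAV files at silences.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("meeting_recording.wav")

chunks = split_on_silence(
    audio,
    min_silence_len=700,                # ms of silence that counts as a break
    silence_thresh=audio.dBFS - 16,     # anything 16 dB below the average loudness
    keep_silence=200,                   # keep a little padding around each chunk
)

for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i:04d}.wav", format="wav")
```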
Hope you're all doing well! I am working on Automatic Speech Recognition with Python on the LibriSpeech dataset. After preprocessing the audio data and applying MFCC featurization, I append everything into a list and get a shape of (14174,). Each sample has a different length but the same number of features; for example, print(X[0].shape) gives (615, 13) and print(X[12000].shape) gives (301, 13). Now when I feed the data into my network with an Input layer defined …
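For reference, this is the kind of zero-padding I was considering to turn the list into a single 3-D array (X here just stands for my list of per-utterance MFCC arrays; I fake two of them so the snippet runs on its own):

```python
import numpy as np

# Stand-in for my real list: each element is an (n_frames, 13) MFCC array.
X = [np.random.randn(615, 13), np.random.randn(301, 13)]

max_len = max(x.shape[0] for x in X)        # longest utterance in frames
n_feats = X[0].shape[1]                     # 13 MFCC coefficients

# Zero-pad every utterance at the end to max_len frames.
X_padded = np.zeros((len(X), max_len, n_feats), dtype=np.float32)
for i, x in enumerate(X):
    X_padded[i, :x.shape[0], :] = x

print(X_padded.shape)                       # (n_samples, max_len, 13)
```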
I am trying to create a speech recognition dataset, especially for Indian accents. I am taking help from colleagues to build this: every day I send them an article link and ask them to record it and upload the recording to Google Drive. I have a problem with this approach: all the audio recordings are 5-7 minutes long. I am using the DeepSpeech model for this, and it requires ~10-second audio sentences. Please suggest an approach, if possible, to segment the audio files into corresponding sentence …
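Here is the kind of silence-based segmentation I was planning to try (librosa/soundfile and the threshold values are just my own guesses, not something DeepSpeech prescribes):

```python
# Cut a long recording into short clips at non-silent intervals.
import librosa
import soundfile as sf

y, sr = librosa.load("recording_5min.wav", sr=16000)

# Intervals (in samples) of non-silent audio.
intervals = librosa.effects.split(y, top_db=30)

for i, (start, end) in enumerate(intervals):
    chunk = y[start:end]
    if len(chunk) <= 10 * sr:                        # keep only chunks under ~10 s
        sf.write(f"sentence_{i:04d}.wav", chunk, sr)
```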
I am trying to solve/understand ASR using HMM-GMM. At an abstract level I do understand what's happening, but I don't understand how the GMM fits into it. My data has 5K hours of speech from a single user. I took the above picture from this article. I do know what a GMM is, but I am unable to wrap my head around its role here. Can somebody explain with a simple example?
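To show where I am at, here is a toy version of my current (possibly wrong) understanding: one GMM per HMM state, fitted on the acoustic frames aligned to that state, so that its likelihood plays the role of the emission probability (the random data and sklearn are just my stand-ins):

```python
# One GMM per HMM state: fit it on the MFCC frames assigned to that state,
# then score_samples() gives log p(observation | state) for new frames.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
frames_for_state = rng.normal(size=(500, 13))     # fake 13-dim MFCC frames for one state

gmm = GaussianMixture(n_components=4, covariance_type="diag").fit(frames_for_state)

new_frame = rng.normal(size=(1, 13))
print(gmm.score_samples(new_frame))               # log-likelihood used as emission score
```

Is this roughly the right picture?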
I was reading the Wav2Vec 2.0 paper and trying to understand the model architecture, but I have trouble understanding how raw audio inputs of variable length can be fed through the model, especially from the Convolutional Feature Encoder to the Transformer Context Network. During fine-tuning (from what I have read), even though the raw audio inputs within a batch are padded to the length of the longest input in the batch, the length of inputs can still differ across batches. Therefore …
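To illustrate what I mean, here is a toy stand-in for the feature encoder (just two strided 1-D convolutions, not the real architecture), showing that the number of output frames simply scales with the input length:

```python
import torch
import torch.nn as nn

# Toy "feature encoder": strided convolutions downsample the raw waveform,
# so a longer input just produces more frames -- no fixed length is baked in.
encoder = nn.Sequential(
    nn.Conv1d(1, 512, kernel_size=10, stride=5),
    nn.Conv1d(512, 512, kernel_size=3, stride=2),
)

for n_samples in (16000, 48000):          # 1 s and 3 s of 16 kHz audio
    wav = torch.randn(1, 1, n_samples)    # (batch, channels, samples)
    frames = encoder(wav)                 # (batch, channels, n_frames)
    print(n_samples, "->", frames.shape[-1], "frames")
```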
An HMM is a statistical model with unobserved (i.e. hidden) states, used in recognition algorithms (speech, handwriting, gesture, ...). What distinguishes a DHMM from a CHMM is the transition probability matrix P with elements p_ij. In a CHMM, the state space of the hidden variable is discrete and the observation probabilities are modelled as Gaussian distributions. Why are the observation probabilities modelled as Gaussian distributions in a CHMM? Why are they the (best) distributions for recognition systems based on HMMs?
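For concreteness, what I mean by "observation probabilities modelled as Gaussian distributions" is an emission density per state j of the form (writing it out as I understand it, for a d-dimensional observation o_t):

$$
b_j(o_t) \;=\; \mathcal{N}(o_t;\,\mu_j,\Sigma_j) \;=\; \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_j|}}\,\exp\!\Big(-\tfrac{1}{2}(o_t-\mu_j)^\top \Sigma_j^{-1}(o_t-\mu_j)\Big),
$$

or, in practice, often a mixture $b_j(o_t)=\sum_m c_{jm}\,\mathcal{N}(o_t;\,\mu_{jm},\Sigma_{jm})$. My question is why this particular family of densities is the standard choice.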
I found instructions for building that kind of custom model on Azure: "Prepare data for Custom Speech". However, I would like to do either fine-tuning or transfer learning on Google Colaboratory or in Docker. In that case, what machine learning framework do you recommend using? If you know of any GitHub repos or articles on this challenge, could you share them with me?
I've explored text-to-speech evaluation metrics, and they seem to use the Mean Opinion Score (MOS) to evaluate a particular model. This metric requires humans to judge the model on a scale (Bad, Moderate, Good, etc.). Are there other evaluation metrics that estimate the quality of a TTS system algorithmically, without requiring any humans, but still give results that correlate with human evaluation?
I'm using the NeMo Conformer-CTC small model on the LibriSpeech dataset (the clean subset, around 29K inputs, using 90% for training and 10% for testing). I use PyTorch Lightning. When I try to train from scratch, the model learns 1 or 2 sentences in 50 epochs and gets stuck at a loss of 60-something (I trained it for 200 epochs too and it didn't budge). But when I try to fine-tune it using a pre-trained model from the toolkit, it predicts correctly …
I'm trying to understand the pretraining and fine-tuning of the wav2vec 2.0 algorithm, the new one used at Facebook AI to do speech-to-text for low-resource languages. I didn't actually get how the model does the pretraining part. If someone can help me: I read the paper https://arxiv.org/abs/2006.11477 but I ended up not getting the notion of pre-training in this context. The question is: HOW do we do the pretraining?! Note: I'm a beginner in ML; so far, I've done some projects with NLP, I have …
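The specific part that loses me is the objective used during pretraining. If I read equation (3) of the paper correctly, for each masked time step t the model has to identify the true quantized latent q_t among distractors sampled from other masked time steps, using a contrastive loss of the form

$$
\mathcal{L}_m \;=\; -\log \frac{\exp\big(\mathrm{sim}(\mathbf{c}_t,\mathbf{q}_t)/\kappa\big)}{\sum_{\tilde{\mathbf{q}}\sim\mathbf{Q}_t}\exp\big(\mathrm{sim}(\mathbf{c}_t,\tilde{\mathbf{q}})/\kappa\big)},
$$

where c_t is the Transformer output at the masked step, Q_t contains q_t plus the distractors, sim is cosine similarity and κ is a temperature. What I don't understand is how optimizing this, without any transcriptions, ends up helping speech-to-text later.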
I am dealing with a dataset of transcribed call center data, where customers are recorded while interacting with an agent. The calls are then automatically transcribed by an external transcription system. I want to automatically assess the quality of these transcriptions. Sadly, the quality seems to be disastrous: in some cases it's little more than gibberish, often due to dialects the machine is not able to handle. We have no access to the original recordings (data privacy), so …
By unpaired data I mean that we have a dataset of audio and a dataset of texts, BUT they are not associated. As we know, to build a speech recognition model we have to pair each utterance in our input dataset with its corresponding text and then train our model on that, so that it is capable of converting new audio to text. In my case, while doing data collection, I wind up with audio and text …
This is my first question here. I am trying to build a sign language translator (from signs to text) and noticed that the problem itself is quite similar to speech recognition, so I started researching that. Right now, one thing I can't figure out is how exactly Hidden Markov Models are used in speech recognition. I can understand how an HMM can be used, for example, in part-of-speech tagging, where we get one of the states for each word. …
I have been trying to learn how to build ASRs and have been researching for a while now, but I can't seem to get a straight answer. From what I understand, an ASR requires an Acoustic Model. That Acoustic Model can be trained via Baum-Welch or Viterbi training. Those algorithms train the parameters of a Hidden Markov Model. From what I gather, to train the parameters, we need the WAV files, from which the MFCC feature vectors can be obtained. On …
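For context, this is how I currently get the MFCC feature vectors out of a WAV file (librosa and 13 coefficients are just my own choices, not taken from any particular recipe):

```python
# Extract MFCC feature vectors from one utterance: one 13-dim vector per frame.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
features = mfcc.T                                     # shape: (n_frames, 13)
print(features.shape)
```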
I am trying to develop a mispronunciation detection model for English speech. I use the TIMIT dataset, which is a phoneme-labeled audio dataset. A phoneme is any of the perceptually distinct units of sound. So my dataset looks like an audio file and a string of phonemes corresponding to that audio, e.g.: SX141.wav -> p l eh zh tcl t ax-h pcl p axr tcl t ih s pcl p ey dx ih n ax v aa dx ix z ix kcl …